Introduction
You've built a LangChain application that works great on your laptop. Now what? Moving from prototype to production involves decisions about infrastructure, security, observability, and cost optimization.
This guide walks through deploying a LangChain application on Google Cloud Platform, using the same patterns we use for production deployments at Commit Software.
Architecture Overview
        [Cloud Load Balancer]
                  |
                  v
          [Cloud Run (API)]
            v           v
     [Cloud SQL]   [Vertex AI]
          v             v
      [Secrets]  [Embeddings & LLMs]
            \          /
             v        v
          [Vector Search]
Why GCP?
Step 1: Project Setup
### Enable Required APIs
gcloud services enable \
  run.googleapis.com \
  cloudbuild.googleapis.com \
  secretmanager.googleapis.com \
  sqladmin.googleapis.com \
  aiplatform.googleapis.com \
  compute.googleapis.com
### Create Service Account
# Assumes PROJECT_ID is set, for example: export PROJECT_ID=$(gcloud config get-value project)
# Create service account for Cloud Run
gcloud iam service-accounts create langchain-app \
  --display-name="LangChain Application"
# Grant necessary permissions
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:langchain-app@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:langchain-app@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/cloudsql.client"
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:langchain-app@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"
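The three bindings above differ only in the role. As a minimal sketch, the same commands can be generated from a list, which keeps the role set in one reviewable place. The helper name `iam_binding_commands` and the example project ID are illustrative assumptions, not part of the original setup.

```python
import shlex

# Roles granted in this step, taken from the commands above.
ROLES = [
    "roles/aiplatform.user",
    "roles/cloudsql.client",
    "roles/secretmanager.secretAccessor",
]


def iam_binding_commands(project_id: str, account: str = "langchain-app") -> list[str]:
    """Return one `gcloud projects add-iam-policy-binding` command per role."""
    member = f"serviceAccount:{account}@{project_id}.iam.gserviceaccount.com"
    commands = []
    for role in ROLES:
        argv = [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}",
            f"--role={role}",
        ]
        # shlex.join quotes each argument so the line is safe to paste into a shell.
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    for command in iam_binding_commands("my-project"):
        print(command)
```

Printing the commands first and running them after review is a cautious default for anything that changes IAM policy.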
Step 2: Database Setup (Cloud SQL)
### Create PostgreSQL Instance
gcloud sql instances create langchain-db \
  --database-version=POSTGRES_15 \
  --tier=db-f1-micro \
  --region=asia-south1 \
  --storage-type=SSD \
  --storage-size=10GB \
  --availability-type=zonal
For production, use db-custom-2-4096 (2 vCPUs, 4 GB of memory) or higher, and set --availability-type=regional for high availability.
### Enable pgvector Extension
# Connect to the database
gcloud sql connect langchain-db --user=postgres
# In psql:
CREATE DATABASE langchain;
\c langchain
CREATE EXTENSION vector;
### Create Tables
-- Documents table for RAG
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content TEXT NOT NULL,
  metadata JSONB DEFAULT '{}',
  embedding vector(768),
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Index for vector similarity search
CREATE INDEX ON documents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- Conversation history. PostgresChatMessageHistory reads and writes
-- (session_id, message JSONB) rows, so the table follows that layout.
CREATE TABLE conversations (
  id SERIAL PRIMARY KEY,
  session_id TEXT NOT NULL,
  message JSONB NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_conversations_session
  ON conversations(session_id, created_at);
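The `lists = 100` value in the vector index is a starting point rather than a fixed rule. The pgvector README suggests roughly rows / 1000 lists for up to one million rows and sqrt(rows) beyond that, with sqrt(lists) as a first guess for `ivfflat.probes` at query time. The helper below is a sketch of that rule of thumb; the function names are illustrative, and the right values should be confirmed by measuring recall and latency on real data.

```python
import math


def suggested_lists(row_count: int) -> int:
    """Starting point for the ivfflat `lists` parameter."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return round(math.sqrt(row_count))


def suggested_probes(lists: int) -> int:
    """Starting point for `SET ivfflat.probes = ...` at query time."""
    return max(1, round(math.sqrt(lists)))


if __name__ == "__main__":
    for rows in (100_000, 1_000_000, 4_000_000):
        lists = suggested_lists(rows)
        print(rows, lists, suggested_probes(lists))
```

With about 100,000 rows this gives 100 lists, which matches the index definition above.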
Step 3: Secrets Management
Store sensitive values in Secret Manager:
# Create secrets
echo -n "your-openai-key" | gcloud secrets create openai-api-key --data-file=-
echo -n "your-db-password" | gcloud secrets create db-password --data-file=-
# Grant access to service account
gcloud secrets add-iam-policy-binding openai-api-key \
  --member="serviceAccount:langchain-app@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"
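With `--set-secrets` (used in the deploy step of this guide), Cloud Run exposes each secret to the container as an environment variable. A small startup check, sketched below, makes a missing binding fail loudly at boot instead of on the first request. The function name `require_env` is an assumption for illustration.

```python
import os
from typing import Mapping, Optional


def require_env(names: list[str], env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the requested variables, raising if any is missing or empty."""
    source = os.environ if env is None else env
    missing = [name for name in names if not source.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return {name: source[name] for name in names}


if __name__ == "__main__":
    # A fake environment stands in for the values Cloud Run would inject.
    fake_env = {"OPENAI_API_KEY": "placeholder-key", "DB_PASSWORD": "placeholder"}
    secrets = require_env(["OPENAI_API_KEY", "DB_PASSWORD"], env=fake_env)
    print(sorted(secrets))
```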
Step 4: Application Code
### Project Structure
langchain-app/
├── app/
│   ├── __init__.py
│   ├── main.py          # FastAPI application
│   ├── chains.py        # LangChain chains
│   ├── database.py      # Database connection
│   └── config.py        # Configuration
├── Dockerfile
├── requirements.txt
└── cloudbuild.yaml
### Configuration (config.py)
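from functools import lru_cache
from urllib.parse import quote

from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Database (BaseSettings fills these from DB_HOST, DB_PASSWORD, and so on)
    db_host: str = "localhost"
    db_name: str = "langchain"
    db_user: str = "postgres"
    db_password: str = ""
    # AI
    openai_api_key: str = ""
    google_project: str = ""
    # App
    environment: str = "development"

    @property
    def database_url(self) -> str:
        # Percent-encode the password so special characters do not break the URL
        credentials = f"{self.db_user}:{quote(self.db_password, safe='')}"
        if self.db_host.startswith("/"):
            # A Cloud SQL Unix socket path goes in the `host` query parameter
            return f"postgresql://{credentials}@/{self.db_name}?host={self.db_host}"
        return f"postgresql://{credentials}@{self.db_host}/{self.db_name}"


@lru_cache()
def get_settings() -> Settings:
    return Settings()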
### LangChain Setup (chains.py)
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain_community.chat_message_histories import PostgresChatMessageHistory
from langchain_community.vectorstores import PGVector
from langchain_google_vertexai import VertexAI, VertexAIEmbeddings

from app.config import get_settings

settings = get_settings()

# Initialize embeddings
embeddings = VertexAIEmbeddings(
    model_name="textembedding-gecko@003",
    project=settings.google_project,
)

# Initialize LLM
llm = VertexAI(
    model_name="gemini-1.5-flash",
    project=settings.google_project,
    temperature=0,
    max_output_tokens=2048,
)

# Initialize vector store
vectorstore = PGVector(
    collection_name="documents",
    connection_string=settings.database_url,
    embedding_function=embeddings,
)


def create_chain(session_id: str):
    """Create a conversational chain with memory."""
    history = PostgresChatMessageHistory(
        connection_string=settings.database_url,
        session_id=session_id,
        table_name="conversations",
    )
    # The chain expects a memory object, so wrap the Postgres-backed history.
    # output_key is set because the chain also returns source documents.
    memory = ConversationBufferMemory(
        chat_memory=history,
        memory_key="chat_history",
        output_key="answer",
        return_messages=True,
    )
    return ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
        memory=memory,
        return_source_documents=True,
    )
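For intuition about what the retriever with `k=5` returns, here is a dependency-free sketch of top-k retrieval by cosine similarity, the measure the `vector_cosine_ops` index from Step 2 is built for. The in-memory dictionary stands in for the database, and the tiny three-dimensional vectors are made-up illustration data, not real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], documents: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        documents,
        key=lambda doc_id: cosine_similarity(query, documents[doc_id]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    docs = {
        "billing": [1.0, 0.0, 0.0],
        "refunds": [0.9, 0.1, 0.0],
        "shipping": [0.0, 1.0, 0.0],
    }
    print(top_k([1.0, 0.05, 0.0], docs, k=2))
```

In the deployed application the database performs this ranking, approximately, through the index, so only the k nearest chunks reach the LLM prompt.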
### FastAPI Application (main.py)
import logging

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from app.chains import create_chain

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(title="LangChain API")


class QueryRequest(BaseModel):
    session_id: str
    query: str


class QueryResponse(BaseModel):
    answer: str
    sources: list[dict]


# Plain `def`: chain.invoke blocks, so FastAPI runs this handler in its thread pool.
@app.post("/query", response_model=QueryResponse)
def query(request: QueryRequest):
    try:
        chain = create_chain(request.session_id)
        result = chain.invoke({"question": request.query})
        return QueryResponse(
            answer=result["answer"],
            sources=[
                {"content": doc.page_content[:200], "metadata": doc.metadata}
                for doc in result.get("source_documents", [])
            ],
        )
    except Exception:
        # Log the full traceback, but do not leak internal details to the client.
        logger.exception("Query failed")
        raise HTTPException(status_code=500, detail="Query failed")


@app.get("/health")
async def health():
    return {"status": "healthy"}
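To exercise the endpoint, a client posts JSON with `session_id` and `query`. The sketch below builds such a request with the standard library only; the base URL is a placeholder assumption, and the network call is left commented out so the snippet runs without a live service.

```python
import json
import urllib.request


def build_query_request(base_url: str, session_id: str, query: str) -> urllib.request.Request:
    """Build a POST request whose body matches the QueryRequest model."""
    body = json.dumps({"session_id": session_id, "query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL; replace it with the Cloud Run service URL.
    request = build_query_request(
        "https://example.invalid", "abc123", "What is our refund policy?"
    )
    print(request.get_method(), request.full_url)
    # with urllib.request.urlopen(request, timeout=30) as response:
    #     print(json.load(response)["answer"])
```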
### Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY app/ app/
# Run with gunicorn for production
CMD ["gunicorn", "app.main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "-b", "0.0.0.0:8080"]
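Cloud Run tells the container which port to listen on through the `PORT` environment variable (8080 by default). As an optional alternative to hard-coding flags in `CMD`, gunicorn can read its settings from a Python config file. The sketch below assumes a file named `gunicorn.conf.py` passed with `-c`; `bind`, `workers`, `worker_class`, and `timeout` are standard gunicorn settings, while the specific values are illustrative.

```python
# gunicorn.conf.py, used as: gunicorn -c gunicorn.conf.py app.main:app
import os

# Cloud Run injects PORT; fall back to 8080 for local runs.
port = int(os.environ.get("PORT", "8080"))
bind = f"0.0.0.0:{port}"

# Same worker count as the CMD above unless WEB_CONCURRENCY overrides it.
workers = int(os.environ.get("WEB_CONCURRENCY", "4"))
worker_class = "uvicorn.workers.UvicornWorker"

# LLM calls can be slow; 120 seconds is an illustrative value, not a recommendation.
timeout = 120
```

Keeping these values in one file makes it easier to tune the worker count against the `--cpu` and `--memory` settings chosen at deploy time.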
Step 5: CI/CD Pipeline
### Cloud Build Configuration (cloudbuild.yaml)
steps:
  # Run tests
  - name: 'python:3.11'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        pip install -r requirements.txt
        pip install pytest
        pytest tests/ -v
  # Build container
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'build'
      - '-t'
      - 'gcr.io/$PROJECT_ID/langchain-app:$COMMIT_SHA'
      - '.'
  # Push to registry
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'push'
      - 'gcr.io/$PROJECT_ID/langchain-app:$COMMIT_SHA'
  # Deploy to Cloud Run
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'run'
      - 'deploy'
      - 'langchain-api'
      - '--image=gcr.io/$PROJECT_ID/langchain-app:$COMMIT_SHA'
      - '--region=asia-south1'
      - '--platform=managed'
      - '--service-account=langchain-app@$PROJECT_ID.iam.gserviceaccount.com'
      # Both environment variables go in a single flag
      - '--set-env-vars=GOOGLE_PROJECT=$PROJECT_ID,DB_HOST=/cloudsql/$PROJECT_ID:asia-south1:langchain-db'
      - '--set-secrets=OPENAI_API_KEY=openai-api-key:latest,DB_PASSWORD=db-password:latest'
      - '--add-cloudsql-instances=$PROJECT_ID:asia-south1:langchain-db'
      - '--allow-unauthenticated'
      - '--min-instances=1'
      - '--max-instances=100'
      - '--memory=2Gi'
      - '--cpu=2'
images:
  - 'gcr.io/$PROJECT_ID/langchain-app:$COMMIT_SHA'
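Long flag lists are easy to get subtly wrong, for example by spreading environment variables across several `--set-env-vars` flags. One way to guard against that, sketched below, is to assemble all variables into a single flag programmatically. The helper `env_vars_flag` is hypothetical and the project ID is a placeholder.

```python
def env_vars_flag(variables: dict[str, str]) -> str:
    """Join KEY=VALUE pairs into a single --set-env-vars flag."""
    for key, value in variables.items():
        if "," in value:
            # gcloud splits list flags on commas by default, so a value that
            # contains one would need a custom delimiter instead.
            raise ValueError(f"{key} contains a comma; use a custom delimiter")
    pairs = ",".join(f"{key}={value}" for key, value in variables.items())
    return f"--set-env-vars={pairs}"


if __name__ == "__main__":
    project = "my-project"  # placeholder project ID
    print(env_vars_flag({
        "GOOGLE_PROJECT": project,
        "DB_HOST": f"/cloudsql/{project}:asia-south1:langchain-db",
    }))
```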
Step 6: Observability
### Structured Logging
import json
import logging

from google.cloud import logging as cloud_logging

# Set up Cloud Logging
client = cloud_logging.Client()
client.setup_logging()


class StructuredLogger:
    def __init__(self, name: str):
        self.logger = logging.getLogger(name)

    def info(self, message: str, **kwargs):
        self.logger.info(json.dumps({
            "message": message,
            "severity": "INFO",
            **kwargs,
        }))

    def error(self, message: str, **kwargs):
        self.logger.error(json.dumps({
            "message": message,
            "severity": "ERROR",
            **kwargs,
        }))


logger = StructuredLogger("langchain-app")

# Usage
logger.info(
    "Query processed",
    session_id="abc123",
    tokens_used=1500,
    latency_ms=2300,
)
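On Cloud Run, lines written to stdout as JSON are also picked up as structured log entries, with `severity` and `message` treated as special fields. As an alternative sketch that needs no client library, a `logging.Formatter` can emit that shape. The class name `JsonFormatter` and the `fields` key used with `extra` are assumptions for illustration.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Extra key/value pairs arrive via logger.info(..., extra={"fields": {...}}).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream (stdout by default)."""
    handler = logging.StreamHandler(stream or sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("langchain-app")
    log.info(
        "Query processed",
        extra={"fields": {"session_id": "abc123", "latency_ms": 2300}},
    )
```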
### LangSmith Integration
import os

# Enable LangSmith tracing (LANGCHAIN_API_KEY must also be set)
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "production-langchain-app"
# All LangChain calls are now traced
### Custom Metrics
import time
from contextlib import contextmanager

from google.cloud import monitoring_v3

from app.config import get_settings

settings = get_settings()


@contextmanager
def track_query_metrics(session_id: str, query: str):
    """Time the wrapped block and write its latency as a custom metric."""
    client = monitoring_v3.MetricServiceClient()
    project_name = f"projects/{settings.google_project}"
    start_time = time.time()
    try:
        yield  # Execute the query
    finally:
        latency = time.time() - start_time
        # Write custom metric
        series = monitoring_v3.TimeSeries()
        series.metric.type = "custom.googleapis.com/langchain/query_latency"
        series.metric.labels["session_id"] = session_id
        series.resource.type = "global"
        point = monitoring_v3.Point({
            "interval": {"end_time": {"seconds": int(time.time())}},
            "value": {"double_value": latency},
        })
        series.points = [point]
        client.create_time_series(name=project_name, time_series=[series])
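The timing pattern above relies on `contextlib.contextmanager`, which turns a generator with a single `yield` into something usable in a `with` statement. The sketch below shows that pattern in isolation, with a caller-supplied `record` callback standing in for the Cloud Monitoring write; the names `timed` and `record` are assumptions.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def timed(record: Callable[[float], None]) -> Iterator[None]:
    """Measure the wrapped block and pass the elapsed seconds to `record`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed queries are measured too.
        record(time.perf_counter() - start)


if __name__ == "__main__":
    latencies: list[float] = []
    with timed(latencies.append):
        time.sleep(0.01)  # stands in for chain.invoke(...)
    print(f"latency: {latencies[0]:.3f}s")
```

Separating the measurement from the metric write also makes the timing logic testable without Google Cloud credentials.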
Step 7: Security Hardening
### Authentication
from fastapi import Depends, HTTPException, Security
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from google.auth.transport import requests
from google.oauth2 import id_token

security = HTTPBearer()


async def verify_token(credentials: HTTPAuthorizationCredentials = Security(security)):
    try:
        # Verify Google ID token
        idinfo = id_token.verify_oauth2_token(
            credentials.credentials,
            requests.Request(),
            # Must equal the token's `aud` claim (typically an OAuth client ID
            # or the service URL), so confirm this value for your setup.
            audience=settings.google_project,
        )
        return idinfo
    except ValueError:
        raise HTTPException(status_code=401, detail="Invalid token")


@app.post("/query")
async def query(request: QueryRequest, user: dict = Depends(verify_token)):
    # user is now authenticated
    ...
### Rate Limiting
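from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)


@app.post("/query")
@limiter.limit("30/minute")
async def query(request: Request, body: QueryRequest):
    # slowapi needs the raw Request, in an argument named `request`, to identify the client
    ...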
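To make the `30/minute` rule concrete, here is a dependency-free sketch of a sliding-window limiter keyed by client address. It illustrates the idea only and is not how slowapi is implemented. It also keeps state in a single process, so it would not coordinate across several Cloud Run instances. The class name and the injectable clock are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    def __init__(self, limit: int = 30, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str) -> bool:
        """Record a request for `key` and report whether it is within the limit."""
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter_demo = SlidingWindowLimiter(limit=2, window_seconds=60)
    print([limiter_demo.allow("203.0.113.7") for _ in range(3)])
```

Rejected requests are not recorded, so a client that keeps retrying regains capacity as soon as its older requests age out of the window.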
### Serverless VPC Access
# Create VPC connector for Cloud Run
gcloud compute networks vpc-access connectors create langchain-connector \
  --region=asia-south1 \
  --network=default \
  --range=10.8.0.0/28
# Update Cloud Run to use VPC
gcloud run services update langchain-api \
  --region=asia-south1 \
  --vpc-connector=langchain-connector \
  --vpc-egress=all-traffic
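The connector needs a `/28` range that does not overlap any subnet already in the VPC. A quick sanity check with Python's `ipaddress` module might look like the sketch below; the list of existing subnets is made-up example data, and `connector_range_ok` is a hypothetical helper.

```python
import ipaddress


def connector_range_ok(candidate: str, existing_subnets: list[str]) -> bool:
    """True if candidate is a /28 that overlaps none of the existing subnets."""
    network = ipaddress.ip_network(candidate)
    if network.prefixlen != 28:
        return False
    return not any(
        network.overlaps(ipaddress.ip_network(subnet))
        for subnet in existing_subnets
    )


if __name__ == "__main__":
    existing = ["10.128.0.0/20", "10.160.0.0/20"]  # example data only
    print(connector_range_ok("10.8.0.0/28", existing))
    print(ipaddress.ip_network("10.8.0.0/28").num_addresses)
```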
Step 8: Cost Optimization
### Cloud Run Configuration
# Optimize for cost:
#   --min-instances=0   scale to zero when idle
#   --max-instances=10  cap maximum
#   --cpu-boost         faster cold starts
gcloud run services update langchain-api \
  --region=asia-south1 \
  --min-instances=0 \
  --max-instances=10 \
  --cpu-boost \
  --execution-environment=gen2
### Committed Use Discounts
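For predictable workloads, commit to usage. Note that `gcloud compute commitments create` purchases a resource-based Compute Engine commitment, so check the pricing documentation for which discounts apply to Cloud Run and Vertex AI before buying:
# 1-year commitment for 4 vCPUs and 16 GB of memory
gcloud compute commitments create langchain-commitment \
  --region=asia-south1 \
  --resources=vcpu=4,memory=16GB \
  --plan=12-month
### Caching Layer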
Add Redis for response caching:
# Create Memorystore Redis
gcloud redis instances create langchain-cache \
  --size=1 \
  --region=asia-south1 \
  --redis-version=redis_7_0
Deployment Checklist
Before going live:
- [ ] All secrets in Secret Manager
- [ ] Cloud SQL connection via private IP
- [ ] Authentication enabled
- [ ] Rate limiting configured
- [ ] Structured logging enabled
- [ ] Custom metrics tracking
- [ ] Health check endpoint verified
- [ ] Error alerting configured
- [ ] Backup strategy in place
- [ ] Cost alerts set up
Conclusion
Deploying LangChain to production on GCP requires attention to:
- Security: Service accounts, secrets, VPC controls
- Reliability: Health checks, auto-scaling, error handling
- Observability: Logging, metrics, tracing
- Cost: Right-sizing, caching, committed use
The patterns in this guide have been battle-tested across multiple production deployments. Start with the basics, then layer in optimizations as your traffic grows.
Need help deploying your LangChain application? [Contact us](/contact) for expert guidance.