Infrastructure | 16 min read | November 20, 2024

From Prototype to Production: Deploying LangChain on GCP

A step-by-step guide to taking your LangChain application from laptop to production on Google Cloud Platform. Covers Cloud Run, Cloud SQL, Vertex AI, and operational best practices.

Commit Software Team

DevOps Engineering

Introduction


You've built a LangChain application that works great on your laptop. Now what? Moving from prototype to production involves decisions about infrastructure, security, observability, and cost optimization.

This guide walks through deploying a LangChain application on Google Cloud Platform, using the same patterns we use for production deployments at Commit Software.

Architecture Overview


        [Cloud Load Balancer]
                  |
                  v
           [Cloud Run (API)]
             /           \
            v             v
      [Cloud SQL]     [Vertex AI]
           |               |
           v               v
       [Secrets]  [Embeddings & LLMs]
             \           /
              v         v
           [Vector Search]

Why GCP?

  • Vertex AI provides managed access to Gemini models

  • Cloud Run offers excellent auto-scaling for bursty AI workloads

  • Native integration between services reduces complexity

Step 1: Project Setup


    ### Enable Required APIs

    gcloud services enable \
    run.googleapis.com \
    cloudbuild.googleapis.com \
    secretmanager.googleapis.com \
    sqladmin.googleapis.com \
    aiplatform.googleapis.com \
    compute.googleapis.com

    ### Create Service Account

    # Create service account for Cloud Run
    gcloud iam service-accounts create langchain-app \
    --display-name="LangChain Application"

    # Grant necessary permissions
    gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:langchain-app@$PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"

    gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:langchain-app@$PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/cloudsql.client"

    gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:langchain-app@$PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/secretmanager.secretAccessor"


    Step 2: Database Setup (Cloud SQL)


    ### Create PostgreSQL Instance

    gcloud sql instances create langchain-db \
    --database-version=POSTGRES_15 \
    --tier=db-f1-micro \
    --region=asia-south1 \
    --storage-type=SSD \
    --storage-size=10GB \
    --availability-type=zonal

    For production, use db-custom-2-4096 or higher, and --availability-type=regional for HA.

    ### Enable pgvector Extension

    # Connect to the database
    gcloud sql connect langchain-db --user=postgres

    # In psql:
    CREATE DATABASE langchain;
    \c langchain
    CREATE EXTENSION vector;

    ### Create Tables

    -- Documents table for RAG
    CREATE TABLE documents (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        content TEXT NOT NULL,
        metadata JSONB DEFAULT '{}',
        embedding vector(768),
        created_at TIMESTAMPTZ DEFAULT NOW()
    );

    -- Index for vector similarity search
    CREATE INDEX ON documents
        USING ivfflat (embedding vector_cosine_ops)
        WITH (lists = 100);

    -- Conversation history
    CREATE TABLE conversations (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        session_id VARCHAR(100) NOT NULL,
        role VARCHAR(20) NOT NULL,
        content TEXT NOT NULL,
        created_at TIMESTAMPTZ DEFAULT NOW()
    );

    CREATE INDEX idx_conversations_session
        ON conversations(session_id, created_at);
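
The `lists = 100` value in the index above is a starting point rather than a universal setting. pgvector's documentation suggests roughly rows / 1000 lists for tables up to about one million rows and sqrt(rows) beyond that, with `ivfflat.probes` near sqrt(lists) at query time. A minimal sketch of that rule of thumb follows; `suggest_ivfflat_params` is a hypothetical helper written for this guide, not part of pgvector.

```python
import math


def suggest_ivfflat_params(row_count: int) -> dict[str, int]:
    """Return starting values for ivfflat `lists` and `probes`.

    Heuristic only: rows / 1000 up to 1M rows, sqrt(rows) above that,
    and probes close to sqrt(lists). Tune against real recall and latency.
    """
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = max(1, math.isqrt(row_count))
    probes = max(1, math.isqrt(lists))
    return {"lists": lists, "probes": probes}


params = suggest_ivfflat_params(100_000)
print(params)  # {'lists': 100, 'probes': 10}
```

An ivfflat index also clusters better when it is built after the table already holds representative data, so it can be worth creating or rebuilding the index once the initial document load has finished.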


    Step 3: Secrets Management


    Store sensitive values in Secret Manager:

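    # Create secrets
    echo -n "your-openai-key" | gcloud secrets create openai-api-key --data-file=-
    echo -n "your-db-password" | gcloud secrets create db-password --data-file=-

    # Grant access to service account (one binding per secret)
    gcloud secrets add-iam-policy-binding openai-api-key \
    --member="serviceAccount:langchain-app@$PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/secretmanager.secretAccessor"

    gcloud secrets add-iam-policy-binding db-password \
    --member="serviceAccount:langchain-app@$PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/secretmanager.secretAccessor"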
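
When secrets are attached with `--set-secrets`, as in the deploy step of this guide, Cloud Run exposes each one to the container as an environment variable. One way to catch a missing binding early is to check for the variables at startup instead of failing on the first request. The sketch below assumes that approach; `require_env` is a hypothetical helper, and the example value is a placeholder.

```python
import os
from collections.abc import Mapping


def require_env(names: list[str], env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the named variables, raising if any is missing or empty."""
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return {name: env[name] for name in names}


# Example with an explicit mapping; inside the container, omit `env`
# so the real process environment is checked.
secrets = require_env(["DB_PASSWORD"], env={"DB_PASSWORD": "example-only"})
```

Calling this once during application startup makes a misconfigured revision fail its first health check rather than serving errors.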


    Step 4: Application Code


    ### Project Structure

    langchain-app/
    ├── app/
    │ ├── __init__.py
    │ ├── main.py # FastAPI application
    │ ├── chains.py # LangChain chains
    │ ├── database.py # Database connection
    │ └── config.py # Configuration
    ├── Dockerfile
    ├── requirements.txt
    └── cloudbuild.yaml

    ### Configuration (config.py)

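    import os
    from functools import lru_cache
    from urllib.parse import quote

    from pydantic_settings import BaseSettings

    class Settings(BaseSettings):
        # Database
        db_host: str = os.getenv("DB_HOST", "localhost")
        db_name: str = "langchain"
        db_user: str = "postgres"
        db_password: str = ""

        # AI
        openai_api_key: str = ""
        google_project: str = ""

        # App
        environment: str = "development"

        @property
        def database_url(self) -> str:
            # Percent-encode the password so special characters do not break the URL
            credentials = f"{self.db_user}:{quote(self.db_password, safe='')}"
            if self.db_host.startswith("/"):
                # Cloud SQL Unix socket directory (/cloudsql/...): pass it as the host parameter
                return f"postgresql://{credentials}@/{self.db_name}?host={self.db_host}"
            return f"postgresql://{credentials}@{self.db_host}/{self.db_name}"

    @lru_cache()
    def get_settings() -> Settings:
        return Settings()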

    ### LangChain Setup (chains.py)

    from langchain_google_vertexai import VertexAI, VertexAIEmbeddings
    from langchain_community.vectorstores import PGVector
    from langchain_community.chat_message_histories import PostgresChatMessageHistory
    from langchain.chains import ConversationalRetrievalChain
    from langchain.memory import ConversationBufferMemory
    from app.config import get_settings

    settings = get_settings()

    # Initialize embeddings
    embeddings = VertexAIEmbeddings(
        model_name="textembedding-gecko@003",
        project=settings.google_project
    )

    # Initialize LLM
    llm = VertexAI(
        model_name="gemini-1.5-flash",
        project=settings.google_project,
        temperature=0,
        max_output_tokens=2048
    )

    # Initialize vector store
    vectorstore = PGVector(
        collection_name="documents",
        connection_string=settings.database_url,
        embedding_function=embeddings
    )

    def create_chain(session_id: str):
        """Create a conversational chain with memory."""
        # The history class creates and manages its own table schema, so it
        # gets a dedicated table instead of the hand-written conversations table.
        history = PostgresChatMessageHistory(
            connection_string=settings.database_url,
            session_id=session_id,
            table_name="chat_message_history"
        )
        # Wrap the stored history in a memory object the chain can use;
        # output_key is required when source documents are also returned.
        memory = ConversationBufferMemory(
            chat_memory=history,
            memory_key="chat_history",
            output_key="answer",
            return_messages=True
        )

        return ConversationalRetrievalChain.from_llm(
            llm=llm,
            retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
            memory=memory,
            return_source_documents=True
        )

    ### FastAPI Application (main.py)

    import logging

    from fastapi import FastAPI, HTTPException
    from fastapi.concurrency import run_in_threadpool
    from pydantic import BaseModel

    from app.chains import create_chain

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)

    app = FastAPI(title="LangChain API")

    class QueryRequest(BaseModel):
        session_id: str
        query: str

    class QueryResponse(BaseModel):
        answer: str
        sources: list[dict]

    @app.post("/query", response_model=QueryResponse)
    async def query(request: QueryRequest):
        try:
            chain = create_chain(request.session_id)
            # chain.invoke is blocking, so run it off the event loop
            result = await run_in_threadpool(chain.invoke, {"question": request.query})

            return QueryResponse(
                answer=result["answer"],
                sources=[
                    {"content": doc.page_content[:200], "metadata": doc.metadata}
                    for doc in result.get("source_documents", [])
                ]
            )
        except Exception:
            # Log the full traceback; return a generic message to the client
            logger.exception("Query failed")
            raise HTTPException(status_code=500, detail="Internal server error")

    @app.get("/health")
    async def health():
        return {"status": "healthy"}

    ### Dockerfile

    FROM python:3.11-slim

    WORKDIR /app

    # Install system dependencies
    RUN apt-get update && apt-get install -y \
    gcc \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*

    # Install Python dependencies
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # Copy application
    COPY app/ app/

    # Run with gunicorn for production
    CMD ["gunicorn", "app.main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "-b", "0.0.0.0:8080"]
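
The CMD above hardcodes port 8080 and four workers. Cloud Run passes the listening port in the `PORT` environment variable (8080 by default), and the right worker count depends on the `--cpu` and `--memory` values chosen at deploy time. A possible alternative is to move these settings into a gunicorn configuration file, which is plain Python. The sketch below assumes a file named `gunicorn.conf.py` that is not part of the project structure shown earlier, and uses `WEB_CONCURRENCY` as the variable name for the worker count.

```python
# gunicorn.conf.py (hypothetical file; load it with
# `gunicorn -c gunicorn.conf.py app.main:app`)
import os

# Cloud Run injects PORT; fall back to 8080 for local runs.
bind = f"0.0.0.0:{os.environ.get('PORT', '8080')}"

# Read the worker count from the environment so it can be tuned per
# deployment without rebuilding the image; 4 matches the CMD above.
workers = int(os.environ.get("WEB_CONCURRENCY", "4"))

# Same worker class as the CMD above.
worker_class = "uvicorn.workers.UvicornWorker"

# LLM calls can be slow; allow more than gunicorn's 30-second default
# before a worker is considered hung.
timeout = 120
```

Each worker loads its own copy of the application, so memory use grows with the worker count; the value is worth revisiting whenever the Cloud Run memory limit changes.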


    Step 5: CI/CD Pipeline


    ### Cloud Build Configuration (cloudbuild.yaml)

    steps:
      # Run tests
      - name: 'python:3.11'
        entrypoint: 'bash'
        args:
          - '-c'
          - |
            pip install -r requirements.txt
            pip install pytest
            pytest tests/ -v

      # Build container
      - name: 'gcr.io/cloud-builders/docker'
        args:
          - 'build'
          - '-t'
          - 'gcr.io/$PROJECT_ID/langchain-app:$COMMIT_SHA'
          - '.'

      # Push to registry
      - name: 'gcr.io/cloud-builders/docker'
        args:
          - 'push'
          - 'gcr.io/$PROJECT_ID/langchain-app:$COMMIT_SHA'

      # Deploy to Cloud Run
      - name: 'gcr.io/cloud-builders/gcloud'
        args:
          - 'run'
          - 'deploy'
          - 'langchain-api'
          - '--image=gcr.io/$PROJECT_ID/langchain-app:$COMMIT_SHA'
          - '--region=asia-south1'
          - '--platform=managed'
          - '--service-account=langchain-app@$PROJECT_ID.iam.gserviceaccount.com'
          - '--set-env-vars=GOOGLE_PROJECT=$PROJECT_ID'
          - '--set-secrets=OPENAI_API_KEY=openai-api-key:latest,DB_PASSWORD=db-password:latest'
          - '--add-cloudsql-instances=$PROJECT_ID:asia-south1:langchain-db'
          - '--set-env-vars=DB_HOST=/cloudsql/$PROJECT_ID:asia-south1:langchain-db'
          - '--allow-unauthenticated'
          - '--min-instances=1'
          - '--max-instances=100'
          - '--memory=2Gi'
          - '--cpu=2'

    images:
      - 'gcr.io/$PROJECT_ID/langchain-app:$COMMIT_SHA'


    Step 6: Observability


    ### Structured Logging

    import json
    import logging
    from google.cloud import logging as cloud_logging

    # Set up Cloud Logging
    client = cloud_logging.Client()
    client.setup_logging()

    class StructuredLogger:
        def __init__(self, name: str):
            self.logger = logging.getLogger(name)

        def info(self, message: str, **kwargs):
            # Extra keyword arguments become fields in the JSON payload
            self.logger.info(json.dumps({
                "message": message,
                "severity": "INFO",
                **kwargs
            }))

        def error(self, message: str, **kwargs):
            self.logger.error(json.dumps({
                "message": message,
                "severity": "ERROR",
                **kwargs
            }))

    logger = StructuredLogger("langchain-app")

    # Usage
    logger.info("Query processed",
        session_id="abc123",
        tokens_used=1500,
        latency_ms=2300
    )
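
Cloud Run also forwards anything written to stdout to Cloud Logging and parses single-line JSON into structured entries, reading the `severity` and `message` fields. That makes a standard-library-only variant possible, which can be handy for local runs and tests. The sketch below is one way to do it under that assumption; `JsonFormatter` and the `fields` key passed through `extra` are names chosen for this example, not an existing API.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line with severity and message fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Fields passed as extra={"fields": {...}} are merged into the entry.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, default=str)


# StringIO stands in for sys.stdout so the output can be inspected here.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

log = logging.getLogger("langchain-app.example")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False

log.info("Query processed", extra={"fields": {"session_id": "abc123", "latency_ms": 2300}})
print(stream.getvalue().strip())
```

In the service itself the handler would write to `sys.stdout`; the level names Python uses (INFO, WARNING, ERROR, CRITICAL) match Cloud Logging severities, so no mapping is needed.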

    ### LangSmith Integration

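    import os

    # Enable LangSmith tracing
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_PROJECT"] = "production-langchain-app"
    # LANGCHAIN_API_KEY must also be set; supply it through Secret Manager
    # rather than hardcoding it here

    # With the key present, LangChain calls are traced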

    ### Custom Metrics

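    import time
    from contextlib import contextmanager

    from google.cloud import monitoring_v3

    from app.config import get_settings

    settings = get_settings()

    # Requires roles/monitoring.metricWriter on the service account
    @contextmanager
    def track_query_metrics(session_id: str, query: str):
        """Time the wrapped block and write its latency as a custom metric."""
        client = monitoring_v3.MetricServiceClient()
        project_name = f"projects/{settings.google_project}"

        start_time = time.time()
        try:
            yield  # Execute the query
        finally:
            latency = time.time() - start_time

            # Write custom metric
            series = monitoring_v3.TimeSeries()
            series.metric.type = "custom.googleapis.com/langchain/query_latency"
            series.metric.labels["session_id"] = session_id
            series.resource.type = "global"
            series.resource.labels["project_id"] = settings.google_project
            point = monitoring_v3.Point({
                "interval": {"end_time": {"seconds": int(time.time())}},
                "value": {"double_value": latency}
            })
            series.points = [point]

            client.create_time_series(name=project_name, time_series=[series])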


    Step 7: Security Hardening


    ### Authentication

    import os

    from fastapi import Depends, HTTPException, Security
    from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
    from google.auth.transport import requests
    from google.oauth2 import id_token

    security = HTTPBearer()

    # The audience must equal the token's aud claim: the OAuth client ID or,
    # for service-to-service calls, the Cloud Run service URL (not the project ID).
    # AUTH_AUDIENCE is an assumed variable name, not one defined elsewhere in this guide.
    EXPECTED_AUDIENCE = os.environ["AUTH_AUDIENCE"]

    async def verify_token(credentials: HTTPAuthorizationCredentials = Security(security)):
        try:
            # Verify Google ID token
            idinfo = id_token.verify_oauth2_token(
                credentials.credentials,
                requests.Request(),
                audience=EXPECTED_AUDIENCE
            )
            return idinfo
        except ValueError:
            raise HTTPException(status_code=401, detail="Invalid token")

    @app.post("/query")
    async def query(request: QueryRequest, user: dict = Depends(verify_token)):
        # user is now authenticated
        ...

    ### Rate Limiting

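    from fastapi import Request
    from slowapi import Limiter, _rate_limit_exceeded_handler
    from slowapi.errors import RateLimitExceeded
    from slowapi.util import get_remote_address

    limiter = Limiter(key_func=get_remote_address)
    app.state.limiter = limiter
    # Return HTTP 429 when a client exceeds its limit
    app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

    @app.post("/query")
    @limiter.limit("30/minute")
    # slowapi needs the raw Starlette request as an argument named "request"
    async def query(request: Request, body: QueryRequest):
        ...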
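
To make the `30/minute` rule concrete, here is a minimal fixed-window counter in plain Python. It is an illustration of the idea only, not slowapi's implementation; `FixedWindowLimiter` and the injected clock are names invented for this sketch, and old windows are never evicted, which a real limiter would need to handle.

```python
import time
from collections import defaultdict
from typing import Callable


class FixedWindowLimiter:
    """Allow at most `limit` hits per key in each `window_seconds` window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # (key, window index) -> number of hits seen in that window
        self._counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        bucket = (key, window)
        if self._counts[bucket] >= self.limit:
            return False
        self._counts[bucket] += 1
        return True


# A simulated clock keeps the example deterministic.
now = [0.0]
limiter = FixedWindowLimiter(limit=30, window_seconds=60, clock=lambda: now[0])

results = [limiter.allow("203.0.113.7") for _ in range(31)]
now[0] = 60.0  # move into the next window
after_reset = limiter.allow("203.0.113.7")

print(results.count(True), results[-1], after_reset)  # 30 False True
```

Counters held in process memory are separate for every gunicorn worker and every Cloud Run instance, so the effective limit grows with the number of processes. A strict global limit needs a shared store such as Redis behind the limiter.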

    ### VPC Connector (Serverless VPC Access)

    # Create VPC connector for Cloud Run
    gcloud compute networks vpc-access connectors create langchain-connector \
    --region=asia-south1 \
    --network=default \
    --range=10.8.0.0/28

    # Update Cloud Run to use VPC
    gcloud run services update langchain-api \
    --vpc-connector=langchain-connector \
    --vpc-egress=all-traffic


    Step 8: Cost Optimization


    ### Cloud Run Configuration

    # Optimize for cost:
    #   --min-instances=0   scale to zero when idle
    #   --max-instances=10  cap maximum
    #   --cpu-boost         faster cold starts
    gcloud run services update langchain-api \
    --min-instances=0 \
    --max-instances=10 \
    --cpu-boost \
    --execution-environment=gen2

    ### Committed Use Discounts

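    For predictable workloads, commit to usage. The command below creates a resource-based commitment, which applies to Compute Engine vCPUs and memory in the region. It does not reduce Vertex AI charges, and Cloud Run is discounted through spend-based commitments purchased in Cloud Billing instead.

    # 1-year resource-based commitment for Compute Engine capacity
    gcloud compute commitments create langchain-commitment \
    --region=asia-south1 \
    --resources=vcpu=4,memory=16GB \
    --plan=12-month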

    ### Caching Layer

    Add Redis for response caching:

    # Create Memorystore Redis
    gcloud redis instances create langchain-cache \
    --size=1 \
    --region=asia-south1 \
    --redis-version=redis_7_0
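
The application side of response caching can be sketched without a running Redis instance. The example below assumes a simple design: hash the normalized query text into a key, look it up, and only call the chain on a miss. `cache_key`, `TTLCache`, and `cached_answer` are hypothetical names, and the in-memory `TTLCache` stands in for Redis GET and SET with an expiry.

```python
import hashlib
import time
from typing import Callable, Optional


def cache_key(query: str, namespace: str = "answer") -> str:
    """Derive a stable key from the query, ignoring case and extra spaces."""
    normalized = " ".join(query.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"{namespace}:{digest}"


class TTLCache:
    """In-memory stand-in for Redis GET / SET with expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        item = self._store.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._store[key]  # expired entry
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self.clock() + self.ttl_seconds, value)


def cached_answer(query: str, cache: TTLCache,
                  compute: Callable[[str], str]) -> str:
    """Return a cached answer, calling `compute` only on a miss."""
    key = cache_key(query)
    hit = cache.get(key)
    if hit is not None:
        return hit
    answer = compute(query)
    cache.set(key, answer)
    return answer


calls: list[str] = []


def fake_chain(q: str) -> str:
    """Placeholder for the real chain call."""
    calls.append(q)
    return f"answer to: {q}"


cache = TTLCache(ttl_seconds=300)
first = cached_answer("What is Cloud Run?", cache, fake_chain)
second = cached_answer("  what is   cloud run? ", cache, fake_chain)
print(len(calls), first == second)  # 1 True
```

Keying on the query alone ignores conversation history, so this pattern suits stateless questions. For conversational requests the key would need to include whatever context shapes the answer, or caching should be skipped.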

    Deployment Checklist


    Before going live:

    • [ ] All secrets in Secret Manager

    • [ ] Cloud SQL connection via private IP

    • [ ] Authentication enabled

    • [ ] Rate limiting configured

    • [ ] Structured logging enabled

    • [ ] Custom metrics tracking

    • [ ] Health check endpoint verified

    • [ ] Error alerting configured

    • [ ] Backup strategy in place

    • [ ] Cost alerts set up

    Conclusion


    Deploying LangChain to production on GCP requires attention to:

    • Security: Service accounts, secrets, VPC controls

    • Reliability: Health checks, auto-scaling, error handling

    • Observability: Logging, metrics, tracing

    • Cost: Right-sizing, caching, committed use

    The patterns in this guide have been battle-tested across multiple production deployments. Start with the basics, then layer in optimizations as your traffic grows.

    Need help deploying your LangChain application? [Contact us](/contact) for expert guidance.

    Tags: LangChain, GCP, Deployment, Cloud Run, Production
