Technical Guide14 min readDecember 5, 2024

Implementing Fraud Detection with LLMs: A Practical Approach

Learn how to build an LLM-powered fraud detection system that catches sophisticated fraud patterns traditional rules miss. Includes architecture, prompting strategies, and real-world accuracy metrics.

CST

Commit Software Team

AI Engineering

Introduction


Traditional fraud detection relies on rule-based systems and machine learning models trained on historical data. While effective for known fraud patterns, they struggle with novel attack vectors. LLMs offer a new approach: they can reason about transactions in context, catching sophisticated fraud that rules-based systems miss.

This guide shares our practical experience implementing LLM-powered fraud detection for fintech clients.

The Limitations of Traditional Approaches


### Rule-Based Systems

# Traditional rule-based fraud detection
def check_fraud_rules(transaction):
if transaction.amount > 50000:
return "FLAG: Large transaction"
if transaction.time.hour < 6 or transaction.time.hour > 22:
return "FLAG: Unusual time"
if transaction.country != user.home_country:
return "FLAG: Foreign transaction"
return "PASS"

Problems:

  • Fraudsters learn the rules and work around them

  • High false positive rates (legitimate large purchases get flagged)

  • Can't adapt to new fraud patterns without code changes
  • ### ML-Based Systems

    # Traditional ML fraud detection
    model = RandomForestClassifier()
    model.fit(historical_transactions, fraud_labels)
    prediction = model.predict(new_transaction)

    Problems:

  • Requires labeled training data (fraud is rare, labeling is expensive)

  • Can't explain decisions (black box)

  • Struggles with distribution shift (fraud patterns change)

  • The LLM Advantage


    LLMs bring three key capabilities:

    • Contextual reasoning: Consider the full transaction history and user behavior

    • Natural language explanation: Generate human-readable explanations

    • Zero-shot detection: Catch new fraud patterns without retraining

    Architecture Overview


    [Transaction Stream]
    |
    v
    [Feature Extraction]
    |
    v
    [Quick Rules Filter] --> [PASS 95%]
    |
    v (5% flagged)
    [LLM Analysis]
    |
    v
    [Decision: BLOCK / ALLOW / REVIEW]
    |
    v
    [Human Review Queue]

    The key insight: LLMs are expensive. We use them surgically on pre-filtered transactions.

    Implementation


    ### Step 1: Feature Extraction

    Extract rich context for the LLM:

    def extract_transaction_context(transaction, user):
    return {
    "transaction": {
    "amount": transaction.amount,
    "merchant": transaction.merchant,
    "category": transaction.category,
    "location": transaction.location,
    "time": transaction.timestamp.isoformat(),
    "device": transaction.device_fingerprint,
    },
    "user_profile": {
    "account_age_days": (now - user.created_at).days,
    "avg_transaction": user.avg_transaction_amount,
    "typical_merchants": user.frequent_merchants,
    "typical_locations": user.frequent_locations,
    "recent_activity": user.last_10_transactions,
    },
    "risk_signals": {
    "new_device": transaction.device_fingerprint not in user.known_devices,
    "new_location": transaction.location not in user.frequent_locations,
    "unusual_amount": transaction.amount > user.avg_transaction_amount * 3,
    "velocity": user.transactions_last_hour,
    }
    }

    ### Step 2: Quick Rules Filter

    Only send suspicious transactions to the LLM:

    def quick_filter(context):
    signals = context["risk_signals"]

    # Calculate risk score (0-100)
    score = 0
    if signals["new_device"]:
    score += 20
    if signals["new_location"]:
    score += 15
    if signals["unusual_amount"]:
    score += 25
    if signals["velocity"] > 5:
    score += 20

    # Only send high-risk to LLM
    return score >= 30

    Result: 95% of transactions pass without LLM analysis, keeping costs manageable.

    ### Step 3: LLM Analysis

    The prompt is critical. We use a structured approach:

    ```python
    FRAUD_ANALYSIS_PROMPT = """
    You are a fraud detection analyst reviewing a transaction for potential fraud.

    Transaction Details

    {transaction_json}

    User Profile

    {user_profile_json}

    Risk Signals Detected

    {risk_signals}

    Your Task

    Analyze this transaction and determine if it's likely fraudulent.

    Consider:

  • Is the transaction consistent with the user's typical behavior?

  • Are there multiple risk signals that together suggest fraud?

  • Could this be a legitimate change in behavior (travel, gift, etc.)?
  • Respond with:

  • DECISION: ALLOW
    BLOCK
    REVIEW

  • CONFIDENCE: HIGH
    MEDIUM
    LOW

  • EXPLANATION: 2-3 sentences explaining your reasoning

  • RISK_FACTORS: List of specific concerns

  • MITIGATING_FACTORS: List of factors suggesting legitimacy
  • Format your response as JSON.
    """

    async def analyze_with_llm(context):
    response = await openai.chat.completions.create(
    model="gpt-4o",
    messages=[
    {"role": "system", "content": "You are a fraud detection analyst."},
    {"role": "user", "content": FRAUD_ANALYSIS_PROMPT.format(**context)}
    ],
    response_format={"type": "json_object"},
    temperature=0, # Deterministic for consistency
    )
    return json.loads(response.choices[0].message.content)

    ### Step 4: Decision and Routing
    python
    async def make_decision(transaction, user):
    context = extract_transaction_context(transaction, user)

    if not quick_filter(context):
    return {"decision": "ALLOW", "method": "quick_pass"}

    analysis = await analyze_with_llm(context)

    if analysis["decision"] == "BLOCK":
    # Block and notify user
    await block_transaction(transaction)
    await notify_user(user, "blocked_transaction", analysis)
    return {"decision": "BLOCK", "analysis": analysis}

    elif analysis["decision"] == "REVIEW":
    # Allow but queue for human review
    await queue_for_review(transaction, analysis)
    return {"decision": "ALLOW_PENDING_REVIEW", "analysis": analysis}

    else:
    return {"decision": "ALLOW", "analysis": analysis}
    ```

    Real-World Results


    We deployed this system for a fintech client processing 100K transactions/day:

    MetricBefore (Rules)After (LLM)Improvement

    Fraud Detection Rate72%94%+22%

    False Positive Rate8%2%-75%

    Manual Review Queue5000/day800/day-84%

    Avg. Review TimeN/A45 secNew capability

    Cost per Transaction$0.0001$0.0008+7x

    Key insight: The 7x cost increase is easily justified by the 22% improvement in fraud detection. A single prevented fraud incident often exceeds the monthly LLM costs.

    Advanced Techniques


    ### Chain-of-Thought for Complex Cases

    For high-value transactions, we use more sophisticated reasoning:

    COT_PROMPT = """
    Let's analyze this transaction step by step:

    • BEHAVIORAL ANALYSIS

    • - Compare this transaction to the user's typical behavior
      - Note any significant deviations

      • TEMPORAL ANALYSIS

      • - Is the timing suspicious?
        - What was the user doing before this transaction?

        • NETWORK ANALYSIS

        • - Is the merchant legitimate?
          - Is this merchant connected to known fraud rings?

          • SYNTHESIS

          • - Weigh all factors together
            - Make a final determination

            Think through each step before giving your final answer.
            """

    ### Multi-Model Ensemble

    For critical decisions, we use multiple models:

    async def ensemble_decision(context):
    analyses = await asyncio.gather(
    analyze_with_model("gpt-4o", context),
    analyze_with_model("claude-3-opus", context),
    analyze_with_model("gemini-1.5-pro", context),
    )

    # Unanimous agreement required for ALLOW
    if all(a["decision"] == "ALLOW" for a in analyses):
    return "ALLOW"

    # Any BLOCK triggers block
    if any(a["decision"] == "BLOCK" for a in analyses):
    return "BLOCK"

    return "REVIEW"

    ### Feedback Loop

    Human reviewers improve the system over time:

    async def record_review_outcome(transaction_id, reviewer_decision, notes):
    # Store for model evaluation
    await store_feedback(transaction_id, reviewer_decision, notes)

    # Update user risk profile
    if reviewer_decision == "CONFIRMED_FRAUD":
    await update_user_risk(transaction.user_id, increase=True)
    elif reviewer_decision == "FALSE_POSITIVE":
    await update_user_risk(transaction.user_id, decrease=True)

    # Fine-tune prompts based on patterns
    await analyze_feedback_patterns()


    Security Considerations


    ### Prompt Injection Prevention

    def sanitize_for_prompt(text):
    # Remove potential injection attempts
    dangerous_patterns = [
    "ignore previous instructions",
    "system prompt",
    "you are now",
    ]
    for pattern in dangerous_patterns:
    text = text.replace(pattern, "[FILTERED]")
    return text

    # Always sanitize user-controlled data
    context["merchant_name"] = sanitize_for_prompt(transaction.merchant_name)

    ### Rate Limiting

    RATE_LIMITS = {
    "per_user_per_minute": 10,
    "per_merchant_per_minute": 100,
    "global_per_second": 1000,
    }

    async def check_rate_limits(transaction):
    for limit_type, max_count in RATE_LIMITS.items():
    current = await get_current_count(limit_type, transaction)
    if current >= max_count:
    raise RateLimitExceeded(limit_type)


    Conclusion


    LLM-powered fraud detection isn't about replacing rules or ML models—it's about adding a reasoning layer that catches what they miss. The key principles:

    • Use LLMs surgically: Pre-filter to only analyze suspicious transactions

    • Structured prompts: Guide the model's reasoning with clear frameworks

    • Human-in-the-loop: Keep reviewers for high-stakes decisions

    • Continuous learning: Use feedback to improve prompts and rules

    At Commit Software, we've deployed this approach across multiple fintech clients. If you're dealing with fraud losses or high false positive rates, [contact us](/contact) to discuss how LLM-powered detection could help.

    Tags

    Fraud DetectionLLMSecurityAIFintech

    Need Help Implementing This?

    Our team specializes in building production-grade AI systems. Let's discuss how we can help with your project.

    Schedule a Consultation