Introduction
Traditional fraud detection relies on rule-based systems and machine learning models trained on historical data. While effective for known fraud patterns, they struggle with novel attack vectors. LLMs offer a new approach: they can reason about transactions in context, catching sophisticated fraud that rules-based systems miss.
This guide shares our practical experience implementing LLM-powered fraud detection for fintech clients.
The Limitations of Traditional Approaches
### Rule-Based Systems
# Traditional rule-based fraud detection
def check_fraud_rules(transaction):
if transaction.amount > 50000:
return "FLAG: Large transaction"
if transaction.time.hour < 6 or transaction.time.hour > 22:
return "FLAG: Unusual time"
if transaction.country != user.home_country:
return "FLAG: Foreign transaction"
return "PASS"Problems:
### ML-Based Systems
# Traditional ML fraud detection
model = RandomForestClassifier()
model.fit(historical_transactions, fraud_labels)
prediction = model.predict(new_transaction)Problems:
The LLM Advantage
LLMs bring three key capabilities:
- Contextual reasoning: Consider the full transaction history and user behavior
- Natural language explanation: Generate human-readable explanations
- Zero-shot detection: Catch new fraud patterns without retraining
Architecture Overview
[Transaction Stream]
|
v
[Feature Extraction]
|
v
[Quick Rules Filter] --> [PASS 95%]
|
v (5% flagged)
[LLM Analysis]
|
v
[Decision: BLOCK / ALLOW / REVIEW]
|
v
[Human Review Queue]The key insight: LLMs are expensive. We use them surgically on pre-filtered transactions.
Implementation
### Step 1: Feature Extraction
Extract rich context for the LLM:
def extract_transaction_context(transaction, user):
return {
"transaction": {
"amount": transaction.amount,
"merchant": transaction.merchant,
"category": transaction.category,
"location": transaction.location,
"time": transaction.timestamp.isoformat(),
"device": transaction.device_fingerprint,
},
"user_profile": {
"account_age_days": (now - user.created_at).days,
"avg_transaction": user.avg_transaction_amount,
"typical_merchants": user.frequent_merchants,
"typical_locations": user.frequent_locations,
"recent_activity": user.last_10_transactions,
},
"risk_signals": {
"new_device": transaction.device_fingerprint not in user.known_devices,
"new_location": transaction.location not in user.frequent_locations,
"unusual_amount": transaction.amount > user.avg_transaction_amount * 3,
"velocity": user.transactions_last_hour,
}
}### Step 2: Quick Rules Filter
Only send suspicious transactions to the LLM:
def quick_filter(context):
signals = context["risk_signals"] # Calculate risk score (0-100)
score = 0
if signals["new_device"]:
score += 20
if signals["new_location"]:
score += 15
if signals["unusual_amount"]:
score += 25
if signals["velocity"] > 5:
score += 20
# Only send high-risk to LLM
return score >= 30
Result: 95% of transactions pass without LLM analysis, keeping costs manageable.
### Step 3: LLM Analysis
The prompt is critical. We use a structured approach:
```python
FRAUD_ANALYSIS_PROMPT = """
You are a fraud detection analyst reviewing a transaction for potential fraud.
Transaction Details
{transaction_json}
User Profile
{user_profile_json}
Risk Signals Detected
{risk_signals}
Your Task
Analyze this transaction and determine if it's likely fraudulent.
Consider:
Respond with:
| BLOCK |
| MEDIUM |
Format your response as JSON.
"""
async def analyze_with_llm(context):
response = await openai.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a fraud detection analyst."},
{"role": "user", "content": FRAUD_ANALYSIS_PROMPT.format(**context)}
],
response_format={"type": "json_object"},
temperature=0, # Deterministic for consistency
)
return json.loads(response.choices[0].message.content)
### Step 4: Decision and Routingpythonasync def make_decision(transaction, user):
context = extract_transaction_context(transaction, user)
if not quick_filter(context):
return {"decision": "ALLOW", "method": "quick_pass"}
analysis = await analyze_with_llm(context)
if analysis["decision"] == "BLOCK":
# Block and notify user
await block_transaction(transaction)
await notify_user(user, "blocked_transaction", analysis)
return {"decision": "BLOCK", "analysis": analysis}
elif analysis["decision"] == "REVIEW":
# Allow but queue for human review
await queue_for_review(transaction, analysis)
return {"decision": "ALLOW_PENDING_REVIEW", "analysis": analysis}
else:
return {"decision": "ALLOW", "analysis": analysis}
```
Real-World Results
We deployed this system for a fintech client processing 100K transactions/day:
| Metric | Before (Rules) | After (LLM) | Improvement |
| Fraud Detection Rate | 72% | 94% | +22% |
| False Positive Rate | 8% | 2% | -75% |
| Manual Review Queue | 5000/day | 800/day | -84% |
| Avg. Review Time | N/A | 45 sec | New capability |
| Cost per Transaction | $0.0001 | $0.0008 | +7x |
Key insight: The 7x cost increase is easily justified by the 22% improvement in fraud detection. A single prevented fraud incident often exceeds the monthly LLM costs.
Advanced Techniques
### Chain-of-Thought for Complex Cases
For high-value transactions, we use more sophisticated reasoning:
COT_PROMPT = """
Let's analyze this transaction step by step:- BEHAVIORAL ANALYSIS
- Compare this transaction to the user's typical behavior
- Note any significant deviations- TEMPORAL ANALYSIS
- Is the timing suspicious?
- What was the user doing before this transaction?- NETWORK ANALYSIS
- Is the merchant legitimate?
- Is this merchant connected to known fraud rings?- SYNTHESIS
- Weigh all factors together
- Make a final determinationThink through each step before giving your final answer.
"""
### Multi-Model Ensemble
For critical decisions, we use multiple models:
async def ensemble_decision(context):
analyses = await asyncio.gather(
analyze_with_model("gpt-4o", context),
analyze_with_model("claude-3-opus", context),
analyze_with_model("gemini-1.5-pro", context),
) # Unanimous agreement required for ALLOW
if all(a["decision"] == "ALLOW" for a in analyses):
return "ALLOW"
# Any BLOCK triggers block
if any(a["decision"] == "BLOCK" for a in analyses):
return "BLOCK"
return "REVIEW"
### Feedback Loop
Human reviewers improve the system over time:
async def record_review_outcome(transaction_id, reviewer_decision, notes):
# Store for model evaluation
await store_feedback(transaction_id, reviewer_decision, notes) # Update user risk profile
if reviewer_decision == "CONFIRMED_FRAUD":
await update_user_risk(transaction.user_id, increase=True)
elif reviewer_decision == "FALSE_POSITIVE":
await update_user_risk(transaction.user_id, decrease=True)
# Fine-tune prompts based on patterns
await analyze_feedback_patterns()
Security Considerations
### Prompt Injection Prevention
def sanitize_for_prompt(text):
# Remove potential injection attempts
dangerous_patterns = [
"ignore previous instructions",
"system prompt",
"you are now",
]
for pattern in dangerous_patterns:
text = text.replace(pattern, "[FILTERED]")
return text# Always sanitize user-controlled data
context["merchant_name"] = sanitize_for_prompt(transaction.merchant_name)
### Rate Limiting
RATE_LIMITS = {
"per_user_per_minute": 10,
"per_merchant_per_minute": 100,
"global_per_second": 1000,
}async def check_rate_limits(transaction):
for limit_type, max_count in RATE_LIMITS.items():
current = await get_current_count(limit_type, transaction)
if current >= max_count:
raise RateLimitExceeded(limit_type)
Conclusion
LLM-powered fraud detection isn't about replacing rules or ML models—it's about adding a reasoning layer that catches what they miss. The key principles:
- Use LLMs surgically: Pre-filter to only analyze suspicious transactions
- Structured prompts: Guide the model's reasoning with clear frameworks
- Human-in-the-loop: Keep reviewers for high-stakes decisions
- Continuous learning: Use feedback to improve prompts and rules
At Commit Software, we've deployed this approach across multiple fintech clients. If you're dealing with fraud losses or high false positive rates, [contact us](/contact) to discuss how LLM-powered detection could help.