
Scenarios

Scenarios let you define specific workload requirements, enabling Lattice to recommend optimal infrastructure configurations. Instead of generic advice, you get tailored recommendations based on your actual constraints.

A Scenario captures everything about your workload:

  • Workload Type — Chat, RAG, Agentic, Code, Embedding, Fine-tuning
  • Traffic Profile — Expected request volume and patterns
  • SLO Requirements — Latency, throughput, and availability targets
  • Budget Constraints — Monthly limits and per-request costs
  • Compliance Needs — Regions, certifications, vendor preferences
To create a Scenario:

  1. Navigate to Scenarios in your workspace
  2. Click + New Scenario
  3. Fill in the configuration form
  4. Save to activate the scenario
| Type | Description | Typical Use Case |
|------|-------------|------------------|
| `chat` | Interactive conversations | Customer support, assistants |
| `rag` | Retrieval-augmented generation | Knowledge bases, Q&A |
| `agentic` | Multi-step autonomous tasks | Workflows, automation |
| `code` | Code generation and analysis | IDE integrations, reviews |
| `embedding` | Vector embedding generation | Search, similarity |
| `fine-tuning` | Model customization | Domain adaptation |
| Profile | Description | Requests/sec |
|---------|-------------|--------------|
| `low_volume` | Internal tools, prototypes | < 10 |
| `medium_volume` | Production applications | 10–100 |
| `high_volume` | Scale deployments | 100–1000 |
| `burst` | Variable with spikes | Peaks 10x baseline |
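As a rough sketch, the tiers above could be mapped from observed traffic like this (the thresholds follow the table; treating a peak of 10x or more over baseline as `burst` is an assumption for illustration, not Lattice's actual classification logic):

```python
from typing import Optional


def classify_traffic_profile(avg_rps: float, peak_rps: Optional[float] = None) -> str:
    """Map an observed request rate to one of the profile tiers above.

    Thresholds mirror the table; the 10x-peak rule for 'burst' is an
    illustrative assumption.
    """
    if peak_rps is not None and avg_rps > 0 and peak_rps >= 10 * avg_rps:
        return "burst"
    if avg_rps < 10:
        return "low_volume"
    if avg_rps <= 100:
        return "medium_volume"
    return "high_volume"


print(classify_traffic_profile(5))        # low_volume
print(classify_traffic_profile(50))       # medium_volume
print(classify_traffic_profile(40, 800))  # burst
```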
```yaml
slo_requirements:
  p50_latency_ms: 200      # Median response time
  p95_latency_ms: 500      # 95th percentile
  p99_latency_ms: 1000     # 99th percentile (tail latency)
  throughput_rps: 1000     # Requests per second
  availability: 99.9       # Uptime percentage

budget:
  monthly_limit_usd: 5000            # Hard cap
  cost_per_1k_requests_usd: 0.10     # Per-request target

compliance:
  regions:
    - us-east-1
    - us-west-2
    - eu-west-1
  certifications:
    - SOC2
    - HIPAA
    - GDPR
  vendor_lock_in_tolerance: low      # none | low | medium | high
```
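To make the SLO semantics concrete, here is a minimal sketch of checking measured metrics against these targets (field names follow the YAML above; the check itself is illustrative, not Lattice's implementation):

```python
def meets_slo(measured: dict, slo: dict) -> bool:
    """Return True if measured metrics satisfy every SLO target.

    Keys mirror the YAML fields above: latencies must come in at or
    under target; throughput and availability at or over target.
    """
    for key, target in slo.items():
        value = measured.get(key)
        if value is None:
            return False  # no measurement -> cannot claim compliance
        if key.endswith("_latency_ms"):
            if value > target:
                return False
        else:  # throughput_rps, availability: higher is better
            if value < target:
                return False
    return True


slo = {"p95_latency_ms": 500, "throughput_rps": 1000, "availability": 99.9}
measured = {"p95_latency_ms": 430, "throughput_rps": 1200, "availability": 99.95}
print(meets_slo(measured, slo))  # True
```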

Set an active scenario to inform AI responses:

With my "High-Volume Chat" scenario active:
"What model should I use for the best latency/cost tradeoff?"

The AI takes your SLOs, budget, and compliance requirements into account when making recommendations.

Scenarios drive stack recommendations:

Generate a stack configuration for my "Enterprise RAG" scenario
that prioritizes accuracy over speed.

Compare scenarios to understand tradeoffs:

How would costs change if I relaxed my P95 latency from
500ms to 1000ms in my current scenario?
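The back-of-envelope math behind questions like this can be sketched directly (a hypothetical helper for illustration; real spend depends on model pricing and actual traffic, not the SLO ceiling):

```python
def monthly_cost_usd(avg_rps: float, cost_per_1k_requests_usd: float,
                     days: int = 30) -> float:
    """Estimate monthly spend from average request rate and per-1k cost."""
    requests = avg_rps * 86_400 * days  # 86,400 seconds per day
    return requests / 1_000 * cost_per_1k_requests_usd


# At an average of 10 req/s and $0.10 per 1k requests:
print(monthly_cost_usd(10, 0.10))  # 2592.0 -> under a $5000 monthly cap
```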
Example: a latency-sensitive chat workload:

```yaml
name: High-Volume Chat
workload_type: chat
traffic_profile: high_volume
slo_requirements:
  p50_latency_ms: 200
  p95_latency_ms: 500
  throughput_rps: 1000
  availability: 99.9
budget:
  monthly_limit_usd: 5000
  cost_per_1k_requests_usd: 0.10
compliance:
  regions: [us-east-1, us-west-2]
  certifications: [SOC2]
  vendor_lock_in_tolerance: medium
```
Example: a compliance-heavy RAG workload that trades latency for accuracy:

```yaml
name: Enterprise RAG
workload_type: rag
traffic_profile: medium_volume
slo_requirements:
  p50_latency_ms: 1000
  p95_latency_ms: 3000
  throughput_rps: 100
  availability: 99.95
budget:
  monthly_limit_usd: 15000
  cost_per_1k_requests_usd: 0.50
compliance:
  regions: [us-east-1, eu-west-1]
  certifications: [SOC2, HIPAA, GDPR]
  vendor_lock_in_tolerance: low
```
List the scenarios in a workspace:

```
GET /api/workspaces/{workspace_id}/scenarios
```
Create a scenario:

```
POST /api/workspaces/{workspace_id}/scenarios
Content-Type: application/json

{
  "name": "High-Volume Chat",
  "workload_type": "chat",
  "traffic_profile": "high_volume",
  "slo_requirements": {
    "p50_latency_ms": 200,
    "p95_latency_ms": 500,
    "throughput_rps": 1000,
    "availability": 99.9
  },
  "budget": {
    "monthly_limit_usd": 5000,
    "cost_per_1k_requests_usd": 0.10
  },
  "compliance": {
    "regions": ["us-east-1", "us-west-2"],
    "certifications": ["SOC2"]
  }
}
```
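A client-side sketch of the create call, using only the standard library (the endpoint path comes from the docs above; the base URL, token placeholder, and bearer-auth scheme are assumptions):

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder; use your deployment's URL
API_TOKEN = "YOUR_API_TOKEN"          # placeholder credential


def create_scenario_request(workspace_id: str, scenario: dict) -> urllib.request.Request:
    """Build the POST request for the create-scenario endpoint above."""
    url = f"{BASE_URL}/api/workspaces/{workspace_id}/scenarios"
    return urllib.request.Request(
        url,
        data=json.dumps(scenario).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",  # auth scheme is an assumption
        },
        method="POST",
    )


req = create_scenario_request(
    "ws_123", {"name": "High-Volume Chat", "workload_type": "chat"}
)
print(req.get_method())  # POST
print(req.full_url)      # https://api.example.com/api/workspaces/ws_123/scenarios
```

Send it with `urllib.request.urlopen(req)` (or any HTTP client) once the base URL and credentials point at a real deployment.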
Retrieve, update, or delete a scenario:

```
GET    /api/workspaces/{workspace_id}/scenarios/{scenario_id}
PATCH  /api/workspaces/{workspace_id}/scenarios/{scenario_id}
DELETE /api/workspaces/{workspace_id}/scenarios/{scenario_id}
```
Extract a scenario from a natural-language description:

```
POST /api/workspaces/{workspace_id}/scenarios/extract
Content-Type: application/json

{
  "description": "High-volume chat app with 500ms P95 latency..."
}
```
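The extract endpoint can be driven the same way (a sketch under the same placeholder-URL assumption; the response shape is not documented above, so none is assumed here):

```python
import json
import urllib.request


def extract_scenario_request(workspace_id: str, description: str) -> urllib.request.Request:
    """Build the POST request for the extract endpoint above."""
    url = (
        f"https://api.example.com"  # placeholder base URL
        f"/api/workspaces/{workspace_id}/scenarios/extract"
    )
    return urllib.request.Request(
        url,
        data=json.dumps({"description": description}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = extract_scenario_request("ws_123", "High-volume chat app with 500ms P95 latency")
print(req.get_method())  # POST
```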