Build Stacks
Stacks translate your scenario requirements into concrete infrastructure configurations ready for deployment.
From Requirements to Deployment
Scenario → Stack → Deployment
(What you need) → (What to build) → (How to run it)

A stack answers: “Given my requirements, exactly what should I deploy?”
Stack Generation Process
1. Start with a Scenario

   Ensure you have a well-defined scenario:

   name: Production Chat
   workload_type: chat
   traffic_profile: high_volume
   slo_requirements:
     p95_latency_ms: 500
     throughput_rps: 200
   budget:
     monthly_limit_usd: 5000
2. Request Stack Generation

   Ask Lattice to recommend a stack:

   Generate an optimized stack for my Production Chat scenario
   that prioritizes latency while staying within budget.
3. Review Recommendations

   Lattice will explain its choices:

   Recommended: Claude Haiku Speed Stack
   Model: Claude 3.5 Haiku
   - P95 latency: ~300ms (meets 500ms SLO)
   - Cost: ~$2,100/month at 200 RPS (within $5K budget)

   Alternative: Claude Sonnet Quality Stack
   - P95 latency: ~600ms (exceeds SLO)
   - Cost: ~$8,400/month (exceeds budget)
4. Refine if Needed

   Adjust based on priorities:

   I'm willing to go to $6000/month if it improves quality.
   What stack would you recommend then?
5. Save the Stack

   Save your chosen configuration for deployment reference.
Stack Components
Model Configuration
Choose your inference settings:
model:
  provider: anthropic
  model_id: claude-3-5-haiku-20241022

  # Generation parameters
  temperature: 0.3      # Lower for consistency
  max_tokens: 1024      # Limit output length
  top_p: 0.9            # Nucleus sampling

  # Advanced options
  stream: true          # Enable streaming
  stop_sequences: []    # Custom stop tokens
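At request time, these settings map directly onto the provider SDK call. A minimal sketch, assuming the official anthropic Python package; the prompt text is illustrative and the parameter values are copied from the configuration above:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=1024,
    temperature=0.3,
    top_p=0.9,
    stop_sequences=[],
    # stream=True would return an event stream instead of a single response
    messages=[{"role": "user", "content": "Summarize our refund policy."}],
)
print(response.content[0].text)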
Framework Configuration
Select your orchestration layer:
framework:
  # Orchestration choice
  orchestration: langgraph    # langgraph | langchain | custom

  # Observability
  observability: langsmith    # langsmith | phoenix | custom

  # Logging configuration
  logging: structured         # structured | json | plaintext
  log_level: info             # debug | info | warn | error

  # Tracing
  tracing: enabled
  trace_sampling_rate: 0.1    # Sample 10% of requests
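With trace_sampling_rate: 0.1, only about one request in ten carries a full trace. A minimal sketch of that per-request decision, independent of which observability backend you choose:

import random

TRACE_SAMPLING_RATE = 0.1  # mirrors framework.trace_sampling_rate above

def should_trace() -> bool:
    """Decide, per request, whether to record a full trace."""
    return random.random() < TRACE_SAMPLING_RATE

# Roughly 10% of simulated requests get traced; the rest only emit logs.
traced = sum(should_trace() for _ in range(1000))
print(f"traced {traced} of 1000 requests")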
Framework Options

| Framework | Best For | Complexity |
|---|---|---|
| LangGraph | Agentic workflows, complex state | Higher |
| LangChain | Standard chains, quick prototypes | Medium |
| Custom | Maximum control, minimal overhead | Variable |
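As a rough gauge of the complexity column: a langgraph orchestration is an explicit state machine, while a custom orchestration can be as small as a direct provider SDK call. A minimal langgraph sketch, assuming the langgraph Python package; the state fields and node name are illustrative, not part of Lattice:

from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class ChatState(TypedDict):
    question: str
    answer: str

def respond(state: ChatState) -> dict:
    # A real node would call the configured model here.
    return {"answer": f"(model reply to: {state['question']})"}

builder = StateGraph(ChatState)
builder.add_node("respond", respond)
builder.add_edge(START, "respond")
builder.add_edge("respond", END)
graph = builder.compile()

print(graph.invoke({"question": "What is a stack?"}))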
Hardware Configuration
Define your infrastructure:
hardware:
  # Cloud provider
  cloud_provider: aws       # aws | gcp | azure
  region: us-east-1

  # Instance configuration
  instance_family: compute  # general | compute | memory

  # Scaling
  auto_scaling: true
  min_instances: 2
  max_instances: 10

  # Cost optimization
  spot_instances: false     # Risky for production
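The min_instances and max_instances values bound whatever replica count the autoscaler derives from load. A rough sketch of that clamping; the 25 requests-per-instance figure is an assumed capacity for illustration, not a Lattice default:

import math

def desired_instances(current_rps: float, rps_per_instance: float = 25.0,
                      min_instances: int = 2, max_instances: int = 10) -> int:
    """Clamp the load-derived instance count to the configured bounds."""
    needed = math.ceil(current_rps / rps_per_instance)
    return max(min_instances, min(needed, max_instances))

print(desired_instances(40))    # light load: stays at the minimum of 2
print(desired_instances(200))   # 200 RPS: 8 instances
print(desired_instances(600))   # traffic spike: capped at the maximum of 10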
Fallback Configuration
Add resilience with provider fallback:
fallback:
  enabled: true
  provider: openai
  model_id: gpt-4-turbo

  # Trigger conditions
  triggers:
    - error_rate > 5%
    - latency_p99 > 2000ms
    - provider_unavailable

  # Automatic retry
  auto_retry: true
  max_retries: 3
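At runtime this amounts to retry-then-failover. A minimal sketch of the pattern; call_primary and call_fallback are placeholder functions standing in for the primary and fallback provider clients, not Lattice APIs:

def call_primary(prompt: str) -> str:
    raise TimeoutError("primary provider unavailable")  # simulate an outage

def call_fallback(prompt: str) -> str:
    return "(fallback provider response)"

def complete(prompt: str, max_retries: int = 3) -> str:
    """Retry the primary provider, then fail over to the fallback."""
    for _ in range(max_retries):
        try:
            return call_primary(prompt)
        except Exception:
            continue  # a production version would back off and log here
    return call_fallback(prompt)

print(complete("Hello"))  # falls back after 3 failed attempts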
Example Stacks
Speed-Optimized Stack
For latency-critical applications:
name: Speed Stack
description: Optimized for minimum latency

model:
  provider: anthropic
  model_id: claude-3-5-haiku-20241022
  temperature: 0.3
  max_tokens: 512             # Shorter outputs
  stream: true

framework:
  orchestration: custom       # Minimal overhead
  observability: langsmith
  logging: structured

hardware:
  cloud_provider: aws
  region: us-east-1           # Closest to users
  instance_family: compute
  auto_scaling: true

# Expected metrics
estimated_latency_p95: 250ms
estimated_monthly_cost: $2500
Quality-Optimized Stack
For accuracy-critical applications:
name: Quality Stack
description: Optimized for response quality

model:
  provider: anthropic
  model_id: claude-sonnet-4-20250514
  temperature: 0.7
  max_tokens: 4096
  stream: true

framework:
  orchestration: langgraph
  observability: langsmith
  logging: structured
  tracing: enabled

hardware:
  cloud_provider: aws
  region: us-east-1
  instance_family: general
  auto_scaling: true

# Expected metrics
estimated_latency_p95: 800ms
estimated_monthly_cost: $8000
Cost-Optimized Stack
For budget-constrained applications:
name: Cost Stack
description: Optimized for minimal spend

model:
  provider: anthropic
  model_id: claude-3-5-haiku-20241022
  temperature: 0.5
  max_tokens: 1024
  stream: true

framework:
  orchestration: custom
  observability: langsmith
  logging: json
  trace_sampling_rate: 0.01   # Minimal tracing

hardware:
  cloud_provider: aws
  region: us-west-2           # Sometimes cheaper
  instance_family: general
  spot_instances: true        # 60-90% savings
  auto_scaling: true

# Cost optimizations
prompt_caching: enabled       # Up to 90% input savings
batch_processing: enabled     # For non-urgent requests

# Expected metrics
estimated_latency_p95: 400ms
estimated_monthly_cost: $1200
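The prompt_caching optimization typically relies on provider-side caching of a large, repeated prompt prefix. A sketch of what that looks like with the anthropic Python package; the system prompt is a placeholder, and the actual savings depend on how often the prefix is reused:

import anthropic

client = anthropic.Anthropic()

LONG_SYSTEM_PROMPT = "You are the support assistant for ..."  # large, reused prefix

response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Marks the block as cacheable so later requests reuse the prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Where is my order?"}],
)
print(response.content[0].text)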
High-Availability Stack
For mission-critical applications:
name: HA Stack
description: 99.99% availability target

model:
  provider: anthropic
  model_id: claude-sonnet-4-20250514
  temperature: 0.5
  max_tokens: 2048
  stream: true

fallback:
  enabled: true
  provider: openai
  model_id: gpt-4-turbo
  auto_retry: true

framework:
  orchestration: langgraph
  observability: langsmith
  logging: structured
  tracing: enabled

hardware:
  cloud_provider: aws
  region: us-east-1
  multi_az: true              # Cross-AZ redundancy
  instance_family: general
  auto_scaling: true
  min_instances: 3            # Always available

# Expected metrics
estimated_availability: 99.99%
estimated_latency_p95: 700ms
estimated_monthly_cost: $12000
Stack Comparison
Ask Lattice to compare your options:
Compare Speed Stack vs Quality Stack for my Production Chat
scenario. Show:
- Latency tradeoffs
- Cost implications
- Quality differences
- Risk assessment
Deploying Your Stack
Once you have a stack configuration:
- Export the configuration as YAML or JSON
- Use it to configure your deployment (Docker, Kubernetes, etc.), as sketched below
- Monitor against SLOs using your observability stack
- Iterate based on production data
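A minimal sketch of the first two steps, assuming the stack was exported to a file named stack.yaml; the filename and the PyYAML dependency are assumptions for illustration, not Lattice requirements:

import yaml  # PyYAML

with open("stack.yaml") as f:
    stack = yaml.safe_load(f)

model = stack["model"]

# Hand the values to your deployment tooling, e.g. as container env vars.
env = {
    "MODEL_ID": model["model_id"],
    "TEMPERATURE": str(model["temperature"]),
    "MAX_TOKENS": str(model["max_tokens"]),
}
print(env)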
Next Steps
- API Reference — Implement stack configurations programmatically
- GitHub Access — Follow stack-related feature development