Stacks are complete infrastructure configurations that Lattice recommends based on your scenario requirements. A stack specifies the model, framework, and hardware choices needed for deployment.
A stack answers the question: “Given my requirements, what should I actually deploy?”
Model Config
Provider, model ID, temperature, max tokens, and inference settings
Framework Config
Orchestration (LangGraph, LangChain), observability, logging, tracing
Hardware Config
Cloud provider, region, GPU type, instance family, scaling settings
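Together, these three sections make up a single stack definition. The sketch below shows how they fit together; the name and field values are illustrative, drawn from the examples later on this page:

```yaml
name: Example Stack                    # illustrative name
model:
  provider: anthropic
  model_id: claude-sonnet-4-20250514
  temperature: 0.7
  max_tokens: 4096
framework:
  orchestration: langgraph
  observability: langsmith
hardware:
  cloud_provider: aws
  region: us-east-1
  auto_scaling: true
```

Each section is broken down in detail below.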
Model configuration:

```yaml
model:
  provider: anthropic                  # anthropic | openai | google | ollama
  model_id: claude-sonnet-4-20250514
  temperature: 0.7
  max_tokens: 4096
  top_p: 0.9
```

Supported providers:

| Provider | Models | Best For |
|---|---|---|
| Anthropic | Claude Opus, Sonnet, Haiku | Quality, safety, long context |
| OpenAI | GPT-4, GPT-4 Turbo, GPT-4o | General purpose, function calling |
| Google | Gemini Pro, Gemini Flash | Multimodal, cost efficiency |
| Ollama | Llama, Mistral, etc. | Local deployment, privacy |
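The same fields cover local deployment. A minimal sketch for an Ollama-served model, assuming a locally pulled model tag (the model_id shown is illustrative, not an officially documented value):

```yaml
model:
  provider: ollama          # local deployment, privacy (see table above)
  model_id: llama3          # illustrative: any model tag pulled into Ollama
  temperature: 0.7
  max_tokens: 4096
```

Note that for self-hosted models the hardware section's gpu_type (shown below) is required.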
Framework configuration:

```yaml
framework:
  orchestration: langgraph             # langgraph | langchain | custom
  observability: langsmith             # langsmith | phoenix | custom
  logging: structured                  # structured | json | plaintext
  tracing: enabled                     # enabled | disabled
```

Hardware configuration:

```yaml
hardware:
  cloud_provider: aws                  # aws | gcp | azure
  region: us-east-1
  gpu_type: null                       # Required for self-hosted models
  instance_family: general             # general | compute | memory
  spot_instances: false                # Cost savings vs. reliability
  auto_scaling: true
```

Link a stack to a scenario for targeted recommendations:
```
Generate a stack for my "High-Volume Chat" scenario that optimizes for cost while meeting SLOs.
```

Lattice analyzes your scenario constraints and recommends a model, framework, and hardware configuration that satisfies them (for a cost-sensitive, high-volume chat scenario, that might be something like the Claude Haiku Speed Stack below).
You can also create stacks manually in the UI or via the API (see the API reference at the end of this page). The example stacks below illustrate common configurations.
Optimized for high-volume, low-latency applications:
```yaml
name: Claude Haiku Speed Stack
description: Fastest option for high-volume chat
model:
  provider: anthropic
  model_id: claude-3-5-haiku-20241022
  temperature: 0.3
  max_tokens: 1024
framework:
  orchestration: custom
  observability: langsmith
  logging: structured
  tracing: enabled
hardware:
  cloud_provider: aws
  region: us-east-1
  instance_family: compute
  auto_scaling: true
```

Use case: Customer support chatbots, real-time assistants
Balanced quality and performance:
```yaml
name: Claude Sonnet Quality Stack
description: Best quality for complex reasoning
model:
  provider: anthropic
  model_id: claude-sonnet-4-20250514
  temperature: 0.7
  max_tokens: 4096
framework:
  orchestration: langgraph
  observability: langsmith
  logging: structured
  tracing: enabled
hardware:
  cloud_provider: aws
  region: us-east-1
  instance_family: general
  auto_scaling: true
```

Use case: RAG applications, content generation, analysis
With automatic fallback:
```yaml
name: Multi-Provider Resilient Stack
description: High availability with provider fallback
model:
  provider: anthropic
  model_id: claude-sonnet-4-20250514
  temperature: 0.7
  max_tokens: 4096
fallback:
  provider: openai
  model_id: gpt-4-turbo
  auto_retry: true
framework:
  orchestration: langgraph
  observability: langsmith
  logging: structured
  tracing: enabled
hardware:
  cloud_provider: aws
  region: us-east-1
  instance_family: general
  auto_scaling: true
```

Use case: Mission-critical applications requiring 99.99% uptime
Ask Lattice to compare stack options:
```
Compare the Claude Haiku Speed Stack vs Claude Sonnet Quality Stack for my enterprise RAG scenario. Show the cost and latency tradeoffs.
```

Stacks can also be managed programmatically.

List stacks:

```
GET /api/workspaces/{workspace_id}/stacks
```

Create a stack:

```
POST /api/workspaces/{workspace_id}/stacks
Content-Type: application/json

{
  "name": "Claude Haiku Speed Stack",
  "model": {
    "provider": "anthropic",
    "model_id": "claude-3-5-haiku-20241022",
    "temperature": 0.3,
    "max_tokens": 1024
  },
  "framework": {
    "orchestration": "custom",
    "observability": "langsmith"
  },
  "hardware": {
    "cloud_provider": "aws",
    "region": "us-east-1"
  }
}
```

Get, update, or delete a stack:

```
GET /api/workspaces/{workspace_id}/stacks/{stack_id}
PATCH /api/workspaces/{workspace_id}/stacks/{stack_id}
DELETE /api/workspaces/{workspace_id}/stacks/{stack_id}
```