Overview
Stacks
Stacks define your AI infrastructure configuration, including model selection, deployment framework, and hardware requirements.
Stack Components
Model Configuration
- Provider: `anthropic`, `openai`, `google`, `ollama`
- Model ID: Specific model (e.g., `claude-3-5-sonnet-20241022`)
- Parameters: Temperature, max tokens, context length
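As a sketch, a model configuration block might look like the fragment below. The exact field names (`provider`, `model_id`, etc.) are assumptions for illustration, not confirmed by this page:

```python
# Hypothetical model configuration fragment; field names are assumptions.
model_config = {
    "provider": "anthropic",                   # anthropic | openai | google | ollama
    "model_id": "claude-3-5-sonnet-20241022",  # specific model
    "temperature": 0.7,                        # sampling temperature
    "max_tokens": 4096,                        # maximum output tokens
    "context_length": 200000,                  # context window size
}
```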
Framework Configuration
- Orchestration: `kubernetes`, `docker_compose`, `modal`, `none`
- Observability: `datadog`, `new_relic`, `prometheus`, `none`
- Tracing: Enable/disable distributed tracing
- Metrics: Enable/disable metrics collection
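A corresponding framework block could be expressed as follows (again a sketch with assumed field names):

```python
# Hypothetical framework configuration fragment; field names are assumptions.
framework_config = {
    "orchestration": "kubernetes",  # kubernetes | docker_compose | modal | none
    "observability": "prometheus",  # datadog | new_relic | prometheus | none
    "tracing": True,                # distributed tracing on/off
    "metrics": True,                # metrics collection on/off
}
```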
Hardware Configuration
- Cloud Provider: `aws`, `gcp`, `azure`, `none`
- Instance Family: Compute family (e.g., `g4dn`, `p3`)
- GPU Type: `h100`, `a100`, `a10g`, `v100`, `none`
- Spot Instances: Use spot/preemptible instances
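For illustration, a hardware block following the same assumed shape:

```python
# Hypothetical hardware configuration fragment; field names are assumptions.
hardware_config = {
    "cloud_provider": "aws",    # aws | gcp | azure | none
    "instance_family": "g4dn",  # compute family, e.g. g4dn or p3
    "gpu_type": "a10g",         # h100 | a100 | a10g | v100 | none
    "spot_instances": True,     # use spot/preemptible capacity
}
```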
Inference Configuration
- Model Serving: `vllm`, `tgi`, `ollama`
- Auto Scaling: Enable/disable auto-scaling
- Batching: Enable/disable request batching
- Quantization: `int8`, `int4`, `none`
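And an inference block in the same assumed style:

```python
# Hypothetical inference configuration fragment; field names are assumptions.
inference_config = {
    "model_serving": "vllm",  # vllm | tgi | ollama
    "auto_scaling": True,     # scale replicas with load
    "batching": True,         # batch concurrent requests
    "quantization": "int8",   # int8 | int4 | none
}
```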
Default Stack
Set a stack as the workspace default:
```
POST /api/workspaces/{id}/stacks/{stack_id}/set-default
```
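A minimal sketch of calling this endpoint with Python's `requests`. The base URL and bearer-token auth scheme are assumptions (check your deployment's authentication docs); only the endpoint path comes from this page:

```python
import requests

BASE_URL = "https://api.example.com"  # placeholder; substitute your deployment's URL
API_TOKEN = "..."                     # placeholder credential

def set_default_stack(workspace_id: str, stack_id: str) -> None:
    """Mark the given stack as the workspace default."""
    resp = requests.post(
        f"{BASE_URL}/api/workspaces/{workspace_id}/stacks/{stack_id}/set-default",
        headers={"Authorization": f"Bearer {API_TOKEN}"},  # auth scheme is an assumption
        timeout=30,
    )
    resp.raise_for_status()

set_default_stack("ws_123", "stk_456")  # hypothetical IDs
```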