Overview
Stacks
Stacks define your AI infrastructure configuration, including model selection, deployment framework, and hardware requirements.
Stack Components
Model Configuration
- Provider: `anthropic`, `openai`, `google`, `ollama`
- Model ID: Specific model (e.g., `claude-3-5-sonnet-20241022`)
- Parameters: Temperature, max tokens, context length
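As a sketch, a model configuration block might look like the fragment below. The exact field names (`provider`, `model_id`, etc.) are assumptions for illustration, not confirmed by this page:

```python
# Hypothetical model configuration fragment; field names are assumptions.
model_config = {
    "provider": "anthropic",                   # anthropic | openai | google | ollama
    "model_id": "claude-3-5-sonnet-20241022",  # specific model
    "temperature": 0.7,                        # sampling temperature
    "max_tokens": 4096,                        # maximum output tokens
    "context_length": 200000,                  # context window size
}
```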
Framework Configuration
- Orchestration: `kubernetes`, `docker_compose`, `modal`, `none`
- Observability: `datadog`, `new_relic`, `prometheus`, `none`
- Tracing: Enable/disable distributed tracing
- Metrics: Enable/disable metrics collection
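A corresponding framework block could be expressed as follows (again a sketch with assumed field names):

```python
# Hypothetical framework configuration fragment; field names are assumptions.
framework_config = {
    "orchestration": "kubernetes",  # kubernetes | docker_compose | modal | none
    "observability": "prometheus",  # datadog | new_relic | prometheus | none
    "tracing": True,                # distributed tracing on/off
    "metrics": True,                # metrics collection on/off
}
```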
Hardware Configuration
- Cloud Provider: `aws`, `gcp`, `azure`, `none`
- Instance Family: Compute family (e.g., `g4dn`, `p3`)
- GPU Type: `h100`, `a100`, `a10g`, `v100`, `none`
- Spot Instances: Use spot/preemptible instances
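For illustration, a hardware block following the same assumed shape:

```python
# Hypothetical hardware configuration fragment; field names are assumptions.
hardware_config = {
    "cloud_provider": "aws",    # aws | gcp | azure | none
    "instance_family": "g4dn",  # compute family, e.g. g4dn or p3
    "gpu_type": "a10g",         # h100 | a100 | a10g | v100 | none
    "spot_instances": True,     # use spot/preemptible capacity
}
```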
Inference Configuration
- Model Serving: `vllm`, `tgi`, `ollama`
- Auto Scaling: Enable/disable auto-scaling
- Batching: Enable/disable request batching
- Quantization: `int8`, `int4`, `none`
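And an inference block in the same assumed style:

```python
# Hypothetical inference configuration fragment; field names are assumptions.
inference_config = {
    "model_serving": "vllm",  # vllm | tgi | ollama
    "auto_scaling": True,     # scale replicas with load
    "batching": True,         # batch concurrent requests
    "quantization": "int8",   # int8 | int4 | none
}
```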
Default Stack
Set a stack as the workspace default:
```
POST /api/workspaces/{id}/stacks/{stack_id}/set-default
```
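A minimal sketch of calling this endpoint with Python's `requests`. The base URL and bearer-token auth scheme are assumptions (check your deployment's authentication docs); only the endpoint path comes from this page:

```python
import requests

BASE_URL = "https://api.example.com"  # placeholder; substitute your deployment's URL
API_TOKEN = "..."                     # placeholder credential

def set_default_stack(workspace_id: str, stack_id: str) -> None:
    """Mark the given stack as the workspace default."""
    resp = requests.post(
        f"{BASE_URL}/api/workspaces/{workspace_id}/stacks/{stack_id}/set-default",
        headers={"Authorization": f"Bearer {API_TOKEN}"},  # auth scheme is an assumption
        timeout=30,
    )
    resp.raise_for_status()

set_default_stack("ws_123", "stk_456")  # hypothetical IDs
```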