Overview

Stacks define your AI infrastructure configuration, including model selection, deployment framework, and hardware requirements.

Stack Components

Model Configuration

  • Provider: anthropic, openai, google, ollama
  • Model ID: Specific model (e.g., claude-3-5-sonnet-20241022)
  • Parameters: Temperature, max tokens, context length
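
Expressed in Python, a model block might look like the sketch below; the field names and values are illustrative assumptions, not the authoritative stack schema:

    # Hypothetical model configuration block; field names are
    # illustrative, not the authoritative schema.
    model_config = {
        "provider": "anthropic",
        "model_id": "claude-3-5-sonnet-20241022",
        "parameters": {
            "temperature": 0.7,       # sampling temperature
            "max_tokens": 4096,       # cap on tokens per response
            "context_length": 200000  # context window, in tokens
        },
    }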

Framework Configuration

  • Orchestration: kubernetes, docker_compose, modal, none
  • Observability: datadog, new_relic, prometheus, none
  • Tracing: Enable/disable distributed tracing
  • Metrics: Enable/disable metrics collection
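
Continuing the same illustrative sketch, a framework block pairs an orchestrator with an observability backend plus the two toggles:

    # Hypothetical framework configuration block.
    framework_config = {
        "orchestration": "kubernetes",  # or docker_compose, modal, none
        "observability": "prometheus",  # or datadog, new_relic, none
        "tracing": True,                # enable distributed tracing
        "metrics": True,                # enable metrics collection
    }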

Hardware Configuration

  • Cloud Provider: aws, gcp, azure, none
  • Instance Family: Compute instance family (e.g., g4dn, p3)
  • GPU Type: h100, a100, a10g, v100, none
  • Spot Instances: Use spot/preemptible instances
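
A matching hardware block, again with assumed field names (on AWS, the p3 instance family carries V100 GPUs, so the two fields below are chosen to agree):

    # Hypothetical hardware configuration block.
    hardware_config = {
        "cloud_provider": "aws",   # or gcp, azure, none
        "instance_family": "p3",   # AWS p3 instances use V100 GPUs
        "gpu_type": "v100",
        "spot_instances": True,    # use spot/preemptible capacity
    }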

Inference Configuration

  • Model Serving: vllm, tgi, ollama
  • Auto Scaling: Enable/disable auto-scaling
  • Batching: Enable/disable request batching
  • Quantization: int8, int4, none
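
An inference block rounds out the sketch; a complete stack definition would combine the four blocks above:

    # Hypothetical inference configuration block.
    inference_config = {
        "model_serving": "vllm",  # or tgi, ollama
        "auto_scaling": True,     # scale replicas with load
        "batching": True,         # batch concurrent requests
        "quantization": "int8",   # or int4, none
    }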

Default Stack

Set a stack as the workspace default:

POST /api/workspaces/{id}/stacks/{stack_id}/set-default
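
As a sketch of calling this endpoint from Python, assuming bearer-token authentication and a placeholder host (api.example.com), workspace ID, and stack ID:

    import requests

    # All values below are placeholders; substitute your own.
    WORKSPACE_ID = "ws_123"
    STACK_ID = "stack_456"

    response = requests.post(
        f"https://api.example.com/api/workspaces/{WORKSPACE_ID}"
        f"/stacks/{STACK_ID}/set-default",
        headers={"Authorization": "Bearer YOUR_API_TOKEN"},
        timeout=30,
    )
    response.raise_for_status()  # raise on 4xx/5xx responses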