Run evaluation
POST /workspaces/{workspace_id}/evaluations/{evaluation_id}/run
Start a run of the specified evaluation.
Authorizations
Parameters
Path Parameters
workspace_id (string, format: uuid, required): Workspace ID
evaluation_id (string, format: uuid, required): Evaluation ID
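For illustration, a minimal sketch of starting a run with Python's requests library. The base URL, the bearer-token Authorization header, the environment variable name, and the example UUIDs are assumptions for the sketch, not part of this reference.

    import os
    import requests

    BASE_URL = "https://api.example.com"  # assumed base URL
    TOKEN = os.environ["API_TOKEN"]       # assumed bearer-token auth scheme

    # Example path parameters (both are UUIDs)
    workspace_id = "3f1c2d44-9b1e-4c7a-8f2a-1d2e3f4a5b6c"
    evaluation_id = "7a8b9c0d-1e2f-4a3b-8c4d-5e6f7a8b9c0d"

    # POST /workspaces/{workspace_id}/evaluations/{evaluation_id}/run
    resp = requests.post(
        f"{BASE_URL}/workspaces/{workspace_id}/evaluations/{evaluation_id}/run",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    resp.raise_for_status()

    # The "Evaluation started" body described under Responses
    evaluation = resp.json()
    print(evaluation["status"], evaluation.get("progress_percent"))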
Responses
Evaluation started
object
  id: string (uuid)
  workspace_id: string (uuid)
  name: string
  description: string
  evaluation_type: string
  targets: Array<object>
    target_type: string
    target_id: string (uuid) - ID of stack or scenario (if applicable)
    model_provider: string - Model provider for direct model evaluation
    model_id: string - Model ID for direct model evaluation
    label: string - Display label for this target
  benchmarks: Array<object>
    benchmark_type: string
    subset: string - Specific benchmark subset
    num_samples: integer - Number of samples to evaluate
  custom_eval: object
    prompt_template: string - Prompt template for evaluation
    criteria: Array<object>
      name: string
      weight: number
      description: string
  judge_config: object
    provider: string
    model: string
  methodology: object
    sample_size: integer
    confidence_level: number
    random_seed: integer
  status: string
  progress_percent: number
  started_at: string (date-time)
  completed_at: string (date-time)
  error_message: string
  results: object
    target_results: Array<object>
      target_label: string
      metrics: object (additional properties: number)
      benchmark_scores: object (additional properties: number)
    comparisons: Array<object>
      target_a: string
      target_b: string
      metric: string
      delta: number
      winner: string
    overall_summary: string
  created_at: string (date-time)
  updated_at: string (date-time)
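As a sketch of consuming a payload of this shape, the helper below prints per-target metrics, benchmark scores, and pairwise comparisons from the results object. It assumes results and its sub-fields are only populated once status reports completion; that lifecycle detail is an assumption, not something this schema guarantees.

    def summarize(evaluation: dict) -> None:
        """Print per-target metrics and pairwise comparisons from an evaluation object."""
        results = evaluation.get("results") or {}

        for target in results.get("target_results", []):
            label = target.get("target_label", "<unnamed target>")
            # metrics and benchmark_scores are maps of string keys to numbers
            for name, value in (target.get("metrics") or {}).items():
                print(f"{label} metric {name}: {value}")
            for name, score in (target.get("benchmark_scores") or {}).items():
                print(f"{label} benchmark {name}: {score}")

        for cmp in results.get("comparisons", []):
            print(
                f"{cmp['target_a']} vs {cmp['target_b']} on {cmp['metric']}: "
                f"delta={cmp['delta']}, winner={cmp['winner']}"
            )

        if results.get("overall_summary"):
            print(results["overall_summary"])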