Common patterns for managing and running evaluations.
For complete command syntax, see Commands reference.

Show models

Shows information about the configured models.
# List all models
flintai eval models list

# List models with a specific tag
flintai eval models list --tag tier=Fast

# Show details for a model (full ID or unique prefix)
flintai eval models show my-chatbot
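The following POSIX shell sketch illustrates what "full ID or unique prefix" means. The function name `resolve_id` is hypothetical and this is not Flint AI's implementation. It assumes that an exact full-ID match wins even when that ID is also a prefix of another ID (for example `my-chatbot` versus `my-chatbot-v2`).

```shell
#!/bin/sh
# Illustration only (hypothetical helper, not part of flintai):
# resolve_id PREFIX ID... prints the single ID that equals PREFIX or starts
# with it, and fails if no ID or more than one ID matches.
resolve_id() {
    prefix=$1
    shift
    match=""
    count=0
    for id in "$@"; do
        if [ "$id" = "$prefix" ]; then
            # Assumption: an exact full-ID match always wins.
            printf '%s\n' "$id"
            return 0
        fi
        # Quoting "$prefix" makes any glob characters in it literal.
        case $id in
            "$prefix"*)
                match=$id
                count=$((count + 1))
                ;;
        esac
    done
    if [ "$count" -eq 1 ]; then
        printf '%s\n' "$match"
        return 0
    fi
    echo "error: '$prefix' matches $count IDs" >&2
    return 1
}
```

With the IDs `my-chatbot` and `my-agent`, the prefix `my-c` resolves to `my-chatbot`, while `my` is ambiguous and is rejected.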

Show evaluations

Shows information about the configured evaluations (built-in and custom).
# List all evaluations (built-in and custom)
flintai eval evaluations list

# Filter by tag
flintai eval evaluations list --tag owasp_code=LLM01

# Show evaluation details and connected models
flintai eval evaluations show eval-llm01-adversarial

Attach evaluations to models

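Creates model-evaluation assignments. Select models and evaluations by ID with --model and --eval (both repeatable), or by tag with --model-tag and --eval-tag. IDs and tags can be mixed. The command creates one assignment for every combination of matched model and matched evaluation (the cross-product).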
# Single model, single evaluation
flintai eval model-evaluations attach --model my-chatbot --eval eval-llm01-adversarial

# Single model, multiple evaluations
flintai eval model-evaluations attach \
    --model my-chatbot \
    --eval eval-llm01-adversarial \
    --eval eval-llm02-adversarial

# Multiple models by ID
flintai eval model-evaluations attach \
    --model my-chatbot --model my-agent \
    --eval eval-llm01-adversarial

# Select by tags (all models tagged tier=Fast, all evaluations tagged owasp_code=LLM01)
flintai eval model-evaluations attach \
    --model-tag tier=Fast \
    --eval-tag owasp_code=LLM01

# Mix IDs and tags
flintai eval model-evaluations attach \
    --model my-chatbot \
    --eval-tag source="Flint AI"
Duplicate assignments (same model + evaluation pair) are automatically skipped.
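The cross-product and duplicate-skipping behavior can be illustrated with a small POSIX shell sketch. This is not how Flint AI stores assignments. The helper `attach_pairs` and the `ASSIGNMENTS` file (one "model eval" pair per line) are hypothetical stand-ins used only to show the logic.

```shell
#!/bin/sh
# Illustration only: hypothetical storage, one "model eval" pair per line.
ASSIGNMENTS=${ASSIGNMENTS:-./assignments.txt}

# attach_pairs "MODEL..." "EVAL..." pairs every model with every evaluation
# (the cross-product) and skips pairs that already exist.
attach_pairs() {
    models=$1
    evals=$2
    touch "$ASSIGNMENTS"
    # Unquoted expansion splits the space-separated ID lists on purpose.
    for m in $models; do
        for e in $evals; do
            # -x: whole-line match, -F: fixed string, -q: no output.
            if grep -qxF "$m $e" "$ASSIGNMENTS"; then
                echo "skip   $m -> $e (already attached)"
            else
                printf '%s %s\n' "$m" "$e" >> "$ASSIGNMENTS"
                echo "attach $m -> $e"
            fi
        done
    done
}
```

Calling `attach_pairs "my-chatbot my-agent" "eval-llm01-adversarial eval-llm02-adversarial"` records four pairs (2 models x 2 evaluations). Running it a second time records nothing new, because every pair is skipped.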

Show model-evaluation assignments

Shows information about the assignments of evaluations to models.
# List all assignments
flintai eval model-evaluations list

# Filter by tag
flintai eval model-evaluations list --tag category=owasp

Run evaluations

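Runs the configured model-evaluation assignments. You can run a single assignment by its ID, or select what to run with --model and --eval-tag. --concurrency sets the concurrency level and --output sets the results file.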
# Run a single model-evaluation by ID
flintai eval run me-chatbot-llm01

# Run all evaluations for a model
flintai eval run --model my-chatbot

# Filter which evaluations to run using tags
flintai eval run --model my-chatbot --eval-tag owasp_code=LLM01

# Set concurrency and output file
flintai eval run --model my-chatbot \
    --concurrency 10 \
    --output results.json
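To run several models in one go, you can wrap the documented command in a loop. This sketch uses only the flags shown above. The function name `run_models`, the per-model file name `results-<model>.json` (chosen so runs do not overwrite each other), and the assumption that `flintai eval run` exits non-zero on failure are not from the Flint AI documentation.

```shell
#!/bin/sh
# Sketch (hypothetical helper): run all evaluations for each model in turn,
# writing one results file per model. Stops at the first failing run,
# assuming flintai reports failure through a non-zero exit status.
run_models() {
    for model in "$@"; do
        echo "Running evaluations for $model"
        flintai eval run --model "$model" \
            --concurrency 10 \
            --output "results-$model.json" || return 1
    done
}

# Example:
# run_models my-chatbot my-agent
```

Running the models one at a time keeps each invocation identical to the single-model example above and lets --concurrency apply within each run.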

Detach evaluations from models

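Removes model-evaluation assignments. Models and evaluations are selected the same way as for attach (by ID or by tag). At least one of --model/--model-tag or --eval/--eval-tag is required. If you omit one side, the command matches all models or all evaluations on that side.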
# Remove a specific assignment
flintai eval model-evaluations detach --model my-chatbot --eval eval-llm01-adversarial

# Remove all evaluations from a model
flintai eval model-evaluations detach --model my-chatbot

# Remove an evaluation from all models
flintai eval model-evaluations detach --eval eval-llm01-adversarial

# Remove by tag
flintai eval model-evaluations detach --model-tag tier=Fast --eval-tag method=Garak
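The selection rule for detach (an omitted side matches everything, and at least one side must be given) is illustrated in the POSIX shell sketch below. It models only ID selection, not tags. The helper `detach_pairs` and the `ASSIGNMENTS` file (one "model eval" pair per line) are hypothetical stand-ins, not Flint AI's implementation.

```shell
#!/bin/sh
# Illustration only: hypothetical storage, one "model eval" pair per line.
ASSIGNMENTS=${ASSIGNMENTS:-./assignments.txt}

# detach_pairs MODEL EVAL  (pass "" to leave that side unrestricted)
detach_pairs() {
    model=$1
    eval_id=$2
    # Mirrors the documented rule: at least one selector is required.
    if [ -z "$model" ] && [ -z "$eval_id" ]; then
        echo "error: specify a model, an evaluation, or both" >&2
        return 1
    fi
    kept=$(mktemp)
    while read -r m e; do
        # A pair is removed when it matches every selector that was given.
        if { [ -z "$model" ] || [ "$m" = "$model" ]; } &&
           { [ -z "$eval_id" ] || [ "$e" = "$eval_id" ]; }; then
            echo "detach $m -> $e"
        else
            printf '%s %s\n' "$m" "$e" >> "$kept"
        fi
    done < "$ASSIGNMENTS"
    mv "$kept" "$ASSIGNMENTS"
}
```

For example, `detach_pairs "" eval-llm01-adversarial` removes that evaluation from every model, and `detach_pairs my-chatbot ""` removes every evaluation from `my-chatbot`.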