Flint AI Eval tests agent behavior and reliability at runtime. Configuration lives in ~/.flintai/config.json and defines:
  • What to test - Your running agent’s HTTP endpoint
  • How to test it - Which evaluations to run
  • When to test - Model-evaluation assignments
Configuration is only needed for flintai eval commands. flintai scan uses environment variables instead.

Quick start

Create ~/.flintai/config.json with this minimal configuration:
{
  "models": [
    {
      "id": "my-agent",
      "type": "openai_compatible",
      "name": "My Agent",
      "model_name": "my-agent-v1",
      "host": "http://localhost:8000"
    }
  ],
  "model_evaluations": [
    {
      "id": "me-agent-prompt-injection",
      "model_id": "my-agent",
      "evaluation_id": "eval-llm01-adversarial",
      "name": "My Agent / Prompt injection"
    }
  ]
}
Then run:
flintai eval run --model my-agent
Your agent must be running and accessible at the host URL before testing.
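Before starting a run, it can help to confirm that something is actually answering at the host URL. The following is a minimal sketch, not part of Flint AI: `is_reachable` is a hypothetical helper, and it assumes that any HTTP response (even a 404) means the agent process is up.

```python
import urllib.error
import urllib.request


def is_reachable(host: str, timeout: float = 3.0) -> bool:
    """Return True if anything answers HTTP at `host`, whatever the status code."""
    try:
        with urllib.request.urlopen(host, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with a 4xx/5xx status, so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


if __name__ == "__main__":
    host = "http://localhost:8000"  # the "host" value from the quick-start config
    print(f"{host} reachable: {is_reachable(host)}")
```

This only checks that the base URL responds; it does not verify that the agent speaks the protocol expected for its configured type.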

Configuration file format

The config file is a JSON file with five optional top-level sections. Only include sections you need.
Most users only need to define models and attach built-in evaluations via CLI:
{
  "models": [
    {
      "id": "my-agent",
      "type": "openai_compatible",
      "name": "My Agent",
      "model_name": "my-agent-v1",
      "host": "http://localhost:8000"
    }
  ]
}
Then attach evaluations:
flintai eval model-evaluations attach \
  --model my-agent \
  --eval eval-llm01-adversarial

Using environment variables in config

Reference environment variables in config.json using ${VAR_NAME} syntax instead of hardcoding sensitive values:
{
  "models": [
    {
      "id": "my-chatbot",
      "type": "anthropic",
      "name": "Claude Haiku 4.5",
      "model_name": "claude-haiku-4-5",
      "key": "${ANTHROPIC_API_KEY}"
    }
  ]
}
Security: Never hardcode API keys in config files. Use ${...} references to keep credentials in environment variables or .env files instead.
See Environment variables for the complete list and additional examples.
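To make the ${VAR_NAME} substitution concrete, here is an illustrative sketch of how such references could be resolved after the JSON is parsed. It is not Flint AI's implementation: `expand_env` is a hypothetical helper, and raising an error for an unset variable is an assumption, since the behavior for missing variables is not described here.

```python
import json
import os
import re

# Matches ${VAR_NAME} where the name looks like a shell variable.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=os.environ):
    """Recursively replace ${VAR_NAME} in every string of a parsed JSON value."""
    if isinstance(value, str):
        def lookup(match):
            name = match.group(1)
            if name not in env:
                # Assumption: fail loudly rather than send an empty key.
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _VAR.sub(lookup, value)
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    return value  # numbers, booleans, and null pass through unchanged


if __name__ == "__main__":
    raw = json.loads('{"models": [{"id": "my-chatbot", "key": "${ANTHROPIC_API_KEY}"}]}')
    resolved = expand_env(raw, {"ANTHROPIC_API_KEY": "example-value"})
    print(resolved["models"][0]["key"])  # prints example-value
```

Resolving references on the parsed structure rather than on the raw file text means a variable value containing quotes or backslashes cannot break the JSON.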

Models section

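The models array defines the agents or LLMs you want to test. Every model requires id, type, name, and model_name; agent types that run behind an HTTP endpoint also require host (see Supported agent types). For example: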
{
  "id": "my-agent",
  "type": "openai_compatible",
  "name": "My Agent",
  "model_name": "my-agent-v1",
  "host": "http://localhost:8000"
}

Required fields

| Field | Description | Example |
| --- | --- | --- |
| id | Unique identifier for CLI commands | "my-agent" |
| type | Agent framework or API type | "openai_compatible" |
| name | Human-readable display name | "My Agent" |
| model_name | Agent or model name passed to API | "gpt-4", "my-agent-v1" |

Optional fields

| Field | Description | Example | Applies to |
| --- | --- | --- | --- |
| host | HTTP endpoint where agent runs | "http://localhost:8000" | Hosted agents |
| key | API key (or use environment variables) | "sk-..." | All types |
| endpoint | Custom API path | "/api/chat" | HTTP-based types |
| headers | Custom HTTP headers | {"X-Custom": "value"} | HTTP-based types |
| temperature | Model temperature (0.0-1.0) | 0.7 | All types |
| tags | Key-value pairs for filtering | {"env": "staging"} | All types |
| description | Human-readable description | "Production chatbot" | All types |
| input_path | JSONPath for input | "$.messages" | generic_http, openai_compatible |
| output_path | JSONPath for output | "$.response" | generic_http, openai_compatible |
| immediate_result | Return immediately vs streaming | true | adk |

Supported agent types

| Type | Use case | Required fields (beyond id/type/name/model_name) | Optional fields |
| --- | --- | --- | --- |
| openai_compatible | OpenAI-compatible APIs | host | endpoint, headers, input_path, output_path |
| generic_http | Generic HTTP APIs | host | endpoint, headers, input_path, output_path |
| langserve | LangServe endpoints | host | endpoint, headers |
| openai_agent | OpenAI Agents SDK | host | endpoint |
| anthropic_agent | Anthropic agents | host | endpoint |
| adk | Google ADK agents | host | endpoint, immediate_result |
| anthropic | Claude models (direct) | None | key |
| openai | OpenAI models (direct) | None | key |
| gemini | Google Gemini (direct) | None | key |
| litellm | LiteLLM proxy | None | key |
| huggingface | HuggingFace models | None | key |
| ollama | Ollama local models | host | endpoint |
All types support temperature, tags, and description as optional fields. The following example combines several optional fields in one model entry:
{
  "id": "production-agent",
  "type": "openai_compatible",
  "name": "Production Agent",
  "model_name": "my-agent-v2",
  "host": "https://api.example.com",
  "endpoint": "/v1/agents/chat",
  "headers": {
    "X-API-Version": "2024-01"
  },
  "temperature": 0.3,
  "tags": {
    "env": "production",
    "team": "platform"
  },
  "description": "Production chatbot serving customer support"
}
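The required-field rules from the Supported agent types table can be checked before a run. The sketch below is illustrative only and not part of Flint AI: `missing_fields` is a hypothetical helper, the type sets are copied from the table above, and entries with an unlisted type are checked against the four base fields only.

```python
# Fields every model entry needs, per the Required fields table.
BASE_REQUIRED = {"id", "type", "name", "model_name"}

# Types the table lists with "host" as an additional required field.
HOST_REQUIRED_TYPES = {
    "openai_compatible", "generic_http", "langserve", "openai_agent",
    "anthropic_agent", "adk", "ollama",
}


def missing_fields(model: dict) -> list[str]:
    """Return the sorted names of required fields absent from a model entry."""
    required = set(BASE_REQUIRED)
    if model.get("type") in HOST_REQUIRED_TYPES:
        required.add("host")
    return sorted(required - set(model))


if __name__ == "__main__":
    hosted = {"id": "my-agent", "type": "openai_compatible",
              "name": "My Agent", "model_name": "my-agent-v1"}
    direct = {"id": "my-chatbot", "type": "anthropic",
              "name": "Claude Haiku 4.5", "model_name": "claude-haiku-4-5"}
    print(missing_fields(hosted))  # prints ['host']
    print(missing_fields(direct))  # prints []
```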

Verify your models

# List all configured models
flintai eval models list

# Show details for a specific model
flintai eval models show my-agent

# Filter by tag
flintai eval models list --tag env=staging
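For intuition about what a key=value tag filter selects, here is a small sketch. It assumes an exact match on both key and value, which is a guess at the semantics rather than documented behavior; `parse_tag` and `filter_by_tag` are hypothetical helpers, not Flint AI functions.

```python
def parse_tag(expr: str) -> tuple[str, str]:
    """Split a 'key=value' expression at the first '='."""
    key, sep, value = expr.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {expr!r}")
    return key, value


def filter_by_tag(models: list[dict], expr: str) -> list[dict]:
    """Keep models whose tags contain exactly this key/value pair."""
    key, value = parse_tag(expr)
    return [m for m in models if m.get("tags", {}).get(key) == value]


if __name__ == "__main__":
    models = [
        {"id": "staging-agent", "tags": {"env": "staging"}},
        {"id": "production-agent", "tags": {"env": "production"}},
        {"id": "untagged-agent"},
    ]
    print([m["id"] for m in filter_by_tag(models, "env=staging")])  # prints ['staging-agent']
```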

Model evaluations section

The model_evaluations array assigns tests to models. Each assignment links one model to one evaluation.
{
  "id": "me-agent-prompt-injection",
  "model_id": "my-agent",
  "evaluation_id": "eval-llm01-adversarial",
  "name": "My Agent / Prompt injection"
}

Required fields

| Field | Description | Example |
| --- | --- | --- |
| id | Unique identifier for this assignment | "me-agent-llm01" |
| model_id | Model id from your models array | "my-agent" |
| evaluation_id | Evaluation ID (built-in or custom) | "eval-llm01-adversarial" |
| name | Human-readable name for this assignment | "My Agent / Prompt injection" |

Optional fields

| Field | Description | Example |
| --- | --- | --- |
| weight | Scoring weight (default: 0.5) | 0.75 |
| tags | Key-value pairs for filtering | {"priority": "high"} |
| description | Notes about this assignment | "Critical security test" |
A complete example with two models and three assignments:
{
  "models": [
    {
      "id": "staging-agent",
      "type": "openai_compatible",
      "name": "Staging Agent",
      "model_name": "agent-v1",
      "host": "http://localhost:8000",
      "tags": {"env": "staging"}
    },
    {
      "id": "production-agent",
      "type": "openai_compatible",
      "name": "Production Agent",
      "model_name": "agent-v2",
      "host": "https://api.example.com",
      "tags": {"env": "production"}
    }
  ],
  "model_evaluations": [
    {
      "id": "me-staging-llm01",
      "model_id": "staging-agent",
      "evaluation_id": "eval-llm01-adversarial",
      "name": "Staging / Prompt injection"
    },
    {
      "id": "me-staging-llm02",
      "model_id": "staging-agent",
      "evaluation_id": "eval-llm02-adversarial",
      "name": "Staging / Info disclosure"
    },
    {
      "id": "me-prod-llm01",
      "model_id": "production-agent",
      "evaluation_id": "eval-llm01-adversarial",
      "name": "Production / Prompt injection",
      "weight": 1.0,
      "tags": {"suite": "security"}
    }
  ]
}
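Because each assignment refers to a model by id, a typo in model_id is easy to miss. The following sketch shows one way to catch that before a run. It is an illustration under assumptions: `dangling_assignments` is a hypothetical helper, and it only checks model_id against the models in the same file. It does not check evaluation_id, since evaluations may come from the built-in config.

```python
def dangling_assignments(config: dict) -> list[str]:
    """Return ids of model_evaluations whose model_id matches no model in config."""
    model_ids = {model["id"] for model in config.get("models", [])}
    return [
        assignment["id"]
        for assignment in config.get("model_evaluations", [])
        if assignment["model_id"] not in model_ids
    ]


if __name__ == "__main__":
    config = {
        "models": [{"id": "staging-agent"}, {"id": "production-agent"}],
        "model_evaluations": [
            {"id": "me-staging-llm01", "model_id": "staging-agent"},
            {"id": "me-prod-llm01", "model_id": "prod-agent"},  # typo in model_id
        ],
    }
    print(dangling_assignments(config))  # prints ['me-prod-llm01']
```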

To manage model-evaluation assignments via CLI, see the Commands reference or Examples for practical workflows.

Built-in config and overrides

Flint AI loads two config layers:
  1. Built-in config - Ships with the tool and contains all built-in evaluations, detectors, and message collections
  2. User config - Your ~/.flintai/config.json (or the path passed via --config)
The two are merged, with user entries taking precedence on ID conflicts. You can override any built-in evaluation by defining one with the same ID in your config. At startup, Flint AI shows a breakdown:
Models:       1 (0 builtin, 1 user)
Evaluations:  39 (38 builtin, 1 user)
Detectors:    9 (8 builtin, 1 user)
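The precedence rule can be pictured as a merge keyed on id. This is a minimal sketch of that idea, not Flint AI's actual loader: `merge_section` is a hypothetical helper, and it assumes a user entry replaces the built-in entry wholesale rather than being merged field by field.

```python
def merge_section(builtin: list[dict], user: list[dict]) -> list[dict]:
    """Combine two lists of entries; a user entry replaces a built-in one with the same id."""
    merged = {entry["id"]: entry for entry in builtin}
    merged.update({entry["id"]: entry for entry in user})
    return list(merged.values())


if __name__ == "__main__":
    builtin = [
        {"id": "eval-llm01-adversarial", "name": "Built-in"},
        {"id": "eval-llm02-adversarial", "name": "Built-in"},
    ]
    user = [{"id": "eval-llm01-adversarial", "name": "Custom override"}]
    merged = merge_section(builtin, user)
    print([entry["name"] for entry in merged])  # prints ['Custom override', 'Built-in']
```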

Configuration file location

Default location: ~/.flintai/config.json
Override it with --config:
flintai eval run --model my-agent --config ./custom-config.json

Browse available evaluations

# List all built-in evaluations
flintai eval evaluations list

# Filter by tag
flintai eval evaluations list --tag owasp_code=LLM01

# Show details for a specific evaluation
flintai eval evaluations show eval-llm01-adversarial
See Built-in evaluations for the full catalog.

Next steps

  • Run Evaluations - Execute tests against your configured models
  • View Results - Analyze evaluation outputs
  • Environment Variables - Manage API keys and settings