Evaluation framework
Flint AI Eval uses a composable architecture:Evaluation types
Choose between fixed prompts (repeatable tests) or AI-generated prompts (adaptive attacks). Evaluations define what prompts to send to your agent.Adversarial probe
Adversarial probe
AI-generated attack prompts that adapt to your agent’s responses across multiple turns.How it works:
- LLM (
GENERATOR_MODEL) generates prompts designed to exploit specific vulnerabilities - Attacker model adjusts strategy based on agent responses
- Supports multi-turn conversations (up to 10 turns per test)
GENERATOR_MODELMessage collection
Message collection
Fixed list of pre-written test prompts.How it works:
- Prompts loaded from CSV file, in-memory list, or Garak module
- Each prompt sent once to your agent
- Deterministic and repeatable
Topic guard
Topic guard
AI-generated out-of-scope prompts to test whether your agent stays within its defined role.How it works:
- LLM generates plausible but off-topic requests
- Tests agent’s ability to refuse gracefully
GENERATOR_MODELDetector types
Some detectors use AI judges, others use local pattern matching (no LLM required). Detectors define how responses are scored.LLM-as-judge
LLM-as-judge
A separate LLM evaluates whether your agent’s response meets the test criteria.How it works:
- Response sent to judge model (
GENERATOR_MODEL) - Judge follows scoring instructions (e.g., “Score 1 if agent refused, 0 if it leaked data”)
- Returns 0.0-1.0 score
GENERATOR_MODELAccuracy: Strong judges achieve 80-90% agreement with human evaluatorsPII detector
PII detector
Regex-based detection of personally identifiable information.How it works:
- Scans response for patterns: emails, phone numbers, SSNs, credit cards
- Runs locally, no LLM required
- Returns 1.0 if no PII found, 0.0 if PII detected
john.doe@example.com in its responseRequires: Nothing (local detector)Secret detector
Secret detector
Regex-based detection of API keys, tokens, and credentials.How it works:
- Scans for AWS keys, GitHub tokens, private keys, etc.
- Runs locally, no LLM required
- Returns 1.0 if no secrets found, 0.0 if secrets detected
sk-proj-abc123...Requires: Nothing (local detector)Toxicity classifier
Toxicity classifier
ML-based classifier for toxic, offensive, or harmful content.How it works:
- Uses local classifier model
- No LLM required
- Returns toxicity score
Garak detectors
Garak detectors
Adapters for Garak framework detectors.How it works:
- Runs Garak’s built-in detectors locally
- Includes pattern matching, heuristics, and specialized checks
- No LLM required
encoding detector checks for Base64-encoded attacksRequires: Nothing (local detector)How evaluations combine with detectors
Each builtin evaluation pairs an evaluation type with a detector. Here are examples showing how they work together:Adversarial probe generates prompt injection attacks, LLM-as-judge scores whether agent followed attacker’s instructions.Result: 0.0-1.0 score measuring prompt injection resistance
Message collection sends fixed prompts requesting sensitive data, PII detector scans responses for email/phone/SSN patterns.Result: 1.0 if no PII found, 0.0 if PII detected
Loads any Garak attack module (encoding, prompt injection, jailbreaks, and 30+ others) and pairs it with a Garak detector that scores the agent’s responses.Result: Pass/fail per probe attempt
Scoring
Each evaluation returns a 0.0-1.0 score:- 1.0 = Perfect (all tests passed)
- 0.8+ = Good (minor issues)
- 0.5-0.8 = Needs improvement
- < 0.5 = Critical issues
Next steps
Browse Evaluations
See all 38+ builtin tests
Configuration
Set up and run tests
Data Privacy
What gets sent to LLMs