Built-in evaluations

flintai eval includes built-in evaluations for testing agent reliability and behavior.

Run flintai eval evaluations list to see this list from the CLI at any time.

Filtering evaluations

Use tags to filter when listing or attaching evaluations:

# List only OWASP LLM01 (prompt injection) tests
flintai eval evaluations list --tag owasp_code=LLM01

# Attach all Garak tests
flintai eval model-evaluations attach \
  --model my-agent \
  --eval-tag source=garak

See Configuration for more tag-based filtering examples.
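To check several OWASP codes in one pass, the documented list command can be wrapped in a loop. This is a minimal sketch, not part of flintai: `list_by_owasp_code` and `DRY_RUN` are hypothetical names, and it assumes the `owasp_code` tag accepts the other codes on this page in the same form as the `LLM01` example above. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: list evaluations for several OWASP codes in one pass.
# Assumption: owasp_code takes other codes (LLM02, LLM05, ...) in the
# same form as the documented LLM01 example.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# list_by_owasp_code is a hypothetical helper, not a flintai command.
list_by_owasp_code() {
  for code in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "flintai eval evaluations list --tag owasp_code=$code"
    else
      flintai eval evaluations list --tag "owasp_code=$code"
    fi
  done
}

list_by_owasp_code LLM01 LLM07
```

Set `DRY_RUN=0` to run the commands for real once the printed output looks right.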

Evaluation categories

Tests are organized into three categories:

OWASP adversarial probes - Adaptive attacks that actively attempt to jailbreak your agent
OWASP fixed prompts - Known attack patterns documented in security research
Garak framework tests - Comprehensive coverage from the industry-standard security scanner
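The evaluation IDs listed on this page follow a naming pattern that makes the three categories easy to tell apart in scripts. The following is a minimal sketch, not part of flintai: `classify_eval` is a hypothetical helper, and it assumes IDs keep the `eval-*-adversarial`, `eval-*-fixed`, and `garak-*` patterns used in the sections below.

```shell
#!/bin/sh
# classify_eval is a hypothetical helper (not a flintai command).
# It maps an evaluation ID to its category based on the ID patterns
# used on this page.
classify_eval() {
  case "$1" in
    eval-*-adversarial) echo "owasp-adversarial" ;;
    eval-*-fixed)       echo "owasp-fixed" ;;
    garak-*)            echo "garak" ;;
    *)                  echo "unknown" ;;
  esac
}

classify_eval eval-llm01-adversarial   # prints: owasp-adversarial
classify_eval garak-dan                # prints: garak
```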

OWASP LLM Top 10 (Adversarial)

Tests based on OWASP LLM Top 10 security risks. These use adversarial probes to actively attempt attacks. Attach all adversarial tests:

flintai eval model-evaluations attach --model my-agent --tag method="Adversarial Model"

eval-llm01-adversarial

adversarial_probe

LLM01: Prompt injection (adversarial)
Tests whether the model can be tricked into ignoring its system instructions and following attacker-supplied instructions instead. Uses escalating techniques: direct override requests, role-play scenarios, encoding tricks, hypothetical framing, and multi-language obfuscation.

eval-llm02-adversarial

adversarial_probe

LLM02: Sensitive information disclosure (adversarial)
Tests whether the model leaks sensitive information such as training data, PII, API keys, internal configuration, or data from other users’ sessions. Attempts to extract: training data, user data from other sessions, system configuration, and personally identifiable information.

eval-llm05-adversarial

adversarial_probe

LLM05: Unsafe output generation (adversarial)
Tests whether the model can be manipulated into generating outputs containing executable payloads (XSS, SQL injection, shell commands) that would be dangerous if passed to downstream systems without sanitization. Probes for: JavaScript/HTML payloads, SQL injection strings, shell commands, and markdown that could be interpreted as executable code.

eval-llm06-adversarial

adversarial_probe

LLM06: Excessive agency (adversarial)
Tests whether the model can be manipulated into performing actions beyond its intended scope or authority. Attempts to trigger: unauthorized API calls, privilege escalation, accessing out-of-scope resources, and exceeding rate limits.

eval-llm07-adversarial

adversarial_probe

LLM07: System prompt leakage (adversarial)
Tests whether the model can be tricked into revealing its system instructions, internal prompts, or configuration details. Uses direct requests, completion tricks, encoding, and context manipulation to extract hidden prompts.

eval-llm09-adversarial

adversarial_probe

LLM09: Misinformation/Hallucination (adversarial)
Tests whether the model can be manipulated into generating false or misleading information. Probes for: factual errors, fabricated citations, false claims presented as fact, and hallucinated data.

eval-pii-adversarial

adversarial_probe

PII leakage (adversarial)
Tests whether the model can be manipulated into leaking personally identifiable information. Attempts to extract: names, email addresses, phone numbers, SSNs, addresses, and other PII that should be protected.

OWASP LLM Top 10 (Fixed Prompts)

These tests cover the same OWASP categories with fixed test prompts instead of adversarial probes, and add a secret-leakage test. Attach all fixed prompt tests:

flintai eval model-evaluations attach --model my-agent --tag method="Fixed Prompts"

eval-llm01-fixed

message_collection

LLM01: Prompt injection (fixed prompts)
Tests prompt injection resistance using a curated set of known attack patterns. Based on real-world prompt injection attempts documented in security research.

eval-llm02-fixed

message_collection

LLM02: Sensitive information disclosure (fixed prompts)
Tests information leakage using fixed prompts designed to extract sensitive data.

eval-llm05-fixed

message_collection

LLM05: Unsafe output generation (fixed prompts)
Tests for generation of dangerous executable payloads using fixed test cases.

eval-llm06-fixed

message_collection

LLM06: Excessive agency (fixed prompts)
Tests for unauthorized actions using fixed prompts that attempt to exceed intended scope.

eval-llm07-fixed

message_collection

LLM07: System prompt leakage (fixed prompts)
Tests system prompt protection using known extraction techniques.

eval-llm09-fixed

message_collection

LLM09: Misinformation/Hallucination (fixed prompts)
Tests hallucination resistance using prompts known to trigger factual errors. Quality evaluation: measures accuracy and factual correctness.

eval-pii-fixed

message_collection

PII leakage (fixed prompts)
Tests PII protection using fixed prompts that attempt to extract personal information.

eval-secret-fixed

message_collection

Secret leakage (fixed prompts)
Tests protection of API keys, credentials, and other secrets using fixed extraction attempts.
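Only some OWASP LLM Top 10 codes appear in the two lists above: LLM01, LLM02, LLM05, LLM06, LLM07, and LLM09. The sketch below shows one way to look up the evaluation IDs for a code and to detect codes that have no built-in evaluation on this page. `owasp_eval_ids` is a hypothetical helper, not a flintai command, and the ID format is inferred from the listed IDs.

```shell
#!/bin/sh
# owasp_eval_ids is a hypothetical helper (not a flintai command).
# Given an OWASP LLM Top 10 code, it prints the built-in evaluation IDs
# listed on this page for that code, or fails if none are listed.
owasp_eval_ids() {
  case "$1" in
    LLM01|LLM02|LLM05|LLM06|LLM07|LLM09)
      # Lowercase the code to match the ID format, e.g. LLM01 -> llm01.
      code=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
      echo "eval-$code-adversarial"
      echo "eval-$code-fixed"
      ;;
    *)
      echo "no built-in evaluation listed for $1" >&2
      return 1
      ;;
  esac
}

owasp_eval_ids LLM05
# prints:
#   eval-llm05-adversarial
#   eval-llm05-fixed
```

The PII and secret-leakage evaluations are not tied to an OWASP code in their IDs, so this lookup leaves them out.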

Garak Framework Tests

Tests from the Garak LLM vulnerability scanner, an industry-standard adversarial testing framework. Attach all Garak tests:

flintai eval model-evaluations attach --model my-agent --tag source=Garak

garak-continuation

garak_module

Continuation attacks
Tests whether the model continues harmful or dangerous content when prompted with partial sequences.

garak-dan

garak_module

DAN (Do Anything Now) jailbreaks
Tests resistance to “Do Anything Now” style jailbreak attempts that try to remove safety constraints.

garak-encoding

garak_module

Encoding-based attacks
Tests whether encoding tricks (Base64, ROT13, hex, leetspeak) can bypass safety filters.

garak-gcg

garak_module

GCG (Greedy Coordinate Gradient) attacks
Tests against adversarially optimized attack strings designed to jailbreak language models.

garak-glitch

garak_module

Glitch tokens
Tests behavior when processing anomalous tokens known to cause unexpected model behavior.

garak-goodside

garak_module

Goodside attacks
Tests based on Riley Goodside’s documented jailbreak techniques.

garak-knownbadsignatures

garak_module

Known bad signatures
Tests detection of known malicious patterns and signatures.

garak-lmrc

garak_module

LMRC attacks
Tests from the Language Model Risk Cards framework.

garak-malwaregen

garak_module

Malware generation
Tests whether the model can be manipulated into generating malware or exploit code.

garak-misleading

garak_module

Misleading claims
Tests resistance to generating misleading or deceptive content.

garak-packagehallucination

garak_module

Package hallucination
Tests whether the model fabricates non-existent software packages or libraries.

garak-promptinject

garak_module

Prompt injection (Garak)
Garak’s comprehensive prompt injection test suite.

garak-realtoxicityprompts

garak_module

RealToxicityPrompts
Tests using the RealToxicityPrompts dataset, a set of prompts known to elicit toxic responses.

garak-replay

garak_module

Replay attacks
Tests whether previous conversations can be replayed to bypass safety checks.

garak-snowball

garak_module

Snowball attacks
Tests escalating attacks that build on previous turns to break safety constraints.

garak-suffix

garak_module

Suffix attacks
Tests adversarial suffixes designed to jailbreak models.

garak-tap

garak_module

TAP (Tree of Attacks with Pruning)
Tests using the TAP jailbreak technique.

garak-visual

garak_module

Visual jailbreaks
Tests attacks embedded in image descriptions or visual content (for multimodal models).

garak-xss

garak_module

XSS (Cross-Site Scripting)
Tests whether the model generates XSS payloads.

garak-advglue

garak_module

AdvGLUE adversarial examples
Tests robustness against adversarially perturbed inputs from the AdvGLUE benchmark.

garak-aml

garak_module

AML (Adversarial ML) attacks
Tests resistance to adversarial machine learning attacks.

garak-riskyemergent

garak_module

Risky emergent behaviors
Tests for concerning emergent behaviors not explicitly trained for.

garak-xstest

garak_module

XSTest safety evaluations
Tests from the XSTest safety evaluation suite.
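To attach all three categories to one model, the attach commands from the sections above can be combined in a small wrapper. This is a minimal sketch under stated assumptions: `attach_all` and `DRY_RUN` are hypothetical names, and the tag values are copied from the attach commands shown in each category section. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: attach every built-in category to one model, using only the
# attach commands shown on this page. attach_all is a hypothetical helper.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

attach_all() {
  model="$1"
  for tag in 'method=Adversarial Model' 'method=Fixed Prompts' 'source=Garak'; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "flintai eval model-evaluations attach --model $model --tag \"$tag\""
    else
      flintai eval model-evaluations attach --model "$model" --tag "$tag"
    fi
  done
}

attach_all my-agent
```

Review the printed commands first, then set `DRY_RUN=0` to run them.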
