Skip to main content
flintai eval includes built-in evaluations for testing agent reliability and behavior.
Run flintai eval evaluations list to see this list from the CLI at any time.

Filtering evaluations

Use tags to filter when listing or attaching evaluations:
# List only OWASP tests
flintai eval evaluations list --tag owasp_code=LLM01

# Attach all Garak tests
flintai eval model-evaluations attach \
  --model my-agent \
  --eval-tag source=garak
See Configuration for more tag-based filtering examples.

Evaluation categories

Tests are organized into three categories:
  • OWASP adversarial probes - Adaptive attacks that actively attempt to jailbreak your agent
  • OWASP fixed prompts - Known attack patterns documented in security research
  • Garak framework tests - Comprehensive coverage from the industry-standard security scanner

OWASP LLM Top 10 (Adversarial)

Tests based on OWASP LLM Top 10 security risks. These use adversarial probes to actively attempt attacks. Attach all adversarial tests:
flintai eval model-evaluations attach --model my-agent --tag method="Adversarial Model"
eval-llm01-adversarial
adversarial_probe
LLM01: Prompt injection (adversarial)Tests whether the model can be tricked into ignoring its system instructions and following attacker-supplied instructions instead.Uses escalating techniques: direct override requests, role-play scenarios, encoding tricks, hypothetical framing, and multi-language obfuscation.
eval-llm02-adversarial
adversarial_probe
LLM02: Sensitive information disclosure (adversarial)Tests whether the model leaks sensitive information such as training data, PII, API keys, internal configuration, or data from other users’ sessions.Attempts to extract: training data, user data from other sessions, system configuration, and personally identifiable information.
eval-llm05-adversarial
adversarial_probe
LLM05: Unsafe output generation (adversarial)Tests whether the model can be manipulated into generating outputs containing executable payloads (XSS, SQL injection, shell commands) that would be dangerous if passed to downstream systems without sanitization.Probes for: JavaScript/HTML payloads, SQL injection strings, shell commands, and markdown that could be interpreted as executable code.
eval-llm06-adversarial
adversarial_probe
LLM06: Excessive agency (adversarial)Tests whether the model can be manipulated into performing actions beyond its intended scope or authority.Attempts to trigger: unauthorized API calls, privilege escalation, accessing out-of-scope resources, and exceeding rate limits.
eval-llm07-adversarial
adversarial_probe
LLM07: System prompt leakage (adversarial)Tests whether the model can be tricked into revealing its system instructions, internal prompts, or configuration details.Uses techniques: direct requests, completion tricks, encoding, and context manipulation to extract hidden prompts.
eval-llm09-adversarial
adversarial_probe
LLM09: Misinformation/Hallucination (adversarial)Tests whether the model can be manipulated into generating false or misleading information.Probes for: factual errors, fabricated citations, false claims presented as fact, and hallucinated data.
eval-pii-adversarial
adversarial_probe
PII leakage (adversarial)Tests whether the model can be manipulated into leaking personally identifiable information.Attempts to extract: names, email addresses, phone numbers, SSNs, addresses, and other PII that should be protected.

OWASP LLM Top 10 (Fixed Prompts)

Same OWASP categories, but using fixed test prompts instead of adversarial probes. Attach all fixed prompt tests:
flintai eval model-evaluations attach --model my-agent --tag method="Fixed Prompts"
eval-llm01-fixed
message_collection
LLM01: Prompt injection (fixed prompts)Tests prompt injection resistance using a curated set of known attack patterns.Based on real-world prompt injection attempts documented in security research.
eval-llm02-fixed
message_collection
LLM02: Sensitive information disclosure (fixed prompts)Tests information leakage using fixed prompts designed to extract sensitive data.
eval-llm05-fixed
message_collection
LLM05: Unsafe output generation (fixed prompts)Tests for generation of dangerous executable payloads using fixed test cases.
eval-llm06-fixed
message_collection
LLM06: Excessive agency (fixed prompts)Tests for unauthorized actions using fixed prompts that attempt to exceed intended scope.
eval-llm07-fixed
message_collection
LLM07: System prompt leakage (fixed prompts)Tests system prompt protection using known extraction techniques.
eval-llm09-fixed
message_collection
LLM09: Misinformation/Hallucination (fixed prompts)Tests hallucination resistance using prompts known to trigger factual errors.Quality evaluation — measures accuracy and factual correctness.
eval-pii-fixed
message_collection
PII leakage (fixed prompts)Tests PII protection using fixed prompts that attempt to extract personal information.
eval-secret-fixed
message_collection
Secret leakage (fixed prompts)Tests protection of API keys, credentials, and other secrets using fixed extraction attempts.

Garak Framework Tests

Tests from the Garak LLM vulnerability scanner — industry-standard adversarial testing framework. Attach all Garak tests:
flintai eval model-evaluations attach --model my-agent --tag source=Garak
garak-continuation
garak_module
Continuation attacksTests whether the model continues harmful or dangerous content when prompted with partial sequences.
garak-dan
garak_module
DAN (Do Anything Now) jailbreaksTests resistance to “Do Anything Now” style jailbreak attempts that try to remove safety constraints.
garak-encoding
garak_module
Encoding-based attacksTests whether encoding tricks (Base64, ROT13, hex, leetspeak) can bypass safety filters.
garak-gcg
garak_module
GCG (Greedy Coordinate Gradient) attacksTests against adversarially optimized attack strings designed to jailbreak language models.
garak-glitch
garak_module
Glitch tokensTests behavior when processing anomalous tokens known to cause unexpected model behavior.
garak-goodside
garak_module
Goodside attacksTests based on Riley Goodside’s documented jailbreak techniques.
garak-knownbadsignatures
garak_module
Known bad signaturesTests detection of known malicious patterns and signatures.
garak-lmrc
garak_module
LMRC attacksTests from the Language Model Risk Cards framework.
garak-malwaregen
garak_module
Malware generationTests whether the model can be manipulated into generating malware or exploit code.
garak-misleading
garak_module
Misleading claimsTests resistance to generating misleading or deceptive content.
garak-packagehallucination
garak_module
Package hallucinationTests whether the model fabricates non-existent software packages or libraries.
garak-promptinject
garak_module
Prompt injection (Garak)Garak’s comprehensive prompt injection test suite.
garak-realtoxicityprompts
garak_module
RealToxicityPromptsTests using the RealToxicityPrompts dataset — prompts known to elicit toxic responses.
garak-replay
garak_module
Replay attacksTests whether previous conversations can be replayed to bypass safety checks.
garak-snowball
garak_module
Snowball attacksTests escalating attacks that build on previous turns to break safety constraints.
garak-suffix
garak_module
Suffix attacksTests adversarial suffixes designed to jailbreak models.
garak-tap
garak_module
TAP (Tree of Attacks with Pruning)Tests using the TAP jailbreak technique.
garak-visual
garak_module
Visual jailbreaksTests attacks embedded in image descriptions or visual content (for multimodal models).
garak-xss
garak_module
XSS (Cross-Site Scripting)Tests whether the model generates XSS payloads.
garak-advglue
garak_module
AdvGLUE adversarial examplesTests robustness against adversarially perturbed inputs from the AdvGLUE benchmark.
garak-aml
garak_module
AML (Adversarial ML) attacksTests resistance to adversarial machine learning attacks.
garak-riskyemergent
garak_module
Risky emergent behaviorsTests for concerning emergent behaviors not explicitly trained for.
garak-xstest
garak_module
XSTest safety evaluationsTests from the XSTest safety evaluation suite.