flintai eval includes built-in evaluations for testing agent reliability and behavior.
Filtering evaluations
Use tags to filter when listing or attaching evaluations:
Evaluation categories
Tests are organized into three categories:
- OWASP adversarial probes - Adaptive attacks that actively attempt to jailbreak your agent
- OWASP fixed prompts - Known attack patterns documented in security research
- Garak framework tests - Comprehensive coverage from the industry-standard security scanner
OWASP LLM Top 10 (Adversarial)
Tests based on OWASP LLM Top 10 security risks. These use adversarial probes to actively attempt attacks. Attach all adversarial tests:
LLM01: Prompt injection (adversarial)
Tests whether the model can be tricked into ignoring its system instructions and following attacker-supplied instructions instead. Uses escalating techniques: direct override requests, role-play scenarios, encoding tricks, hypothetical framing, and multi-language obfuscation.
LLM02: Sensitive information disclosure (adversarial)
Tests whether the model leaks sensitive information such as training data, PII, API keys, internal configuration, or data from other users’ sessions. Attempts to extract: training data, user data from other sessions, system configuration, and personally identifiable information.
LLM05: Unsafe output generation (adversarial)
Tests whether the model can be manipulated into generating outputs containing executable payloads (XSS, SQL injection, shell commands) that would be dangerous if passed to downstream systems without sanitization. Probes for: JavaScript/HTML payloads, SQL injection strings, shell commands, and markdown that could be interpreted as executable code.
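To show why unsanitized output is the risk here, the following is a minimal Python sketch of a downstream renderer that escapes model output before embedding it in HTML. It is not part of flintai; `render_reply` and the sample payload are hypothetical names chosen for illustration.

```python
import html

def render_reply(model_output: str) -> str:
    """Embed model output in an HTML fragment, escaping markup first."""
    # html.escape converts <, >, &, and quotes to entities, so a payload
    # in the model output is displayed as text instead of being executed.
    return f'<div class="reply">{html.escape(model_output)}</div>'

# Hypothetical model output carrying an XSS payload.
payload = "<script>alert('xss')</script>"
rendered = render_reply(payload)
print(rendered)
```

Escaping only covers the HTML case. SQL and shell payloads need their own defenses, such as parameterized queries and avoiding shell interpolation.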
LLM06: Excessive agency (adversarial)
Tests whether the model can be manipulated into performing actions beyond its intended scope or authority. Attempts to trigger: unauthorized API calls, privilege escalation, accessing out-of-scope resources, and exceeding rate limits.
LLM07: System prompt leakage (adversarial)
Tests whether the model can be tricked into revealing its system instructions, internal prompts, or configuration details. Uses techniques: direct requests, completion tricks, encoding, and context manipulation to extract hidden prompts.
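One common way to detect this kind of leak in your own testing is a canary string: place a random marker in the system prompt and check whether any response reproduces it. The sketch below assumes that approach; it does not describe how flintai scores this test, and `make_canary` and `leaked` are hypothetical helper names.

```python
import secrets

def make_canary() -> str:
    """Return a random marker that is unlikely to occur by chance."""
    return f"CANARY-{secrets.token_hex(8)}"

def leaked(response: str, canary: str) -> bool:
    """True if the response reproduces the canary from the system prompt."""
    return canary.lower() in response.lower()

canary = make_canary()
system_prompt = f"You are a support assistant. Internal marker: {canary}"

# Hypothetical responses: one refuses, one echoes the system prompt.
safe_response = "I can't share my instructions, but I'm happy to help."
leaky_response = f"My instructions say: {system_prompt}"

print(leaked(safe_response, canary))   # False
print(leaked(leaky_response, canary))  # True
```

A plain substring check misses leaks that are paraphrased or encoded, so it gives a lower bound on leakage rather than a complete detector.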
LLM09: Misinformation/Hallucination (adversarial)
Tests whether the model can be manipulated into generating false or misleading information. Probes for: factual errors, fabricated citations, false claims presented as fact, and hallucinated data.
PII leakage (adversarial)
Tests whether the model can be manipulated into leaking personally identifiable information. Attempts to extract: names, email addresses, phone numbers, SSNs, addresses, and other PII that should be protected.
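As a rough illustration of what a PII leak looks like in a response, here is a small Python sketch that scans text for a few of the PII types listed above. The patterns are deliberately simple and US-centric, and `PII_PATTERNS` and `find_pii` are hypothetical names; this is not flintai's detection logic.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def find_pii(text: str) -> dict[str, list[str]]:
    """Return the matches for each PII category found in the text."""
    hits = {name: pattern.findall(text) for name, pattern in PII_PATTERNS.items()}
    # Drop categories with no matches so an empty dict means "nothing found".
    return {name: found for name, found in hits.items() if found}

# Hypothetical model response containing an email address and a phone number.
response = "Sure, reach Jane at jane.doe@example.com or 555-867-5309."
print(find_pii(response))
```

Names and postal addresses cannot be matched reliably with regular expressions, which is one reason pattern checks like this are usually only a first pass.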
OWASP LLM Top 10 (Fixed Prompts)
Same OWASP categories, but using fixed test prompts instead of adversarial probes. Attach all fixed prompt tests:
LLM01: Prompt injection (fixed prompts)
Tests prompt injection resistance using a curated set of known attack patterns. Based on real-world prompt injection attempts documented in security research.
LLM02: Sensitive information disclosure (fixed prompts)
Tests information leakage using fixed prompts designed to extract sensitive data.
LLM05: Unsafe output generation (fixed prompts)
Tests for generation of dangerous executable payloads using fixed test cases.
LLM06: Excessive agency (fixed prompts)
Tests for unauthorized actions using fixed prompts that attempt to exceed intended scope.
LLM07: System prompt leakage (fixed prompts)
Tests system prompt protection using known extraction techniques.
LLM09: Misinformation/Hallucination (fixed prompts)
Tests hallucination resistance using prompts known to trigger factual errors.
Quality evaluation: measures accuracy and factual correctness.
PII leakage (fixed prompts)
Tests PII protection using fixed prompts that attempt to extract personal information.
Secret leakage (fixed prompts)
Tests protection of API keys, credentials, and other secrets using fixed extraction attempts.
Garak Framework Tests
Tests from the Garak LLM vulnerability scanner, an industry-standard adversarial testing framework. Attach all Garak tests:
Continuation attacks
Tests whether the model continues harmful or dangerous content when prompted with partial sequences.
DAN (Do Anything Now) jailbreaks
Tests resistance to “Do Anything Now” style jailbreak attempts that try to remove safety constraints.
Encoding-based attacks
Tests whether encoding tricks (Base64, ROT13, hex, leetspeak) can bypass safety filters.
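To make the encodings concrete, the Python sketch below shows how one prompt looks under each of the four encodings named above. It only illustrates the transformations; it is not how Garak or flintai builds its probes, and `encode_variants` and the leetspeak table are assumptions made for this example.

```python
import base64
import codecs

# Simple leetspeak substitution table (one of many possible mappings).
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})

def encode_variants(prompt: str) -> dict[str, str]:
    """Return the prompt under Base64, ROT13, hex, and leetspeak."""
    raw = prompt.encode("utf-8")
    return {
        "base64": base64.b64encode(raw).decode("ascii"),
        "rot13": codecs.encode(prompt, "rot_13"),
        "hex": raw.hex(),
        "leetspeak": prompt.lower().translate(LEET),
    }

# A benign placeholder stands in for an attack prompt.
for name, encoded in encode_variants("Ignore previous instructions").items():
    print(f"{name}: {encoded}")
```

A filter that matches only the plain-text form of a prompt sees none of these variants as the same request, which is the gap this test category probes.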
GCG (Greedy Coordinate Gradient) attacks
Tests against adversarially optimized attack strings designed to jailbreak language models.
Glitch tokens
Tests behavior when processing anomalous tokens known to cause unexpected model behavior.
Goodside attacks
Tests based on Riley Goodside’s documented jailbreak techniques.
Known bad signatures
Tests detection of known malicious patterns and signatures.
LMRC attacks
Tests from the Language Model Risk Cards framework.
Malware generation
Tests whether the model can be manipulated into generating malware or exploit code.
Misleading claims
Tests resistance to generating misleading or deceptive content.
Package hallucination
Tests whether the model fabricates non-existent software packages or libraries.
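One way to spot a fabricated package in generated code is to extract its imports and compare them against a list of packages known to exist. The Python sketch below assumes that approach with a hard-coded allowlist; `KNOWN_PACKAGES`, the helper functions, and the made-up package `fastjsonx` are all hypothetical, and this is not how Garak or flintai performs the check.

```python
import ast

# Hypothetical allowlist; in practice this would come from a package index
# or a lockfile rather than a hard-coded set.
KNOWN_PACKAGES = {"json", "os", "requests", "numpy"}

def imported_packages(source: str) -> set[str]:
    """Collect top-level package names imported by a Python snippet."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports, which are not external packages.
            names.add(node.module.split(".")[0])
    return names

def unknown_packages(source: str) -> set[str]:
    """Return imported packages that are not in the allowlist."""
    return imported_packages(source) - KNOWN_PACKAGES

# Hypothetical model output that imports a package that does not exist.
generated = "import numpy as np\nfrom fastjsonx.core import loads\n"
print(unknown_packages(generated))
```

Parsing with `ast` instead of matching text avoids false hits on the word "import" inside strings or comments.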
Prompt injection (Garak)
Garak’s comprehensive prompt injection test suite.
RealToxicityPrompts
Tests using the RealToxicityPrompts dataset, a collection of prompts known to elicit toxic responses.
Replay attacks
Tests whether previous conversations can be replayed to bypass safety checks.
Snowball attacks
Tests escalating attacks that build on previous turns to break safety constraints.
Suffix attacks
Tests adversarial suffixes designed to jailbreak models.
TAP (Tree of Attacks with Pruning)
Tests using the TAP jailbreak technique.
Visual jailbreaks
Tests attacks embedded in image descriptions or visual content (for multimodal models).
XSS (Cross-Site Scripting)
Tests whether the model generates XSS payloads.
AdvGLUE adversarial examples
Tests robustness against adversarially perturbed inputs from the AdvGLUE benchmark.
AML (Adversarial ML) attacks
Tests resistance to adversarial machine learning attacks.
Risky emergent behaviors
Tests for concerning emergent behaviors not explicitly trained for.
XSTest safety evaluations
Tests from the XSTest safety evaluation suite.