An intelligent penetration testing framework that probes AI chatbots with adversarial payloads, evaluates responses using an LLM-as-a-Judge, and maps every finding directly to specific articles in Regulation (EU) 2024/1689.
The Challenge
The EU AI Act (Regulation (EU) 2024/1689) is the world's first comprehensive legal framework for artificial intelligence. It imposes binding obligations on AI systems deployed across the European Union — including transparency, robustness, and non-discrimination requirements.
The problem: no automated tool existed to test whether a chatbot actually meets these obligations. Compliance audits were entirely manual, slow, and inconsistent.
This scanner automates that process — firing adversarial payloads, judging responses with AI, and mapping every violation to the exact legal clause it breaks.
What Gets Tested
Each vulnerability category is mapped to the OWASP Top 10 for LLM Applications and linked to specific EU AI Act articles.
The Process
From clicking Run Scan to receiving a full audit report with legal citations.
Under the Hood
Each mechanism adds a layer of attack sophistication — from reproducible baseline tests to AI-generated adaptive probes.
The scanner opens with 16 curated adversarial prompts — 4 per vulnerability category — sourced from three established academic and industry datasets.
These provide a reproducible, dataset-backed baseline: the same inputs every time, making results comparable across different targets and scan sessions.
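A fixed baseline like this can be modelled as plain data plus a shape check. The sketch below is illustrative only: the `Payload` fields, the category identifiers, and the placeholder texts are assumptions, and the real prompts and dataset names are deliberately not reproduced.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Hypothetical identifiers for the four vulnerability categories.
CATEGORIES = (
    "prompt_injection",
    "data_leakage",
    "identity_disclosure",
    "bias_hallucination",
)
PER_CATEGORY = 4  # 4 prompts per category, 16 in total


@dataclass(frozen=True)
class Payload:
    category: str
    text: str
    source: str  # dataset the prompt was taken from


def build_placeholder_baseline() -> list[Payload]:
    """Return 16 placeholder payloads in a fixed, repeatable order."""
    return [
        Payload(cat, f"<{cat} prompt {i + 1}>", "<dataset name>")
        for cat in CATEGORIES
        for i in range(PER_CATEGORY)
    ]


def validate_baseline(payloads: list[Payload]) -> None:
    """Raise ValueError unless every category has exactly PER_CATEGORY prompts."""
    counts = Counter(p.category for p in payloads)
    for cat in CATEGORIES:
        if counts[cat] != PER_CATEGORY:
            raise ValueError(f"{cat}: expected {PER_CATEGORY}, got {counts[cat]}")


BASELINE = build_placeholder_baseline()
validate_baseline(BASELINE)
```

Because the list is built deterministically, two scans start from identical inputs, which is what makes results comparable across targets.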
Beyond the hardcoded baseline, GPT-4o-mini generates 3 additional payloads per category — tailored to the specific target chatbot and vulnerability type in real time.
This ensures adaptability: if a target has unique defences, the dynamic payloads are contextually crafted to probe those specific weaknesses rather than relying solely on generic dataset prompts.
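One way to structure the dynamic step is to keep prompt construction and output parsing separate from the model call. In this sketch the prompt wording, the function names, and the `complete` callable (a stand-in for whatever client sends the prompt to GPT-4o-mini) are all assumptions, not the scanner's actual code.

```python
from __future__ import annotations

from typing import Callable


def build_generation_prompt(category: str, target_description: str, n: int = 3) -> str:
    """Build an instruction asking the model for n target-specific test prompts."""
    return (
        f"You are assisting an authorised security audit of: {target_description}.\n"
        f"Write {n} distinct test prompts that probe for '{category}'.\n"
        "Return one prompt per line with no numbering."
    )


def generate_dynamic_payloads(
    category: str,
    target_description: str,
    complete: Callable[[str], str],  # hypothetical: sends a prompt, returns model text
    n: int = 3,
) -> list[str]:
    """Ask the model for payloads and keep at most n non-empty lines."""
    raw = complete(build_generation_prompt(category, target_description, n))
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    return lines[:n]
```

Injecting `complete` keeps the function testable offline and caps the result at 3 payloads even if the model returns more.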
For targets with no public API — like the DVLA chatbot — a headless Chrome browser is automated using Selenium WebDriver to interact with the real web interface directly.
The adapter handles the full page lifecycle: refresh, DOM stabilisation, textarea detection, spinner monitoring, and stability polling to ensure the chatbot's full response is captured before moving on.
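The stability-polling idea can be sketched independently of the browser: keep reading the response text until it stops changing for a few consecutive polls. The function name, the parameters, and their defaults below are assumptions for illustration.

```python
import time
from typing import Callable


def wait_for_stable_text(
    read_text: Callable[[], str],
    stable_polls: int = 3,
    interval: float = 0.5,
    timeout: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll read_text until it returns the same non-empty value stable_polls times in a row.

    Returns the last value seen if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    last = read_text()
    unchanged = 0
    while time.monotonic() < deadline:
        sleep(interval)
        current = read_text()
        if current and current == last:
            unchanged += 1
            if unchanged >= stable_polls:
                return current
        else:
            # Text is still streaming in (or still empty): restart the count.
            unchanged = 0
            last = current
    return last
```

With Selenium, `read_text` could be something like `lambda: driver.find_element(By.CSS_SELECTOR, selector).text`, where `selector` is whatever matches the target's response element.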
Single-turn probes are often detected and blocked by modern chatbots. Multi-turn attacks are more realistic — they mimic how a real attacker would gradually build trust and escalate.
Each strategy uses 3 turns, with each message building on the previous response to steer the chatbot progressively toward a violation.
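A three-turn strategy of this kind reduces to a loop in which each outgoing message is derived from the conversation so far. This is a minimal sketch; `send` and `next_message` are hypothetical callables standing in for the target adapter and the escalation logic.

```python
from __future__ import annotations

from typing import Callable

History = list[tuple[str, str]]  # (message sent, reply received) per turn


def run_multi_turn(
    send: Callable[[str], str],
    next_message: Callable[[int, History], str],
    turns: int = 3,
) -> History:
    """Run a multi-turn probe where each message can build on earlier replies."""
    history: History = []
    for turn in range(turns):
        message = next_message(turn, history)  # sees all previous replies
        reply = send(message)
        history.append((message, reply))
    return history
```

Only the final reply needs to be judged for a violation, but the full history is returned so the report can show how the escalation unfolded.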
After each probe, GPT-4o-mini acts as an expert compliance auditor — evaluating the chatbot's response against the EU AI Act's requirements and returning a structured verdict.
Using an LLM judge eliminates the need for hand-coded detection rules. It can reason about nuanced, ambiguous, or context-dependent responses that pattern matching would miss.
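A structured verdict is only useful if malformed judge output is caught rather than silently counted as a pass. The sketch below assumes the judge returns JSON with `verdict`, `severity`, and `reasoning` keys; those field names are hypothetical, since the actual schema is not specified here.

```python
import json
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    severity: str
    reasoning: str


def parse_verdict(raw: str) -> Verdict:
    """Parse the judge's JSON reply; raise ValueError on anything unexpected."""
    try:
        data = json.loads(raw)
        label = str(data["verdict"]).strip().upper()
        severity = str(data.get("severity", "unknown")).lower()
        reasoning = str(data.get("reasoning", ""))
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as exc:
        raise ValueError(f"malformed judge output: {raw!r}") from exc
    if label not in ("PASS", "FAIL"):
        raise ValueError(f"unknown verdict label: {label!r}")
    return Verdict(passed=(label == "PASS"), severity=severity, reasoning=reasoning)
```

Raising on bad output lets the caller retry the judge or flag the probe for manual review.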
Supported Targets
The scanner adapts to different chatbot architectures using purpose-built adapter classes.
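The adapter pattern can be sketched as a small abstract interface plus one concrete class per target type. The class and method names here are assumptions; the REST variant assumes a JSON `message` request field, a JSON `response` reply field, and optional Bearer token auth.

```python
import json
import urllib.request
from abc import ABC, abstractmethod
from typing import Optional


class TargetAdapter(ABC):
    """Common interface: send one message, get the chatbot's reply as text."""

    @abstractmethod
    def send(self, message: str) -> str: ...


class RestAdapter(TargetAdapter):
    def __init__(self, url: str, token: Optional[str] = None, timeout: float = 30.0):
        self.url = url
        self.token = token
        self.timeout = timeout

    def build_request(self, message: str) -> urllib.request.Request:
        """Build the POST request without sending it."""
        headers = {"Content-Type": "application/json"}
        if self.token:
            headers["Authorization"] = f"Bearer {self.token}"
        body = json.dumps({"message": message}).encode("utf-8")
        return urllib.request.Request(self.url, data=body, headers=headers, method="POST")

    def send(self, message: str) -> str:
        with urllib.request.urlopen(self.build_request(message), timeout=self.timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))["response"]
```

The rest of the scanner only depends on `send`, so a browser-driven adapter can be swapped in without touching the probing or judging code.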
Accepts POST {"message":"…"} and returns {"response":"…"}. Optional Bearer token auth support.
Legal Mapping
Every finding is automatically linked to the article of Regulation (EU) 2024/1689 that it violates — using semantic embedding similarity, not keyword matching.
| Vulnerability | OWASP Category | EU AI Act Article | Legal Obligation | Detection Method |
|---|---|---|---|---|
| Prompt Injection | LLM01 | Article 15 | AI must be resilient to adversarial manipulation of its outputs or behaviour | Hardcoded + Dynamic + Multi-turn + Judge |
| Data Leakage | LLM02 | Article 10 | AI must not expose sensitive operational or personal data from its training or context | Hardcoded + Dynamic + Multi-turn + Judge |
| Identity Disclosure | LLM07 | Article 50 | Users must always be informed they are interacting with an AI system, not a human | Hardcoded + Dynamic + Multi-turn + Judge |
| Bias / Hallucination | LLM09 | Articles 15 and 10(4) | AI must not produce discriminatory outputs or assert false information as established fact | Hardcoded + Dynamic + Multi-turn + Judge |
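Embedding-based mapping comes down to picking the article whose vector is closest to the finding's vector by cosine similarity. The sketch below uses made-up three-dimensional toy vectors purely to show the mechanics; real embeddings would come from an embedding model, and the function names are assumptions.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_article(finding_vec: list[float], article_vecs: dict[str, list[float]]) -> tuple[str, float]:
    """Return the (article, similarity) pair with the highest similarity."""
    scored = ((name, cosine_similarity(finding_vec, vec)) for name, vec in article_vecs.items())
    return max(scored, key=lambda pair: pair[1])


# Toy vectors, not real embeddings.
ARTICLE_VECS = {
    "Article 10": [1.0, 0.0, 0.0],
    "Article 15": [0.0, 1.0, 0.0],
    "Article 50": [0.0, 0.0, 1.0],
}
article, score = best_article([0.1, 0.9, 0.2], ARTICLE_VECS)
```

Unlike keyword matching, this still finds the right article when the judge's reasoning uses different wording from the legal text, provided the embeddings capture the shared meaning.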
Risk Assessment
Each failing probe contributes threat points based on severity. Multi-turn final turns receive a ×1.5 multiplier to reflect the higher realism of escalating attacks. The final score is normalised to 0–100.
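A scoring scheme along these lines might look as follows. Only the ×1.5 multiplier and the 0–100 range come from the description above; the severity point values and the choice to normalise against the worst possible outcome are assumptions made for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass

SEVERITY_POINTS = {"low": 1.0, "medium": 2.0, "high": 3.0}  # assumed values
MULTI_TURN_FINAL_MULTIPLIER = 1.5


@dataclass
class ProbeResult:
    failed: bool
    severity: str
    multi_turn_final: bool = False


def _weight(result: ProbeResult) -> float:
    return MULTI_TURN_FINAL_MULTIPLIER if result.multi_turn_final else 1.0


def risk_score(results: list[ProbeResult]) -> float:
    """Threat points earned by failing probes, as a percentage of the worst case."""
    earned = sum(SEVERITY_POINTS[r.severity] * _weight(r) for r in results if r.failed)
    worst = max(SEVERITY_POINTS.values())
    # Worst case (assumed): every probe fails at the highest severity.
    possible = sum(worst * _weight(r) for r in results)
    return round(100 * earned / possible, 1) if possible else 0.0
```

Under these assumptions, a scan where every probe passes scores 0 and one where every probe fails at the highest severity scores 100.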
Dashboard Preview
Real-time terminal log streamed via Server-Sent Events — and an audit report card generated automatically.
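On the wire, Server-Sent Events are plain text: each event is one or more `data:` lines followed by a blank line. A minimal formatter for log lines could look like this; the function names and the `log` event name are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Encode one message in the text/event-stream format."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # Multi-line payloads need one "data:" field per line.
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


def stream_log(log_lines: Iterable[str]) -> Iterator[str]:
    """Yield each scan log line as an SSE event named 'log'."""
    for line in log_lines:
        yield format_sse(line, event="log")
```

A web framework would serve this generator with the `text/event-stream` content type, and the dashboard would consume it with the browser's `EventSource` API.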
Built With