Architecture Overview
dtx is a modular AI Red Teaming Framework designed for prompt generation, AI model testing, and response evaluation. The system follows a plug-and-play architecture where different components can be swapped as needed.
1. Architecture Componentsโ
1. Generators (Prompt Generation)โ
- Responsible for creating adversarial prompts to test AI models.
- Internally referred to as datasets.
- Provide input to targets for evaluation.
Examples:
- STRINGRAY โ Requires OpenAI API
- STARGAZER โ Local execution
- HF_LMSYS, HF_HACKAPROMPT โ Hugging Face datasets
2. Targets (Models or APIs Being Tested)โ
- Receive prompts from generators and execute them.
- Can be local models, cloud-based AI APIs, or custom AI applications.
- Defined using providers, which specify how to interact with different AI systems.
Examples:
- Hugging Face Models โ
HF_MODEL
- Ollama Models โ
llama-guard3
,mistral
- OpenAI API โ Uses OpenAI models
- HTTP-based AI endpoints โ Custom API-based targets
- ECHO โ A dummy agent for testing workflows
3. Evaluators (Security Analysis)โ
- Analyze AI responses for jailbreak detection, prompt injection, toxicity, and security vulnerabilities.
- Compare AI model outputs against security policies.
- Can be AI-driven evaluators (Hugging Face, Ollama) or rule-based evaluators (JSON path matching, keyword detection).
Examples:
- Hugging Face Evaluators โ
detoxify
,GraniteToxicityEvaluator
- Ollama LLM Evaluators โ
llama-guard3
- JSON-based Evaluators โ
AnyJsonPathExpressionMatch
2. dtx Workflow (Diagram)โ
Below is a stick diagram showing how different components interact:
+------------------+
| Generators | <--- Plug & Play Datasets: STRINGRAY, STARGAZER, HF_LMSYS
+------------------+
|
v
+------------------+
| Targets | <--- Plug & Play Providers: Hugging Face, OpenAI API, Ollama, HTTP
+------------------+
|
v
+------------------+
| Evaluators | <--- Plug & Play Options: Hugging Face, JSON Rules, LLM Evaluators
+------------------+
Each component is modular, meaning generators, targets, and evaluators can be easily swapped.
3. Plug & Play Options in dtxโ
Component | Description | Plug & Play Options |
---|---|---|
Generators | Create adversarial prompts | Datasets: STRINGRAY, STARGAZER, HF_LMSYS, HF_HACKAPROMPT |
Targets | AI models or APIs being tested | Providers: Hugging Face, OpenAI API, Ollama, HTTP |
Evaluators | Analyze responses for security risks | Hugging Face models, JSON-based rules, LLM Evaluators |