Architecture Overview

dtx is a modular AI Red Teaming Framework designed for prompt generation, AI model testing, and response evaluation. The system follows a plug-and-play architecture where different components can be swapped as needed.


1. Architecture Components

1. Generators (Prompt Generation)

  • Responsible for creating adversarial prompts to test AI models.
  • Internally referred to as datasets.
  • Provide input to targets for evaluation.

Examples:

  • STRINGRAY – requires the OpenAI API
  • STARGAZER – runs locally
  • HF_LMSYS, HF_HACKAPROMPT – Hugging Face datasets
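To make the idea concrete, a generator can be thought of as anything that yields prompt strings. The class and method names below are hypothetical illustrations of the concept, not the actual dtx API:

```python
from typing import Iterator, Protocol


class Generator(Protocol):
    """Anything that yields adversarial prompt strings (a 'dataset')."""

    def prompts(self) -> Iterator[str]: ...


class LocalListGenerator:
    """A minimal local generator backed by a fixed list of prompts,
    standing in for dataset-backed generators such as STARGAZER."""

    def __init__(self, dataset: list[str]) -> None:
        self._dataset = dataset

    def prompts(self) -> Iterator[str]:
        # Yield each stored prompt in order.
        yield from self._dataset
```

Because generators only need to produce prompts, swapping one dataset for another does not require any change to the targets or evaluators downstream.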

2. Targets (Models or APIs Being Tested)

  • Receive prompts from generators and execute them.
  • Can be local models, cloud-based AI APIs, or custom AI applications.
  • Defined using providers, which specify how to interact with different AI systems.

Examples:

  • Hugging Face Models – HF_MODEL
  • Ollama Models – llama-guard3, mistral
  • OpenAI API – uses OpenAI models
  • HTTP-based AI endpoints – custom API-based targets
  • ECHO – a dummy agent for testing workflows
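A target, in turn, is anything that accepts a prompt and returns a response. The sketch below is illustrative only (these names are not the dtx API); the echo implementation mirrors the role of the ECHO dummy agent listed above:

```python
class Target:
    """Anything that accepts a prompt and returns a model response."""

    def generate(self, prompt: str) -> str:
        raise NotImplementedError


class EchoTarget(Target):
    """A stand-in for the ECHO dummy agent: returns the prompt unchanged,
    which makes it useful for testing a workflow end to end without
    calling a real model."""

    def generate(self, prompt: str) -> str:
        return prompt
```

A provider for a real system (Hugging Face, OpenAI, Ollama, or an HTTP endpoint) would implement the same interface but forward the prompt to the model and return its completion.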

3. Evaluators (Security Analysis)

  • Analyze AI responses for jailbreak detection, prompt injection, toxicity, and security vulnerabilities.
  • Compare AI model outputs against security policies.
  • Can be AI-driven evaluators (Hugging Face, Ollama) or rule-based evaluators (JSON path matching, keyword detection).

Examples:

  • Hugging Face Evaluators – detoxify, GraniteToxicityEvaluator
  • Ollama LLM Evaluators – llama-guard3
  • JSON-based Evaluators – AnyJsonPathExpressionMatch
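A rule-based evaluator can be as simple as keyword detection over the response text. The following is a minimal sketch of that idea (hypothetical names, not the dtx API):

```python
import re


class KeywordEvaluator:
    """A rule-based evaluator: flags a response as unsafe if it contains
    any blocked keyword (case-insensitive substring match)."""

    def __init__(self, blocked_keywords: list[str]) -> None:
        # Escape each keyword so it is matched literally, not as a regex.
        self._patterns = [
            re.compile(re.escape(k), re.IGNORECASE) for k in blocked_keywords
        ]

    def is_safe(self, response: str) -> bool:
        # Safe only if no blocked pattern appears anywhere in the response.
        return not any(p.search(response) for p in self._patterns)
```

AI-driven evaluators follow the same contract but delegate the judgment to a model (e.g. a toxicity classifier or an LLM guard model) instead of fixed rules.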

2. dtx Workflow (Diagram)

Below is a text diagram showing how the different components interact:

+------------------+
|    Generators    | <--- Plug & Play Datasets: STRINGRAY, STARGAZER, HF_LMSYS
+------------------+
         |
         v
+------------------+
|     Targets      | <--- Plug & Play Providers: Hugging Face, OpenAI API, Ollama, HTTP
+------------------+
         |
         v
+------------------+
|    Evaluators    | <--- Plug & Play Options: Hugging Face, JSON Rules, LLM Evaluators
+------------------+

Each component is modular, meaning generators, targets, and evaluators can be easily swapped.
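The workflow in the diagram can be sketched end to end. All names below are hypothetical stand-ins for the three component types, not the dtx API; the point is that any generator, target, or evaluator with the same shape can be dropped in:

```python
class ListGenerator:
    """Stand-in generator: yields prompts from a fixed list."""

    def __init__(self, dataset):
        self.dataset = dataset

    def prompts(self):
        yield from self.dataset


class EchoTarget:
    """Stand-in target: returns the prompt unchanged (like the ECHO agent)."""

    def generate(self, prompt):
        return prompt


class KeywordEvaluator:
    """Stand-in rule-based evaluator: unsafe if a blocked keyword appears."""

    def __init__(self, blocked):
        self.blocked = [k.lower() for k in blocked]

    def is_safe(self, response):
        return not any(k in response.lower() for k in self.blocked)


def run_scan(generator, target, evaluator):
    """Generator -> Target -> Evaluator: the pipeline from the diagram."""
    results = []
    for prompt in generator.prompts():
        response = target.generate(prompt)
        results.append(
            {"prompt": prompt, "response": response, "safe": evaluator.is_safe(response)}
        )
    return results
```

Swapping STRINGRAY for HF_LMSYS, or an OpenAI target for an Ollama one, changes only which object is passed into `run_scan`; the pipeline itself stays the same.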


3. Plug & Play Options in dtx

| Component  | Description                          | Plug & Play Options                                       |
|------------|--------------------------------------|-----------------------------------------------------------|
| Generators | Create adversarial prompts           | Datasets: STRINGRAY, STARGAZER, HF_LMSYS, HF_HACKAPROMPT  |
| Targets    | AI models or APIs being tested       | Providers: Hugging Face, OpenAI API, Ollama, HTTP         |
| Evaluators | Analyze responses for security risks | Hugging Face models, JSON-based rules, LLM evaluators     |