Quick Start
Install dtx as a library and call it from your Python code.
Project Setup
To start a new project with dtx, you can use Poetry or uv for dependency management.
Option 1: Using Poetry
# Create a new Python project
poetry new dtx_test
# Navigate into the project directory
cd dtx_test
# Add dtx with the torch extra as a dependency
poetry add "dtx[torch]"
Option 2: Using uv (Fast Alternative to pip/Poetry)
uv is a fast Python package and dependency manager that works as a pip replacement.
📦 To install uv, follow the instructions here: 👉 https://github.com/astral-sh/uv#installation
Once installed:
# Create a new directory manually
mkdir dtx_test && cd dtx_test
# Create a virtual environment
uv venv
# Add dtx with torch extra
uv pip install "dtx[torch]"
pip install will always work in any Python environment:
pip install "dtx[torch]"
💡 If you already have torch installed in your environment, you can install dtx without the extra:
pip install dtx
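If you're not sure whether torch is already present, you can check before choosing an install command. This is a minimal sketch using only the Python standard library:
# Check whether torch is importable in the current environment
import importlib.util
print("torch installed:", importlib.util.find_spec("torch") is not None)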
Basic Usage
from dtx.sdk.runner import DtxRunner, DtxRunnerConfigBuilder
from dtx.plugins.providers.dummy.echo import EchoAgent
# Build the runner configuration
cfg = (
DtxRunnerConfigBuilder()
.agent(EchoAgent()) # toy agent for demos
.max_prompts(5) # limit generated prompts
.build()
)
# Execute a red-team evaluation
runner = DtxRunner(cfg)
report = runner.run()
# Inspect the results
print(report.model_dump_json(indent=2))
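Since the report serializes to JSON via model_dump_json, you can also persist it for later review. A minimal sketch (the filename is arbitrary):
# Save the report to disk for later inspection
from pathlib import Path
Path("dtx_report.json").write_text(report.model_dump_json(indent=2))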
Filtering with Plugins and Frameworks
You can scope the evaluation to run tests from specific plugins or those mapped to AI safety frameworks.
Filter by Plugin
Use .plugins() to select tests by plugin ID, keyword, or a regular expression. This is useful for focusing on specific attack vectors like "jailbreak" or "prompt injection".
# Example: Run only tests from specific plugins
cfg_plugins = (
DtxRunnerConfigBuilder()
.agent(EchoAgent())
.max_prompts(10)
# Select plugins by name, keyword, or regex
.plugins(["jailbreak", "pi-.*"]) # prompt-injection patterns
.build()
)
report_plugins = DtxRunner(cfg_plugins).run()
print(f"Generated {len(report_plugins.eval_results)} test cases from selected plugins.")
Filter by Framework
Use .frameworks() to select tests mapped to standards like MITRE ATLAS™ or NIST. This helps in assessing compliance or risk against established benchmarks.
# Example: Run only tests mapped to the MITRE ATLAS framework
cfg_frameworks = (
DtxRunnerConfigBuilder()
.agent(EchoAgent())
.max_prompts(10)
# Select tests by framework name or pattern
.frameworks(["mitre-atlas"])
.build()
)
report_frameworks = DtxRunner(cfg_frameworks).run()
print(f"Generated {len(report_frameworks.eval_results)} test cases from the MITRE framework.")
Report Structure
The EvalReport object returned by runner.run() includes the following fields (a traversal sketch follows the list):
- eval_results (List[EvalResult]):
  - run_id (str): Unique identifier for each test case
  - prompt (MultiTurnTestPrompt): The test prompt with user turns
  - evaluation_method (EvaluatorInScope): Evaluator configuration
  - module_name (str): Source dataset or module name
  - policy, goal, strategy, base_prompt (str): Test metadata
  - responses (List[ResponseEvaluationStatus]):
    - response (BaseMultiTurnAgentResponse or str): The agent's reply (with .turns for multi-turn)
    - scores (optional): Evaluation scores
    - policy, goal: Metadata repeated at response level
    - success (bool): Pass/fail flag for the evaluation
    - description (str): Evaluator summary
    - attempts (AttemptsBenchmarkStats): Retry statistics
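To illustrate the structure above, here is a minimal sketch that walks the report and prints a pass/fail summary. The field names are taken directly from the list above; adjust to your dtx version:
# Summarize pass/fail across all test cases in the report
for result in report.eval_results:
    for resp in result.responses:
        status = "PASS" if resp.success else "FAIL"
        print(f"[{status}] module={result.module_name} run_id={result.run_id}")
        print(f"    evaluator says: {resp.description}")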
Now you have dtx integrated as a Python library for red-team testing in your own project.
Print the Report
After running your evaluation, you can easily view the results:
# Print the full report as JSON
print(report.model_dump_json(indent=2))