
Quick Start

Install dtx as a library and call it from your Python code.

Project Setup

To start a new project with dtx, you can use Poetry or uv for dependency management.

Option 1: Using Poetry

# Create a new Python project
poetry new dtx_test

# Navigate into the project directory
cd dtx_test

# Add dtx with the torch extra as a dependency
poetry add "dtx[torch]"

Option 2: Using uv (Fast Alternative to pip/Poetry)

uv is a fast Python package and dependency manager that can serve as a drop-in replacement for pip.

πŸ“¦ To install uv, follow the instructions here: πŸ‘‰ https://github.com/astral-sh/uv#installation
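For reference, the standalone installer from that page is a one-liner on macOS/Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh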

Once installed:

# Create a new directory manually
mkdir dtx_test && cd dtx_test

# Create a virtual environment (uv creates .venv by default)
uv venv

# Activate the virtual environment
source .venv/bin/activate

# Add dtx with the torch extra
uv pip install "dtx[torch]"

βœ… Alternatively, plain pip works in any Python environment:

pip install "dtx[torch]"

πŸ’‘ If you already have torch installed in your environment, you can install dtx without the extra:

pip install dtx
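To confirm the installation, you can query the installed version with the standard-library importlib.metadata (no dtx-specific API assumed):

# Check the installed dtx version
python -c "import importlib.metadata; print(importlib.metadata.version('dtx'))"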

Basic Usage

from dtx.sdk.runner import DtxRunner, DtxRunnerConfigBuilder
from dtx.plugins.providers.dummy.echo import EchoAgent

# Build the runner configuration
cfg = (
    DtxRunnerConfigBuilder()
    .agent(EchoAgent())  # toy agent for demos
    .max_prompts(5)      # limit generated prompts
    .build()
)

# Execute a red-team evaluation
runner = DtxRunner(cfg)
report = runner.run()

# Inspect the results
print(report.model_dump_json(indent=2))
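Because the report exposes model_dump_json (a Pydantic-style serializer, as used above), you can also persist the results to disk for later inspection:

# Save the report to a JSON file for later review
with open("report.json", "w") as f:
    f.write(report.model_dump_json(indent=2))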

Filtering with Plugins and Frameworks

You can scope the evaluation to run tests from specific plugins or those mapped to AI safety frameworks.

Filter by Plugin

Use .plugins() to select tests by plugin ID, keyword, or a regular expression. This is useful for focusing on specific attack vectors like "jailbreak" or "prompt injection".

# Example: Run only tests from specific plugins
cfg_plugins = (
    DtxRunnerConfigBuilder()
    .agent(EchoAgent())
    .max_prompts(10)
    # Select plugins by name, keyword, or regex
    .plugins(["jailbreak", "pi-.*"])  # prompt-injection patterns
    .build()
)
report_plugins = DtxRunner(cfg_plugins).run()
print(f"Generated {len(report_plugins.eval_results)} test cases from selected plugins.")

Filter by Framework

Use .frameworks() to select tests mapped to standards like MITRE ATLASβ„’ or NIST. This helps in assessing compliance or risk against established benchmarks.

# Example: Run only tests mapped to the MITRE ATLAS framework
cfg_frameworks = (
    DtxRunnerConfigBuilder()
    .agent(EchoAgent())
    .max_prompts(10)
    # Select tests by framework name or pattern
    .frameworks(["mitre-atlas"])
    .build()
)
report_frameworks = DtxRunner(cfg_frameworks).run()
print(f"Generated {len(report_frameworks.eval_results)} test cases from the MITRE framework.")

Report Structure​

The EvalReport object returned by runner.run() includes:

  • eval_results (List[EvalResult]): One entry per executed test case

    • run_id (str): Unique identifier for each test case

    • prompt (MultiTurnTestPrompt): The test prompt with user turns

    • evaluation_method (EvaluatorInScope): Evaluator configuration

    • module_name (str): Source dataset or module name

    • policy, goal, strategy, base_prompt (str): Test metadata

    • responses (List[ResponseEvaluationStatus]):

      • response (BaseMultiTurnAgentResponse or str): The agent's reply (with .turns for multi-turn)
      • scores (optional): Evaluation scores
      • policy, goal: Metadata repeated at response level
      • success (bool): Pass/fail flag for the evaluation
      • description (str): Evaluator summary
      • attempts (AttemptsBenchmarkStats): Retry statistics

You now have dtx integrated as a Python library for red-team testing in your own project.
