Quick Start

Install dtx as a library and call it from your Python code.

Project Setup

To start a new project with dtx, you can use Poetry or uv for dependency management.

Option 1: Using Poetry

# Create a new Python project
poetry new dtx_test

# Navigate into the project directory
cd dtx_test

# Add dtx with the torch extra as a dependency
poetry add "dtx[torch]"

Option 2: Using uv (Fast Alternative to pip/Poetry)

uv is a fast Python package/dependency manager and pip replacement.

📦 To install uv, follow the instructions here: 👉 https://github.com/astral-sh/uv#installation

Once installed:

# Create a new directory manually
mkdir dtx_test && cd dtx_test

# Create a virtual environment (uv creates .venv by default)
uv venv

# Install dtx with the torch extra into the environment
uv pip install "dtx[torch]"
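
Optionally, activate the environment so that plain python picks it up (uv pip install already targets the local .venv on its own):

# Activate the virtual environment (Linux/macOS)
source .venv/bin/activate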

Option 3: Using pip

Plain pip also works in any Python environment:

pip install "dtx[torch]"

💡 If you already have torch installed in your environment, you can install dtx without the extra:

pip install dtx
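
To sanity-check the install, you can try importing the runner class used in the quickstart below; this is just a quick smoke test, not an official dtx command:

python -c "from dtx.sdk.runner import DtxRunner; print('dtx import OK')"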

Quickstart

from dtx.sdk.runner import DtxRunner, DtxRunnerConfigBuilder
from dtx.plugins.providers.dummy.echo import EchoAgent

# Build the runner configuration
env_cfg = (
    DtxRunnerConfigBuilder()
    .agent(EchoAgent())   # toy agent for demos
    .max_prompts(5)       # limit generated prompts
    .build()
)

# Execute a red-team evaluation
runner = DtxRunner(env_cfg)
report = runner.run()

# Inspect the results
print(report.model_dump_json(indent=2))
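
Save the example as a script and run it from whichever environment you set up above (quickstart.py is an assumed filename, not a file dtx provides):

# With Poetry
poetry run python quickstart.py

# With uv or pip (inside the activated virtual environment)
python quickstart.py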

Report Structure

The EvalReport object returned by runner.run() includes the following (a short traversal sketch follows the list):

  • eval_results (List[EvalResult])

    • run_id (str): Unique identifier for each test case

    • prompt (MultiTurnTestPrompt): The test prompt with user turns

    • evaluation_method (EvaluatorInScope): Evaluator configuration

    • module_name (str): Source dataset or module name

    • policy, goal, strategy, base_prompt (str): Test metadata

    • responses (List[ResponseEvaluationStatus]):

      • response (BaseMultiTurnAgentResponse or str): The agent's reply (with .turns for multi-turn)
      • scores (optional): Evaluation scores
      • policy, goal: Metadata repeated at response level
      • success (bool): Pass/fail flag for the evaluation
      • description (str): Evaluator summary
      • attempts (AttemptsBenchmarkStats): Retry statistics

Now you have dtx integrated as a Python library for red-team testing in your own project.

After running your evaluation, you can easily view the results:

# Print the full report as JSON
print(report.model_dump_json(indent=2))
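
To keep a copy for later analysis, the same JSON can be written to a file (the path below is just an example):

# Save the report to disk
with open("dtx_report.json", "w") as f:
    f.write(report.model_dump_json(indent=2))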