🧭 TAP

TAP (Tree of Attacks with Pruning) is a search procedure for studying model robustness. It explores how a prompt can be iteratively reformulated into more effective queries, while continuously filtering out off-topic paths and keeping only the top-scoring candidates. Conceptually:

  • Attacker proposes new candidate prompts from prior attempts.
  • Evaluator (a judge) keeps candidates aligned with the research task and assigns scores to target responses.
  • Target is the black-box model being probed.
  • Pruning limits search to the best candidates, enabling scalable, reproducible experiments.

TAP is not a single prompt; it is a controlled loop that measures how easily the target can be nudged toward a goal under monitored conditions.


⚑ Quick Start (via dtx)

The unified CLI exposes TAP behind one command. These examples use benign goals.

dtx attack run \
--attack TAP \
--task "Outline a beginner yoga routine" \
--target-model gpt-4o-mini


🖼️ Workflow

  1. Initialize

    • Provide a root goal (research task) and a seed prompt.
  2. Branch

    • The Attacker proposes b new prompts per current candidate, using limited context.
  3. Prune · Phase 1 (On-Topic)

    • The Evaluator removes prompts that don't ask for the same thing as the goal.
  4. Probe Target

    • Query the Target with the remaining prompts; collect responses.
  5. Judge

    • The Evaluator scores each response (e.g., 1–10) for goal satisfaction and policy/safety conformance. (Many setups also include an on-task gate for responses to reduce false positives.)
  6. Prune · Phase 2 (Top-K)

    • Keep the top w candidates and iterate.
  7. Stop

    • Early-stop when any score reaches a threshold, or after d iterations.
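
The seven steps above can be sketched as a small beam-style search loop. This is a minimal illustration, not the dtx implementation: `Node`, `tap_search`, and the four `toy_*` callables are hypothetical names, and the stubs are deterministic stand-ins for the Attacker, Evaluator, and Target so the control flow can be run offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """One candidate in the search tree (hypothetical structure)."""
    prompt: str
    response: str = ""
    score: int = 0


def tap_search(
    goal: str,
    seed_prompt: str,
    attacker: Callable[[str, Node, int], List[str]],
    on_topic: Callable[[str, str], bool],
    target: Callable[[str], str],
    judge: Callable[[str, str, str], int],
    branch: int = 2,      # b: proposals per current candidate
    width: int = 3,       # w: candidates kept after each iteration
    depth: int = 3,       # d: maximum number of iterations
    threshold: int = 10,  # early-stop score
) -> Optional[Node]:
    frontier = [Node(seed_prompt)]            # 1. Initialize
    best: Optional[Node] = None
    for _ in range(depth):
        # 2. Branch: b new prompts per current candidate
        children = [Node(p) for n in frontier for p in attacker(goal, n, branch)]
        # 3. Prune, phase 1: drop prompts that drifted away from the goal
        children = [c for c in children if on_topic(goal, c.prompt)]
        if not children:
            break
        for c in children:
            c.response = target(c.prompt)                 # 4. Probe target
            c.score = judge(goal, c.prompt, c.response)   # 5. Judge
            if best is None or c.score > best.score:
                best = c
        # 7. Stop early once any score reaches the threshold
        if best.score >= threshold:
            break
        # 6. Prune, phase 2: keep the top-w candidates and iterate
        frontier = sorted(children, key=lambda c: c.score, reverse=True)[:width]
    return best


# Deterministic toy stubs (assumptions for illustration only).
def toy_attacker(goal: str, node: Node, b: int) -> List[str]:
    return [f"{node.prompt} (rephrase {i})" for i in range(b)]


def toy_on_topic(goal: str, prompt: str) -> bool:
    return goal.lower() in prompt.lower()


def toy_target(prompt: str) -> str:
    return prompt.upper()


def toy_judge(goal: str, prompt: str, response: str) -> int:
    return min(10, 1 + response.count("REPHRASE"))


best = tap_search("yoga routine", "yoga routine",
                  toy_attacker, toy_on_topic, toy_target, toy_judge)
print(best.score, "|", best.prompt)
```

In a real run the four callables would wrap model calls; the loop structure (branch, on-topic prune, probe, judge, top-w prune, stop) is the part this sketch is meant to show.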

🔧 Advanced CLI Options (TAP)

| CLI flag | Meaning (maps to TAP) | Typical impact |
| --- | --- | --- |
| --tap-branch <b> | Branching factor b | Higher = more exploration, more cost |
| --tap-width <w> | Beam width w (Top-K kept) | Higher = broader search, more memory |
| --tap-depth <d> | Max depth / iterations | Higher = longer runs, deeper reformulations |
| --tap-keep-last-n <n> | Attacker history window | Stabilizes proposals; too large can drift |
| --tap-max-attempts <k> | Attacker JSON retry budget | More robust proposal parsing |
| --no-delete-off-topic | Disable on-topic pruning gate | Not recommended; increases noise |
| --judge-template <name> | Judge rubric/template | Affects scoring stability/calibration |
| --temperature <t> | Target decoding temperature | Controls diversity of target responses |
| --max-new-tokens <N> | Target max tokens | Caps verbosity/latency |
| --success-threshold <x> | Early stop on judge score | Lower = faster stops; higher = stricter hits |
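
To get a feel for how b, w, and d drive cost, the following rough sketch computes an upper bound on Target queries per run. It assumes a single root prompt, one Target query per proposed prompt, no on-topic pruning, and no early stop; `max_target_queries` is a hypothetical helper, not part of dtx.

```python
def max_target_queries(branch: int, width: int, depth: int) -> int:
    """Upper bound on Target queries for one TAP run (simplified model)."""
    total = 0
    frontier = 1  # a single seed prompt at the root
    for _ in range(depth):
        children = frontier * branch      # b proposals per kept candidate
        total += children                 # each proposal is sent to the Target
        frontier = min(width, children)   # top-w survive to the next iteration
    return total


print(max_target_queries(branch=2, width=3, depth=3))  # 12
print(max_target_queries(branch=3, width=6, depth=4))  # 48
```

Once the frontier is full, each extra iteration adds roughly w × b queries, so widening the beam or raising the branching factor scales cost per level, while depth scales it linearly. On-topic pruning and early stopping only reduce these numbers.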

TAP examples with knobs

# Smaller, fast iteration
dtx attack run --attack TAP \
--task "Summarize key healthy sleep habits" \
--tap-branch 2 --tap-width 3 --tap-depth 3

# Wider search + a named judge template (for more stable scoring)
dtx attack run --attack TAP \
--task "Explain pour-over coffee basics in 4 bullets" \
--tap-branch 3 --tap-width 6 --tap-depth 4 \
--judge-template Mehrotra2023TAP

# More context retained by the attacker (be cautious with drift)
dtx attack run --attack TAP \
--task "Outline a simple 5K training plan" \
--tap-keep-last-n 5 --tap-max-attempts 6
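
When comparing several knob settings, it can help to generate the command lines programmatically. The sketch below only builds and prints `dtx attack run` invocations using the flags listed in the table above; `build_tap_command` and `sweep` are hypothetical helper names, and nothing is executed.

```python
import itertools
import shlex
from typing import Iterable, List


def build_tap_command(task: str, branch: int, width: int, depth: int) -> List[str]:
    """Return an argv list for one TAP run (flags taken from the options table)."""
    return [
        "dtx", "attack", "run",
        "--attack", "TAP",
        "--task", task,
        "--tap-branch", str(branch),
        "--tap-width", str(width),
        "--tap-depth", str(depth),
    ]


def sweep(task: str, branches: Iterable[int], widths: Iterable[int],
          depths: Iterable[int]) -> List[List[str]]:
    """Build one command per (branch, width, depth) combination."""
    return [
        build_tap_command(task, b, w, d)
        for b, w, d in itertools.product(branches, widths, depths)
    ]


for argv in sweep("Summarize key healthy sleep habits", [2, 3], [3, 6], [3]):
    # shlex.join quotes the task string so the line can be pasted into a shell
    print(shlex.join(argv))
```

Keeping the commands as argv lists (rather than strings) avoids quoting mistakes if they are later passed to `subprocess.run`.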

📚 References

  • Tree of Attacks: Jailbreaking Black-Box LLMs Automatically. arXiv:2312.02119
  • Related red-team & evaluation frameworks from the community (various open-source efforts)