🎯 BON

BON (Best-of-N with Augmented Variants) is a simple, effective optimizer for a single prompt.
Instead of branching trees or iterative refinements, BON generates N augmented variants of a root prompt, evaluates the target’s responses with a judge, and keeps the best (optionally across multiple rounds).

Conceptually:

  • Generator (augmenter) produces N prompt variants from a base prompt using a controllable augmentation strength σ.
  • Target is the black-box model you’re probing.
  • Evaluator (judge) assigns a numeric score (e.g., 1–10) based on goal satisfaction (and optionally safety/policy criteria).
  • Selection picks the best-scoring (prompt, response) pair per round. Repeat for R rounds if desired.

BON is not a jailbreak method; it’s a prompt optimizer for allowed objectives under controlled scoring.
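In code, the core idea fits in a few lines. The sketch below is illustrative only: augment, ask_target, and judge are hypothetical stand-ins for the generator, target, and evaluator described above (they are not dtx APIs), and the stub bodies just mark where real model calls would go.

import random

def augment(base_prompt: str, sigma: float, rng: random.Random) -> str:
    """Produce one prompt variant; sigma in [0, 1] controls edit strength (stub)."""
    style = rng.choice(["reword", "add constraints", "reorder"])
    return f"{base_prompt}\n[variant: {style}, strength={sigma:.2f}]"

def ask_target(prompt: str) -> str:
    """Query the black-box target model (stub)."""
    return f"<target response to: {prompt[:40]}>"

def judge(goal: str, response: str) -> float:
    """Score goal satisfaction on a 1-10 scale (stub: random placeholder)."""
    return random.uniform(1.0, 10.0)

def best_of_n(goal: str, n: int = 8, sigma: float = 0.3, seed: int = 0):
    """One BON round: generate N variants, probe the target, judge, keep the argmax."""
    rng = random.Random(seed)
    scored = []
    for prompt in (augment(goal, sigma, rng) for _ in range(n)):
        response = ask_target(prompt)
        scored.append((judge(goal, response), prompt, response))
    return max(scored, key=lambda t: t[0])  # (best_score, best_prompt, best_response)

print(best_of_n("Summarize the benefits of regular exercise in 5 bullet points"))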


⚑ Quick Start

Run BON from the unified CLI with one simple command:

dtx attack run --attack BON --task "Summarize the benefits of regular exercise in 5 concise bullet points"

πŸ–ΌοΈ Workflow

(Figure: BON workflow diagram)

  1. Initialize: Provide a base prompt (your task). Decide N (candidates per round), R (rounds), and augmentation strength σ.

  2. Augment: Produce N prompt variants from the base using controlled transformations (rewording, constraints, ordering).

  3. Probe Target: Send each variant to the Target and collect its responses.

  4. Judge: The Evaluator returns a numeric score per response for goal satisfaction (and optionally policy/safety conformance).

  5. Select: Keep the best-scoring (prompt, response) pair this round. Optionally seed the next round’s augmentations from the current best.

  6. Stop: After R rounds, or earlier if a success threshold is met (see the loop sketch below).
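The same six steps expressed as a minimal driver loop. The function name, callable signatures, and defaults below are assumptions made for illustration, not the dtx implementation; the callables play the roles of the augmenter, target, and judge:

import random
from typing import Callable

def run_bon(goal: str,
            augment: Callable[[str, float, random.Random], str],
            ask_target: Callable[[str], str],
            judge: Callable[[str, str], float],
            n: int = 8, rounds: int = 2, sigma: float = 0.3,
            success_threshold: float = 9.0, seed: int = 0):
    rng = random.Random(seed)
    base = goal                                   # step 1: initialize from the task
    best = (float("-inf"), base, "")
    for _ in range(rounds):
        variants = [augment(base, sigma, rng) for _ in range(n)]   # step 2: augment
        for prompt in variants:
            response = ask_target(prompt)                          # step 3: probe target
            score = judge(goal, response)                          # step 4: judge
            if score > best[0]:
                best = (score, prompt, response)                   # step 5: select
        if best[0] >= success_threshold:                           # step 6: early stop
            break
        base = best[1]          # seed the next round's augmentations from the best
    return best

# Toy demo with stand-in callables:
print(run_bon(
    "List 5 ergonomic tips for desk workers",
    augment=lambda p, s, r: f"{p} [rewrite #{r.randint(0, 999)}, strength={s}]",
    ask_target=lambda p: f"<response to: {p}>",
    judge=lambda goal, resp: random.uniform(1, 10),
    n=4, rounds=2,
))

With a real judge, seeding each round from the current best turns breadth-first sampling into a light hill climb over prompt space.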


πŸ”§ Advanced CLI Options (BON)

CLI flag              | Meaning                      | Typical impact
----------------------|------------------------------|------------------------------------------------
--bon-candidates N    | Candidates per round (N)     | Higher = better search coverage, higher cost
--bon-rounds R        | Number of rounds (R)         | More rounds = iterative polishing
--bon-sigma σ         | Augmentation strength (0–1)  | Low = subtle edits; high = diverse, risks drift
--bon-seed SEED       | RNG seed                     | Reproducibility of augmentations
--eval-model NAME     | Judge model                  | Affects scoring stability/calibration
--target-model NAME   | Target model                 | The system under test
--temperature T       | Target sampling temperature  | Controls response diversity
--max-new-tokens N    | Target generation cap        | Controls verbosity/latency
--success-threshold x | Early stop on judge score    | Lower = earlier stops; higher = stricter hits
--judge-template NAME | Judge prompt template        | Scoring rubric and style

Example with knobs

dtx attack run --attack BON \
--task "List 5 ergonomic tips for desk workers" \
--bon-candidates 12 --bon-rounds 2 --bon-sigma 0.35
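
A fuller invocation combining several of the knobs above (flag names as listed in the table; the values are illustrative, not recommended defaults):

dtx attack run --attack BON \
--task "List 5 ergonomic tips for desk workers" \
--bon-candidates 12 --bon-rounds 3 --bon-sigma 0.35 --bon-seed 42 \
--success-threshold 8 --max-new-tokens 256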

When to Use BON vs. TAP or PAIR

  • BON: You have a single prompt and want quick wins via breadth (N variants).
  • PAIR: You want iterative refinement along a few linear tracks.
  • TAP: You need deeper search with branching and pruning.

πŸ“š References

  • Best-of-N Jailbreaking (BON), arXiv:2412.03556