DNAS-Bench: Deterministic Nucleic Acid Screener Benchmarking

DNAS-Bench is a deterministic benchmarking framework for evaluating the robustness of Biosecurity Screening Software (BSS) against adversarial nucleic acid sequence manipulations.

Overview

Commercial and open-source DNA synthesis providers use Biosecurity Screening Software to detect potentially dangerous synthesis orders. DNAS-Bench stress-tests these screeners by applying a systematic suite of deterministic manipulations to known sequences-of-concern — no AI required, no BSS code access needed.

Key findings from our paper:

SeqScreen detects 41.99% of manipulated sequences on average; Commec detects 10.19%
Splitting sequences below 50 bp bypasses most existing safeguards
Simple methods (e.g., appending a repeated nucleotide at 1.5× length) perform nearly as well as complex ones — differing by only 0.75 percentage points on average

Manipulation Methods

Method	Description	Difficulty
Split	Fragment sequence into non-overlapping subsequences of fixed length	Easy
Encapsulate	Add 4 bp random flanking sequences to both ends of each fragment	Easy
Cover (GFP)	Append a GFP gene subsequence at 1.5× fragment length to tail	Medium
Cover-A	Append a poly-adenine tail at 1.5× fragment length	Easy
Cover-Random	Append a random genomic sequence at 1.5× fragment length	Easy
Flip	Introduce random point mutations at a specified fraction of positions	Medium

Quick Start

Prerequisites

No external dependencies — all scripts use the Python standard library only (Python 3.6+).

Step 1 — Split and Encapsulate

split-sequence.py fragments your sequence at a fixed length and produces two FASTA files in one run:

-o → encapsulate dataset: fragments with Golden Gate tails (enzyme recognition site + 4 bp overhangs)
--split-output → splits dataset: payload-only fragments (no tails), used as input for covers and flips

python split-sequence.py \
    -i sequence.fasta \
    -l 100 \
    -o output/encapsulate/fragments_L100.fasta \
    --split-output output/splits/fragments_L100.fasta

Repeat for each desired length (e.g. 50, 100, 150, 200, 250, 300 bp).

Step 2 — Covers (append benign sequence)

split_and_append.py takes a payload FASTA from Step 1 and appends a covering sequence at 1.5× the fragment length.

# Cover with a donor FASTA (e.g. GFP)
python split_and_append.py \
    -a output/splits/fragments_L100.fasta \
    --second gfp.fasta --wrap-second \
    --frag-len 100 -o output/covers_gfp/ --single-file

# Cover with poly-A tail
python split_and_append.py \
    -a output/splits/fragments_L100.fasta \
    --poly-a \
    --frag-len 100 -o output/covers_poly_a/ --single-file

# Cover with random sequence
python split_and_append.py \
    -a output/splits/fragments_L100.fasta \
    --random \
    --frag-len 100 -o output/covers_random/ --single-file

Step 3 — Flipped Splits (point mutations)

flip_splits.py takes a directory of FASTA files (e.g. your splits output folder) and mutates a specified percentage of bases in each fragment.

python flip_splits.py \
    output/splits/ \
    output/flipped_splits/ \
    5.0 \
    --seed 42

Repository Structure

DNAS-Bench/
├── split-sequence.py        # Produces encapsulate (tailed) and splits (payload-only) datasets
├── split_and_append.py      # Covers: append donor FASTA, poly-A, or random sequence at 1.5x length
├── flip_splits.py           # Flipped splits: introduce random point mutations across a fragment directory
├── gfp.fasta                # GFP donor sequence used with split_and_append.py --second
└── README.md

Output Format

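Running split-sequence.py at length 100 produces two files, one under encapsulate/ and one under splits/. The covers_* and flipped_splits/ directories are populated by Steps 2 and 3: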

output/
├── encapsulate/
│   └── fragments_L100.fasta            # Golden Gate tails included → feed directly to BSS
├── splits/
│   └── fragments_L100.fasta            # payload only → input for covers and flips
├── covers_gfp/
│   └── <prefix>_combined_fragments_L100.fasta
├── covers_poly_a/
├── covers_random/
└── flipped_splits/

Fragment headers in the encapsulate output follow --name-format:

>{orig}|frag{index}|{len}bp|{left}-{right}

Fragment headers in the splits output follow --split-name-format:

>{orig}|frag{index}|{len}bp|split
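
When analysing screening results it can help to recover the fields from these headers. The following is a minimal sketch, not part of the repository: it assumes the two default formats shown above are in use (a custom --name-format or --split-name-format would need a different pattern), and the names `FragmentHeader` and `parse_header` are hypothetical. The last field is kept as an opaque string, since this README does not define what {left} and {right} contain.

```python
import re
from typing import NamedTuple, Optional

# Matches "{orig}|frag{index}|{len}bp|{tag}", where {tag} is either "split"
# or "{left}-{right}". The greedy {orig} group tolerates "|" inside the
# original sequence name; the final field must not contain "|".
HEADER_RE = re.compile(
    r"^>?(?P<orig>.+)\|frag(?P<index>\d+)\|(?P<len>\d+)bp\|(?P<tag>[^|]+)$"
)


class FragmentHeader(NamedTuple):
    orig: str    # name of the source sequence
    index: int   # fragment index
    length: int  # value of the {len} field, in bp
    tag: str     # "split" or the raw "{left}-{right}" string


def parse_header(line: str) -> Optional[FragmentHeader]:
    """Parse one FASTA header line; return None if it does not match."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    return FragmentHeader(
        match["orig"], int(match["index"]), int(match["len"]), match["tag"]
    )
```

Returning None for non-matching lines lets a caller skip sequence lines and headers written with a custom format without raising an exception.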

Threat Model

DNAS-Bench models an adversary who:

1. Obtains a regulated malicious sequence (e.g., from a public database)
2. Applies deterministic manipulations — no BSS code access or molecular biology expertise required
3. Submits fragments across multiple thin clients / accounts to evade order-pattern detection
4. Reassembles fragments in a lab after delivery

We focus empirically on evasion of automated screening (steps 1–3), and explicitly separate this from the downstream biological reconstruction challenge (step 4).

Dataset

We evaluated 11 sequences derived from the HHS and USDA Select Agents and Toxins List, ranging from 753 bp (single toxin gene) to 5.2 Mb (bacterial chromosomal locus). Agent identities are de-identified in public releases.

To request the benchmark dataset, please contact the authors. We share data with BSS developers and researchers working to improve screening pipelines.

Results Summary

BSS	Mean Detection	Detection at L=300
SeqScreen	41.99%	49.57%
Commec	10.19%	28.72%
Kraken (baseline)	40.08%	49.77%

Detection collapses for all tools when fragment length falls below 50 bp.
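
For readers tabulating their own screener output, the sketch below shows one way to compute per-length and mean detection rates. It is an illustration under stated assumptions, not the paper's evaluation code: it assumes each result is a (fragment length, flagged) pair, that the detection rate is the percentage of fragments flagged, and that the mean is an unweighted average over lengths. This README does not specify how the paper averages across methods and lengths. The function names and the toy records are hypothetical and are not benchmark data.

```python
from collections import defaultdict
from statistics import mean


def detection_rates(records):
    """Return {fragment_length: percent of fragments flagged}.

    records: iterable of (fragment_length, flagged) pairs, where flagged
    is truthy if the screener flagged that fragment (assumed format).
    """
    counts = defaultdict(lambda: [0, 0])  # length -> [flagged, total]
    for length, flagged in records:
        counts[length][0] += int(bool(flagged))
        counts[length][1] += 1
    return {
        length: 100.0 * flagged / total
        for length, (flagged, total) in sorted(counts.items())
    }


def mean_detection(rates):
    """Unweighted mean of the per-length detection rates (an assumption)."""
    return mean(rates.values())


# Toy records for illustration only; these are not results from the paper.
toy_records = [
    (50, False), (50, False),
    (100, True), (100, False),
    (300, True), (300, True),
]
rates = detection_rates(toy_records)
print(rates, mean_detection(rates))
```

An unweighted mean treats every fragment length equally regardless of how many fragments it produced; weighting by fragment count is an equally defensible choice and would give different numbers.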

Citation

@inproceedings{wong2025dnasbench,
  title     = {DNAS-Bench: Deterministic Nucleic Acid Screener Benchmarking},
  author    = {Wong, Henry C. and Kohno, Tadayoshi and Nivala, Jeff},
  booktitle = {Workshop on Cybersecurity for Biology (CyberBio)},
  year      = {2026}
}

Ethical Considerations

This framework is intended for BSS developers, biosecurity researchers, and DNA synthesis companies to evaluate and improve screening pipelines.

Manipulated benchmark data is available on request only — not open-sourced
Agent and toxin identities in public materials are de-identified
We have notified and shared results with the developers of SeqScreen and Commec
We do not believe executing a real attack from public materials alone is tractable

See ETHICS.md for full discussion.

Contact

Henry C. Wong — University of Washington
For dataset access or collaboration inquiries, please open an issue or reach out directly.
