
BtToxin Pipeline

Automated Bacillus thuringiensis toxin mining system using pixi-managed environments.

Quick Start

Prerequisites

  • pixi - Modern package manager for conda environments
  • Linux x86_64 (linux-64 platform)

Installation

  1. Install pixi (if not already installed):
# Linux/macOS
curl -fsSL https://pixi.sh/install.sh | bash

# Or via Homebrew
brew install pixi
  2. Clone and set up the project:
git clone <your-repo>
cd bttoxin-pipeline

# Install all environments (digger + pipeline)
pixi install

This creates two isolated environments:

  • digger: BtToxin_Digger with bioconda dependencies (perl, blast, etc.)
  • pipeline: Python analysis tools (pandas, matplotlib, seaborn)

Running the Pipeline

Run the complete analysis pipeline with a single command:

pixi run pipeline --fna tests/test_data/HAN055.fna

This executes three stages:

  1. Digger: BtToxin_Digger toxin mining
  2. Shotter: Toxin scoring and target prediction
  3. Plot: Heatmap generation and report creation

CLI Options

pixi run pipeline --fna <file> [options]

Options:
  --fna PATH              Input .fna file (required)
  --out_root PATH         Output directory (default: runs/<stem>_run)
  --toxicity_csv PATH     Toxicity data CSV (default: Data/toxicity-data.csv)
  --min_identity FLOAT    Minimum identity threshold 0-1 (default: 0.0)
  --min_coverage FLOAT    Minimum coverage threshold 0-1 (default: 0.0)
  --disallow_unknown_families  Exclude unknown toxin families
  --require_index_hit     Keep only hits with known specificity
  --lang {zh,en}          Report language (default: zh)
  --bttoxin_db_dir PATH   Custom bt_toxin database directory
  --threads INT           Number of threads (default: 4)
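
As an illustration of how the options above fit together, the following is a minimal argparse sketch that mirrors the documented flags and defaults. It is not the project's actual parser (that lives in the pipeline scripts and may differ); the function names `build_parser` and `parse_args` and the range check on the thresholds are assumptions made for this example.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options (sketch; the real script may differ)."""
    p = argparse.ArgumentParser(prog="pipeline")
    p.add_argument("--fna", type=Path, required=True, help="Input .fna file")
    p.add_argument("--out_root", type=Path, default=None,
                   help="Output directory (default: runs/<stem>_run)")
    p.add_argument("--toxicity_csv", type=Path,
                   default=Path("Data/toxicity-data.csv"))
    p.add_argument("--min_identity", type=float, default=0.0)
    p.add_argument("--min_coverage", type=float, default=0.0)
    p.add_argument("--disallow_unknown_families", action="store_true")
    p.add_argument("--require_index_hit", action="store_true")
    p.add_argument("--lang", choices=["zh", "en"], default="zh")
    p.add_argument("--bttoxin_db_dir", type=Path, default=None)
    p.add_argument("--threads", type=int, default=4)
    return p


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: reject thresholds outside the documented 0-1 range.
    for name in ("min_identity", "min_coverage"):
        value = getattr(args, name)
        if not 0.0 <= value <= 1.0:
            parser.error(f"--{name} must be between 0 and 1, got {value}")
    if args.out_root is None:
        # Documented default: runs/<stem>_run
        args.out_root = Path("runs") / f"{args.fna.stem}_run"
    return args
```

With this sketch, `parse_args(["--fna", "tests/test_data/HAN055.fna"])` resolves `out_root` to `runs/HAN055_run`, matching the documented default.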

Examples

# Basic run with default settings
pixi run pipeline --fna tests/test_data/C15.fna

# Strict filtering for high-confidence results
pixi run pipeline --fna tests/test_data/HAN055.fna \
  --min_identity 0.50 --min_coverage 0.60 \
  --disallow_unknown_families --require_index_hit

# English report with custom output directory
pixi run pipeline --fna tests/test_data/HAN055.fna \
  --out_root runs/HAN055_strict --lang en

# Use custom database
pixi run pipeline --fna tests/test_data/HAN055.fna \
  --bttoxin_db_dir /path/to/custom/bt_toxin
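
To run the same settings over many genomes, the command line can be assembled programmatically. The sketch below only builds argument lists from the flags documented above; `pipeline_command`, `batch_commands`, and the `strict` shortcut are hypothetical helpers for this example, not part of the project.

```python
from pathlib import Path


def pipeline_command(fna, out_root=None, min_identity=None, min_coverage=None,
                     lang=None, threads=None, strict=False):
    """Build the `pixi run pipeline` argument list for one .fna file."""
    cmd = ["pixi", "run", "pipeline", "--fna", str(fna)]
    optional = {
        "--out_root": out_root,
        "--min_identity": min_identity,
        "--min_coverage": min_coverage,
        "--lang": lang,
        "--threads": threads,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    if strict:
        # Hypothetical shortcut for the two documented strict-filter flags.
        cmd += ["--disallow_unknown_families", "--require_index_hit"]
    return cmd


def batch_commands(fna_dir, **options):
    """One command per .fna file in fna_dir, in sorted order."""
    return [pipeline_command(path, **options)
            for path in sorted(Path(fna_dir).glob("*.fna"))]
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the project root, for example `for cmd in batch_commands("tests/test_data", lang="en")`.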

Individual Stage Commands

Run stages separately when needed:

Digger Only

pixi run digger-only --fna <file> [options]

Options:
  --fna PATH              Input .fna file (required)
  --out_dir PATH          Output directory (default: runs/<stem>_digger_only)
  --bttoxin_db_dir PATH   Custom database directory
  --threads INT           Number of threads (default: 4)
  --sequence_type         Sequence type: nucl/orfs/prot/reads (default: nucl)

Example:

pixi run digger-only --fna tests/test_data/C15.fna --threads 8

Shotter (Scoring)

pixi run shotter [options]

Options:
  --toxicity_csv PATH     Toxicity data CSV
  --all_toxins PATH       All_Toxins.txt from Digger
  --output_dir PATH       Output directory
  --min_identity FLOAT    Minimum identity threshold
  --min_coverage FLOAT    Minimum coverage threshold
  --allow_unknown_families / --disallow_unknown_families
  --require_index_hit     Keep only indexed hits

Example:

pixi run shotter \
  --all_toxins runs/C15_run/digger/Results/Toxins/All_Toxins.txt \
  --output_dir runs/C15_run/shotter

Plot (Visualization)

pixi run plot [options]

Options:
  --strain_scores PATH    strain_target_scores.tsv from Shotter
  --toxin_support PATH    toxin_support.tsv (optional)
  --species_scores PATH   strain_target_species_scores.tsv (optional)
  --out_dir PATH          Output directory
  --cmap STRING           Colormap (default: viridis)
  --per_hit_strain NAME   Generate per-hit heatmap for specific strain
  --merge_unresolved      Merge other/unknown into unresolved
  --report_mode {summary,paper}  Report style (default: paper)
  --lang {zh,en}          Report language (default: zh)

Example:

pixi run plot \
  --strain_scores runs/C15_run/shotter/strain_target_scores.tsv \
  --toxin_support runs/C15_run/shotter/toxin_support.tsv \
  --out_dir runs/C15_run/shotter \
  --per_hit_strain C15 --lang en

Output Structure

After running the pipeline:

runs/<strain>_run/
├── stage/                    # Staged input file
│   └── <strain>.fna
├── digger/                   # BtToxin_Digger outputs
│   ├── Results/
│   │   └── Toxins/
│   │       ├── All_Toxins.txt
│   │       ├── <strain>.list
│   │       ├── <strain>.gbk
│   │       └── Bt_all_genes.table
│   └── BtToxin_Digger.log
├── shotter/                  # Shotter outputs
│   ├── strain_target_scores.tsv
│   ├── strain_scores.json
│   ├── toxin_support.tsv
│   ├── strain_target_species_scores.tsv
│   ├── strain_species_scores.json
│   ├── strain_target_scores.png
│   ├── strain_target_species_scores.png
│   ├── per_hit_<strain>.png
│   └── shotter_report_paper.md
├── logs/
│   └── digger_execution.log
└── pipeline_results.tar.gz   # Bundled results
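
A quick way to confirm that a run finished is to check for a few of the files in the tree above. The sketch below is one possible check; the choice of which files count as essential (`EXPECTED_OUTPUTS`) and the helper name `missing_outputs` are assumptions for illustration.

```python
from pathlib import Path

# Assumption: these four files are treated as the minimum for a complete run.
EXPECTED_OUTPUTS = (
    "digger/Results/Toxins/All_Toxins.txt",
    "shotter/strain_target_scores.tsv",
    "shotter/toxin_support.tsv",
    "pipeline_results.tar.gz",
)


def missing_outputs(run_dir):
    """Return the expected files that are absent from a run directory."""
    run_dir = Path(run_dir)
    return [rel for rel in EXPECTED_OUTPUTS if not (run_dir / rel).is_file()]
```

For example, `missing_outputs("runs/C15_run")` returns an empty list when all four files exist.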

Database Update

BtToxin_Digger's built-in database may be outdated. The steps below use a sparse checkout to fetch only the BTTCMP_db/bt_toxin directory from the upstream GitHub repository.

Update Steps

mkdir -p external_dbs
rm -rf external_dbs/bt_toxin tmp_bttoxin_repo

git clone --filter=blob:none --no-checkout https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo

git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master

cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
rm -rf tmp_bttoxin_repo

The pipeline automatically detects external_dbs/bt_toxin if present.

Database Structure

external_dbs/bt_toxin/
├── db/                    # BLAST index files (required)
│   ├── bt_toxin.phr
│   ├── bt_toxin.pin
│   ├── bt_toxin.ps
│   └── ...
└── seq/                   # Source sequences (optional, for reference)
    └── bt_toxin*.fas
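
The detection described above could look roughly like the following. This is a sketch, not the pipeline's actual logic: the precedence (an explicit `--bttoxin_db_dir` first, then external_dbs/bt_toxin, then the built-in database) and the decision to check only the `.phr` and `.pin` index files are assumptions, and `resolve_db_dir` and `has_blast_index` are hypothetical names.

```python
from pathlib import Path


def has_blast_index(db_dir):
    """True if db_dir/db contains the bt_toxin .phr and .pin index files."""
    index_dir = Path(db_dir) / "db"
    return all((index_dir / f"bt_toxin.{ext}").is_file()
               for ext in ("phr", "pin"))


def resolve_db_dir(explicit=None, project_root="."):
    """Pick a database directory; None means use the built-in database."""
    if explicit is not None:
        # Assumption: an explicitly requested directory must be usable.
        if not has_blast_index(explicit):
            raise FileNotFoundError(f"no BLAST index under {explicit}/db")
        return Path(explicit)
    candidate = Path(project_root) / "external_dbs" / "bt_toxin"
    return candidate if has_blast_index(candidate) else None
```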

Input File Format

.fna files are FASTA-format nucleotide sequence files containing bacterial genome sequences:

>NZ_CP010088.1 Bacillus thuringiensis strain 97-27 chromosome, complete genome
TAATGTAACACCAGTAAATATTTCATTCATATATTCTTTTAACTGTATTTTATATTCTTTCTACTCTACAATTTCTTTTA
ACTGCCAATATGCATCTTCTAGCCAAGGGTGTAAAACTTTCAACGTGTCTTTTCTATCCCACAAATATGAAATATATGCA
...
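
Before submitting a file, it can help to confirm that it parses as nucleotide FASTA. The following is a minimal standard-library sketch, not the pipeline's own validation; `read_fasta` and `is_nucleotide` are hypothetical helpers, and the accepted alphabet (A, C, G, T, N) is an assumption that rejects other IUPAC ambiguity codes.

```python
def read_fasta(lines):
    """Parse FASTA lines into {record_id: sequence} (sequence upper-cased)."""
    records = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty FASTA header")
            header = fields[0]  # ID is the first token after '>'
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def is_nucleotide(seq, alphabet=frozenset("ACGTN")):
    """True if seq is non-empty and uses only the given alphabet."""
    return bool(seq) and set(seq) <= alphabet
```

Typical use is `with open(path) as fh: records = read_fasta(fh)`, followed by `all(is_nucleotide(s) for s in records.values())`.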

Result Interpretation

Key Output Files

All_Toxins.txt - Complete toxin predictions with:

  • Strain, Protein ID, coordinates
  • SVM/BLAST/HMM predictions
  • Hit ID, alignment length, identity, E-value

strain_target_scores.tsv - Strain-level target predictions:

  • TopOrder: Most likely target insect order
  • TopScore: Confidence score (0-1)
  • Per-order scores for all target orders

toxin_support.tsv - Per-hit contribution details:

  • Individual toxin weights and contributions
  • Family classification and partner status
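
The strain-level table can be read with the standard csv module. The sketch below extracts TopOrder and TopScore per strain; the `Strain` column name and the sample row are made up for illustration and may not match the real file's layout.

```python
import csv
import io


def top_targets(tsv_text, strain_column="Strain"):
    """Map each strain to (TopOrder, TopScore) from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[strain_column]: (row["TopOrder"], float(row["TopScore"]))
            for row in reader}


# Illustrative data only; the values are invented, not real pipeline output.
SAMPLE = (
    "Strain\tTopOrder\tTopScore\tLepidoptera\tDiptera\n"
    "C15\tLepidoptera\t0.82\t0.82\t0.10\n"
)
```

For a real run, pass the file contents instead, for example `top_targets(Path("runs/C15_run/shotter/strain_target_scores.tsv").read_text())`, adjusting `strain_column` if the header differs.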

Toxin Rankings

  • Rank1: Highest confidence (identity ≥78%, coverage ≥80%)
  • Rank2-3: Moderate confidence
  • Rank4: Lowest confidence predictions
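
Only the Rank1 cutoffs are stated above, so a check against them can be sketched as follows. Note the unit difference: the rank cutoffs are percentages, while the CLI thresholds (`--min_identity`, `--min_coverage`) are fractions from 0 to 1. Whether both use the same identity and coverage definitions is an assumption here, and `meets_rank1` is a hypothetical helper.

```python
RANK1_MIN_IDENTITY_PCT = 78.0  # from the Rank1 definition above
RANK1_MIN_COVERAGE_PCT = 80.0


def meets_rank1(identity, coverage, as_fraction=False):
    """Check a hit against the Rank1 cutoffs.

    Values are percentages by default; pass as_fraction=True for
    0-1 values such as those given to --min_identity/--min_coverage.
    """
    scale = 100.0 if as_fraction else 1.0
    return (identity * scale >= RANK1_MIN_IDENTITY_PCT
            and coverage * scale >= RANK1_MIN_COVERAGE_PCT)
```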

Target Orders

Common target groups in predictions (insect orders, plus nematodes):

  • Lepidoptera: Moths and butterflies
  • Coleoptera: Beetles
  • Diptera: Flies and mosquitoes
  • Hemiptera: True bugs
  • Nematoda: Roundworms

Web Interface (Optional)

BtToxin Pipeline provides an optional web interface for easy task submission and monitoring.

Quick Start

# Start both frontend and backend services (recommended)
pixi run web-start

# Frontend: http://localhost:5173
# Backend:  http://localhost:8000

Or start services separately:

# Terminal 1: Start backend
pixi run api-dev

# Terminal 2: Start frontend
pixi run fe-dev

Using the Web Interface

  1. Open http://localhost:5173 in your browser
  2. Upload a .fna genome file
  3. Configure analysis parameters (optional)
  4. Click "Submit Task"
  5. You are redirected to the /<task_id> page
  6. The page polls for status every 3 seconds
  7. When complete, download your results as .tar.gz
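
The same polling behaviour can be reproduced from a script. The README does not document the backend's status endpoint or status values, so the sketch below takes the status lookup as a caller-supplied function; `wait_for_task`, the terminal status names ("completed", "failed"), and the one-hour timeout are all assumptions.

```python
import time


def wait_for_task(fetch_status, interval=3.0, timeout=3600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() every `interval` seconds until a terminal state.

    fetch_status is any zero-argument callable returning a status string;
    how it talks to the backend is left to the caller.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):  # hypothetical status names
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task still '{status}' after {timeout} s")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.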

Task URL

After submission, your task URL will be:

http://localhost:5173/<task_id>

Save this URL to check results later. Results are available for 30 days.

Result Storage

Task results are stored in /data/jobs/{task_id}/:

  • input.fna - Your uploaded file
  • params.json - Task parameters
  • output/ - Pipeline output files
  • pipeline_results.tar.gz - Downloadable bundle

Result Retention

Results are automatically deleted after 30 days to free up storage space. Download important results before they expire.
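
To see which job directories are past the retention window, something like the following could be used. It is a sketch only: measuring age by directory modification time is an assumption (the service may track submission time differently), and `expired_jobs` is a hypothetical helper that lists candidates without deleting anything.

```python
import time
from pathlib import Path

RETENTION_DAYS = 30  # documented retention period


def expired_jobs(jobs_root, now=None, retention_days=RETENTION_DAYS):
    """Return job directories whose mtime is older than the retention window."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    root = Path(jobs_root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir()
                  if p.is_dir() and p.stat().st_mtime < cutoff)
```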

Python Development Environment

For development work outside pixi:

uv venv --managed-python -p 3.12 --seed .venv
source .venv/bin/activate
uv pip install -e .

Running Tests

# Run property-based tests for pipeline
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v

# Run frontend tests
pixi run fe-test

# Run backend tests
pixi run api-test

Project Structure

bttoxin-pipeline/
├── pixi.toml                 # Pixi environment configuration
├── pyproject.toml            # Python package configuration
├── scripts/                  # Core pipeline scripts
│   ├── run_single_fna_pipeline.py  # Main pipeline orchestrator
│   ├── run_digger_stage.py         # Digger-only stage
│   ├── bttoxin_shoter.py           # Toxin scoring module
│   ├── plot_shotter.py             # Visualization & reporting
│   └── pixi_runner.py              # PixiRunner class
├── bttoxin/                  # Python package (CLI entry point)
│   ├── __init__.py
│   ├── api.py
│   └── cli.py
├── Data/                     # Reference data
│   └── toxicity-data.csv     # BPPRC specificity data
├── external_dbs/             # External databases (optional)
│   └── bt_toxin/             # Updated BtToxin database
├── tests/                    # Test suite
│   ├── test_pixi_runner.py   # Property-based tests
│   └── test_data/            # Test input files
├── docs/                     # Documentation
├── runs/                     # Pipeline outputs (gitignored)
├── backend/                  # FastAPI backend (optional web service)
└── frontend/                 # Vue.js frontend (optional web UI)

Docker Deployment

For production deployment or easy setup without installing pixi/conda:

# 1. Build and start the service
docker compose -f docker/compose/docker-compose.simple.yml up -d

# 2. Access the services
# Frontend: http://localhost
# Backend API: http://localhost/api/docs

The Docker setup uses a single container in which Nginx serves the frontend assets and proxies requests to the backend API.

Available Docker configurations:

  • docker/compose/docker-compose.yml - Full configuration with multiple deployment options
  • docker/compose/docker-compose.simple.yml - Simple single-container deployment (recommended for quick start)
  • docker/compose/docker-compose.traefik.yml - Traefik-based deployment for production
  • docker/compose/docker-compose.test.yml - Test configuration

Volume Mounts:

  • ./jobs: Persist task data
  • ./Data: Reference data

For detailed Docker deployment information, see DOCKER_DEPLOYMENT.md

Troubleshooting

pixi not found

# Ensure pixi is in PATH
export PATH="$HOME/.pixi/bin:$PATH"

# Or reinstall
curl -fsSL https://pixi.sh/install.sh | bash

Environment not found

# Reinstall environments
pixi install

BtToxin_Digger not available

# Verify digger environment
pixi run -e digger BtToxin_Digger --help

Permission errors

The pipeline creates output directories automatically, so make sure you have write permission on the parent directory (runs/ by default, or the path given with --out_root).

License

MIT License
