BtToxin Pipeline
Automated Bacillus thuringiensis toxin mining system using pixi-managed environments.
Quick Start
Prerequisites
- pixi: package manager for conda-based environments
- Linux x86_64 (linux-64 platform)
Installation
- Install pixi (if not already installed):
# Linux/macOS
curl -fsSL https://pixi.sh/install.sh | bash
# Or via Homebrew
brew install pixi
- Clone and set up the project:
git clone <your-repo>
cd bttoxin-pipeline
# Install all environments (digger + pipeline)
pixi install
This creates two isolated environments:
- digger: BtToxin_Digger with bioconda dependencies (perl, blast, etc.)
- pipeline: Python analysis tools (pandas, matplotlib, seaborn)
Running the Pipeline
Full Pipeline (Recommended)
Run the complete analysis pipeline with a single command:
pixi run pipeline --fna tests/test_data/HAN055.fna
This executes three stages:
- Digger: BtToxin_Digger toxin mining
- Shotter: Toxin scoring and target prediction
- Plot: Heatmap generation and report creation
CLI Options
pixi run pipeline --fna <file> [options]
Options:
--fna PATH Input .fna file (required)
--out_root PATH Output directory (default: runs/<stem>_run)
--toxicity_csv PATH Toxicity data CSV (default: Data/toxicity-data.csv)
--min_identity FLOAT Minimum identity threshold 0-1 (default: 0.0)
--min_coverage FLOAT Minimum coverage threshold 0-1 (default: 0.0)
--disallow_unknown_families Exclude unknown toxin families
--require_index_hit Keep only hits with known specificity
--lang {zh,en} Report language (default: zh)
--bttoxin_db_dir PATH Custom bt_toxin database directory
--threads INT Number of threads (default: 4)
Examples
# Basic run with default settings
pixi run pipeline --fna tests/test_data/C15.fna
# Strict filtering for high-confidence results
pixi run pipeline --fna tests/test_data/HAN055.fna \
--min_identity 0.50 --min_coverage 0.60 \
--disallow_unknown_families --require_index_hit
# English report with custom output directory
pixi run pipeline --fna tests/test_data/HAN055.fna \
--out_root runs/HAN055_strict --lang en
# Use custom database
pixi run pipeline --fna tests/test_data/HAN055.fna \
--bttoxin_db_dir /path/to/custom/bt_toxin
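To process several genomes in one go, a small loop around the pipeline command is enough. The sketch below is a hypothetical helper (run_all_fna and the DRY_RUN switch are not part of the pipeline). It only uses the --fna and --out_root options documented above, sets --out_root to the documented default, and forwards any extra options unchanged.

```bash
# Hypothetical helper: run the full pipeline once per .fna file in a directory.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_all_fna() {
  local in_dir="$1"; shift   # directory containing *.fna files
  local fna stem
  for fna in "$in_dir"/*.fna; do
    [ -e "$fna" ] || continue   # glob matched nothing: skip
    stem="$(basename "$fna" .fna)"
    # Extra arguments ("$@") are forwarded, e.g. --lang en --threads 8.
    local cmd=(pixi run pipeline --fna "$fna" --out_root "runs/${stem}_run" "$@")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}
```

After sourcing the snippet, DRY_RUN=1 run_all_fna tests/test_data --lang en prints one pipeline command per .fna file. Drop DRY_RUN=1 to run them.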
Individual Stage Commands
Run stages separately when needed:
Digger Only
pixi run digger-only --fna <file> [options]
Options:
--fna PATH Input .fna file (required)
--out_dir PATH Output directory (default: runs/<stem>_digger_only)
--bttoxin_db_dir PATH Custom database directory
--threads INT Number of threads (default: 4)
--sequence_type {nucl,orfs,prot,reads} Sequence type (default: nucl)
Example:
pixi run digger-only --fna tests/test_data/C15.fna --threads 8
Shotter (Scoring)
pixi run shotter [options]
Options:
--toxicity_csv PATH Toxicity data CSV
--all_toxins PATH All_Toxins.txt from Digger
--output_dir PATH Output directory
--min_identity FLOAT Minimum identity threshold
--min_coverage FLOAT Minimum coverage threshold
--allow_unknown_families / --disallow_unknown_families Include or exclude unknown toxin families
--require_index_hit Keep only indexed hits
Example:
pixi run shotter \
--all_toxins runs/C15_run/digger/Results/Toxins/All_Toxins.txt \
--output_dir runs/C15_run/shotter
Plot (Visualization)
pixi run plot [options]
Options:
--strain_scores PATH strain_target_scores.tsv from Shotter
--toxin_support PATH toxin_support.tsv (optional)
--species_scores PATH strain_target_species_scores.tsv (optional)
--out_dir PATH Output directory
--cmap STRING Colormap (default: viridis)
--per_hit_strain NAME Generate per-hit heatmap for specific strain
--merge_unresolved Merge other/unknown into unresolved
--report_mode {summary,paper} Report style (default: paper)
--lang {zh,en} Report language (default: zh)
Example:
pixi run plot \
--strain_scores runs/C15_run/shotter/strain_target_scores.tsv \
--toxin_support runs/C15_run/shotter/toxin_support.tsv \
--out_dir runs/C15_run/shotter \
--per_hit_strain C15 --lang en
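The three stage commands can be chained by hand. This is a sketch under two assumptions: digger-only writes the same Results/Toxins/All_Toxins.txt layout under its --out_dir as the full pipeline writes under digger/, and the strain name for --per_hit_strain equals the file stem (as in the C15 example). The names run_stages and run are hypothetical.

```bash
# Hypothetical helper: execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: Digger -> Shotter -> Plot for one genome, stopping at
# the first failing stage. Paths mirror the examples above.
run_stages() {
  local fna="$1" stem root
  stem="$(basename "$fna" .fna)"
  root="runs/${stem}_run"
  run pixi run digger-only --fna "$fna" --out_dir "$root/digger" || return 1
  run pixi run shotter \
    --all_toxins "$root/digger/Results/Toxins/All_Toxins.txt" \
    --output_dir "$root/shotter" || return 1
  run pixi run plot \
    --strain_scores "$root/shotter/strain_target_scores.tsv" \
    --toxin_support "$root/shotter/toxin_support.tsv" \
    --out_dir "$root/shotter" \
    --per_hit_strain "$stem" || return 1
}
```

For example, DRY_RUN=1 run_stages tests/test_data/C15.fna prints the three commands that would run for C15.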
Output Structure
After running the pipeline:
runs/<strain>_run/
├── stage/ # Staged input file
│ └── <strain>.fna
├── digger/ # BtToxin_Digger outputs
│ ├── Results/
│ │ └── Toxins/
│ │ ├── All_Toxins.txt
│ │ ├── <strain>.list
│ │ ├── <strain>.gbk
│ │ └── Bt_all_genes.table
│ └── BtToxin_Digger.log
├── shotter/ # Shotter outputs
│ ├── strain_target_scores.tsv
│ ├── strain_scores.json
│ ├── toxin_support.tsv
│ ├── strain_target_species_scores.tsv
│ ├── strain_species_scores.json
│ ├── strain_target_scores.png
│ ├── strain_target_species_scores.png
│ ├── per_hit_<strain>.png
│ └── shotter_report_paper.md
├── logs/
│ └── digger_execution.log
└── pipeline_results.tar.gz # Bundled results
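After a run finishes, a quick existence check on the key files catches a failed stage early. A minimal sketch: the file list is taken from the tree above, and check_run_outputs is a hypothetical name.

```bash
# Hypothetical helper: report key output files that are missing or empty.
# Returns 0 if all are present and non-empty, 1 otherwise.
check_run_outputs() {
  local root="$1" missing=0 f
  for f in \
    digger/Results/Toxins/All_Toxins.txt \
    shotter/strain_target_scores.tsv \
    shotter/toxin_support.tsv \
    pipeline_results.tar.gz; do
    if [ ! -s "$root/$f" ]; then
      echo "missing or empty: $root/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_run_outputs runs/HAN055_run && echo "run looks complete".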
Database Update
BtToxin_Digger's built-in database may be outdated. The steps below fetch only the BTTCMP_db/bt_toxin directory from the BtToxin_Digger GitHub repository:
Update Steps
mkdir -p external_dbs
rm -rf external_dbs/bt_toxin tmp_bttoxin_repo
git clone --filter=blob:none --no-checkout https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo
git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
rm -rf tmp_bttoxin_repo
If external_dbs/bt_toxin is present, the pipeline detects and uses it automatically. To use a database stored elsewhere, pass --bttoxin_db_dir.
Database Structure
external_dbs/bt_toxin/
├── db/ # BLAST index files (required)
│ ├── bt_toxin.phr
│ ├── bt_toxin.pin
│ ├── bt_toxin.psq
│ └── ...
└── seq/ # Source sequences (optional, for reference)
└── bt_toxin*.fas
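Before a run, you can confirm that the database directory contains the BLAST index files shown above. This sketch assumes that the presence of bt_toxin.phr, .pin and .psq is sufficient; a split or differently named database would need a different check. check_bt_db is a hypothetical name.

```bash
# Hypothetical helper: verify the protein BLAST index files listed in the tree.
check_bt_db() {
  local db_dir="${1:-external_dbs/bt_toxin}" ext
  for ext in phr pin psq; do
    if [ ! -f "$db_dir/db/bt_toxin.$ext" ]; then
      echo "missing $db_dir/db/bt_toxin.$ext" >&2
      return 1
    fi
  done
  echo "ok: $db_dir"
}
```

For a deeper check, BLAST+'s blastdbcmd can print database statistics, e.g. pixi run -e digger blastdbcmd -db external_dbs/bt_toxin/db/bt_toxin -info (assuming blastdbcmd is installed with blast in the digger environment).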
Input File Format
Input .fna files are FASTA-format nucleotide files containing bacterial genome sequences, for example:
>NZ_CP010088.1 Bacillus thuringiensis strain 97-27 chromosome, complete genome
TAATGTAACACCAGTAAATATTTCATTCATATATTCTTTTAACTGTATTTTATATTCTTTCTACTCTACAATTTCTTTTA
ACTGCCAATATGCATCTTCTAGCCAAGGGTGTAAAACTTTCAACGTGTCTTTTCTATCCCACAAATATGAAATATATGCA
...
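A malformed input file is easier to diagnose before BtToxin_Digger starts. The sketch below uses check_fna, a hypothetical helper that is not part of the pipeline. It checks that the file starts with a '>' header, rejects sequence lines containing characters outside the IUPAC nucleotide codes, and reports the number of records.

```bash
# Hypothetical helper: basic FASTA sanity check for an .fna file.
check_fna() {
  local fna="$1" first
  [ -r "$fna" ] || { echo "cannot read $fna" >&2; return 1; }
  # The first line must be a '>' header.
  IFS= read -r first < "$fna"
  case "$first" in
    '>'*) ;;
    *) echo "not FASTA: first line does not start with '>'" >&2; return 1 ;;
  esac
  # Non-header lines may contain only IUPAC nucleotide codes (either case).
  if grep -v '^>' "$fna" | grep -q '[^ACGTURYSWKMBDHVNacgturyswkmbdhvn]'; then
    echo "unexpected characters in sequence lines" >&2
    return 1
  fi
  echo "$(grep -c '^>' "$fna") record(s)"
}
```

For example: check_fna tests/test_data/HAN055.fna. Files with Windows (CRLF) line endings are reported as having unexpected characters.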
Result Interpretation
Key Output Files
All_Toxins.txt - Complete toxin predictions with:
- Strain, Protein ID, coordinates
- SVM/BLAST/HMM predictions
- Hit ID, alignment length, identity, E-value
strain_target_scores.tsv - Strain-level target predictions:
- TopOrder: Most likely target insect order
- TopScore: Confidence score (0-1)
- Per-order scores for all target orders
toxin_support.tsv - Per-hit contribution details:
- Individual toxin weights and contributions
- Family classification and partner status
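To pull the headline prediction for each strain out of strain_target_scores.tsv, awk can look up columns by header name. This sketch assumes the file is tab-separated with a header row containing the TopOrder and TopScore columns described above, and that the strain name is in the first column. top_targets is a hypothetical helper.

```bash
# Hypothetical helper: print strain, TopOrder and TopScore, tab-separated.
top_targets() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      # Map each header name to its column index.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("TopOrder" in col) || !("TopScore" in col)) {
        print "TopOrder/TopScore columns not found" > "/dev/stderr"
        exit 1
      }
      next
    }
    { print $1, $(col["TopOrder"]), $(col["TopScore"]) }
  ' "$1"
}
```

For example: top_targets runs/C15_run/shotter/strain_target_scores.tsv.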
Toxin Rankings
- Rank1: Highest confidence (identity ≥78%, coverage ≥80%)
- Rank2-3: Moderate confidence
- Rank4: Lowest confidence predictions
Target Orders
Common targets in predictions (insect orders, plus the nematode phylum Nematoda):
- Lepidoptera: Moths and butterflies
- Coleoptera: Beetles
- Diptera: Flies and mosquitoes
- Hemiptera: True bugs
- Nematoda: Roundworms
Development
Python Development Environment
For development work outside pixi:
uv venv --managed-python -p 3.12 --seed .venv
source .venv/bin/activate
uv pip install -e .
Running Tests
# Run property-based tests
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v
Project Structure
bttoxin-pipeline/
├── pixi.toml # Pixi environment configuration
├── pyproject.toml # Python package configuration
├── scripts/ # Core pipeline scripts
│ ├── run_single_fna_pipeline.py # Main pipeline orchestrator
│ ├── run_digger_stage.py # Digger-only stage
│ ├── bttoxin_shoter.py # Toxin scoring module
│ ├── plot_shotter.py # Visualization & reporting
│ └── pixi_runner.py # PixiRunner class
├── bttoxin/ # Python package (CLI entry point)
│ ├── __init__.py
│ ├── api.py
│ └── cli.py
├── Data/ # Reference data
│ └── toxicity-data.csv # BPPRC specificity data
├── external_dbs/ # External databases (optional)
│ └── bt_toxin/ # Updated BtToxin database
├── tests/ # Test suite
│ ├── test_pixi_runner.py # Property-based tests
│ └── test_data/ # Test input files
├── docs/ # Documentation
├── runs/ # Pipeline outputs (gitignored)
├── backend/ # FastAPI backend (optional web service)
└── frontend/ # Vue.js frontend (optional web UI)
Troubleshooting
pixi not found
# Ensure pixi is in PATH
export PATH="$HOME/.pixi/bin:$PATH"
# Or reinstall
curl -fsSL https://pixi.sh/install.sh | bash
Environment not found
# Reinstall environments
pixi install
BtToxin_Digger not available
# Verify digger environment
pixi run -e digger BtToxin_Digger --help
Permission errors
Make sure you have write permission for the output location (by default runs/). The pipeline creates missing directories automatically, but it cannot do so where you lack write access.
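A pre-flight check can catch permission problems before a long run. A minimal sketch: check_writable is hypothetical, and runs/ is used as its default only because it is the documented default output location.

```bash
# Hypothetical helper: make sure an output directory exists and is writable.
check_writable() {
  local dir="${1:-runs}"
  mkdir -p "$dir" 2>/dev/null || { echo "cannot create $dir" >&2; return 1; }
  if [ ! -w "$dir" ]; then
    echo "no write permission on $dir" >&2
    return 1
  fi
  echo "writable: $dir"
}
```

For example: check_writable runs/HAN055_strict before passing the same path to --out_root.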
License
MIT License