# BtToxin Pipeline

Automated Bacillus thuringiensis toxin mining system using pixi-managed environments.

## Quick Start

### Prerequisites

- [pixi](https://pixi.sh) - Modern package manager for conda environments
- Linux x86_64 (linux-64 platform)

### Installation

1. Install pixi (if not already installed):

```bash
# Linux/macOS
curl -fsSL https://pixi.sh/install.sh | bash

# Or via Homebrew
brew install pixi
```

2. Clone and set up the project:

```bash
git clone <repository-url>
cd bttoxin-pipeline

# Install all environments (digger + pipeline)
pixi install
```

This creates two isolated environments:

- `digger`: BtToxin_Digger with bioconda dependencies (perl, blast, etc.)
- `pipeline`: Python analysis tools (pandas, matplotlib, seaborn)

### Running the Pipeline

#### Full Pipeline (Recommended)

Run the complete analysis pipeline with a single command:

```bash
pixi run pipeline --fna tests/test_data/HAN055.fna
```

This executes three stages:

1. **Digger**: BtToxin_Digger toxin mining
2. **Shotter**: Toxin scoring and target prediction
3. **Plot**: Heatmap generation and report creation

#### CLI Options

```bash
pixi run pipeline --fna <file.fna> [options]

Options:
  --fna PATH                   Input .fna file (required)
  --out_root PATH              Output directory (default: runs/<strain>_run)
  --toxicity_csv PATH          Toxicity data CSV (default: Data/toxicity-data.csv)
  --min_identity FLOAT         Minimum identity threshold 0-1 (default: 0.0)
  --min_coverage FLOAT         Minimum coverage threshold 0-1 (default: 0.0)
  --disallow_unknown_families  Exclude unknown toxin families
  --require_index_hit          Keep only hits with known specificity
  --lang {zh,en}               Report language (default: zh)
  --bttoxin_db_dir PATH        Custom bt_toxin database directory
  --threads INT                Number of threads (default: 4)
```

#### Examples

```bash
# Basic run with default settings
pixi run pipeline --fna tests/test_data/C15.fna

# Strict filtering for high-confidence results
pixi run pipeline --fna tests/test_data/HAN055.fna \
  --min_identity 0.50 --min_coverage 0.60 \
  --disallow_unknown_families --require_index_hit

# English report with custom output directory
pixi run pipeline --fna tests/test_data/HAN055.fna \
  --out_root runs/HAN055_strict --lang en

# Use custom database
pixi run pipeline --fna tests/test_data/HAN055.fna \
  --bttoxin_db_dir /path/to/custom/bt_toxin
```

### Individual Stage Commands

Run stages separately when needed:

#### Digger Only

```bash
pixi run digger-only --fna <file.fna> [options]

Options:
  --fna PATH             Input .fna file (required)
  --out_dir PATH         Output directory (default: runs/<strain>_digger_only)
  --bttoxin_db_dir PATH  Custom database directory
  --threads INT          Number of threads (default: 4)
  --sequence_type        Sequence type: nucl/orfs/prot/reads (default: nucl)
```

Example:

```bash
pixi run digger-only --fna tests/test_data/C15.fna --threads 8
```

#### Shotter (Scoring)

```bash
pixi run shotter [options]

Options:
  --toxicity_csv PATH    Toxicity data CSV
  --all_toxins PATH      All_Toxins.txt from Digger
  --output_dir PATH      Output directory
  --min_identity FLOAT   Minimum identity threshold
  --min_coverage FLOAT   Minimum coverage threshold
  --allow_unknown_families / --disallow_unknown_families
  --require_index_hit    Keep only indexed hits
```

Example:

```bash
pixi run shotter \
  --all_toxins runs/C15_run/digger/Results/Toxins/All_Toxins.txt \
  --output_dir runs/C15_run/shotter
```

#### Plot (Visualization)

```bash
pixi run plot [options]

Options:
  --strain_scores PATH           strain_target_scores.tsv from Shotter
  --toxin_support PATH           toxin_support.tsv (optional)
  --species_scores PATH          strain_target_species_scores.tsv (optional)
  --out_dir PATH                 Output directory
  --cmap STRING                  Colormap (default: viridis)
  --per_hit_strain NAME          Generate per-hit heatmap for specific strain
  --merge_unresolved             Merge other/unknown into unresolved
  --report_mode {summary,paper}  Report style (default: paper)
  --lang {zh,en}                 Report language (default: zh)
```

Example:

```bash
pixi run plot \
  --strain_scores runs/C15_run/shotter/strain_target_scores.tsv \
  --toxin_support runs/C15_run/shotter/toxin_support.tsv \
  --out_dir runs/C15_run/shotter \
  --per_hit_strain C15 --lang en
```

## Output Structure

After running the pipeline:

```
runs/<strain>_run/
├── stage/                      # Staged input file
│   └── <strain>.fna
├── digger/                     # BtToxin_Digger outputs
│   ├── Results/
│   │   └── Toxins/
│   │       ├── All_Toxins.txt
│   │       ├── <strain>.list
│   │       ├── <strain>.gbk
│   │       └── Bt_all_genes.table
│   └── BtToxin_Digger.log
├── shotter/                    # Shotter outputs
│   ├── strain_target_scores.tsv
│   ├── strain_scores.json
│   ├── toxin_support.tsv
│   ├── strain_target_species_scores.tsv
│   ├── strain_species_scores.json
│   ├── strain_target_scores.png
│   ├── strain_target_species_scores.png
│   ├── per_hit_<strain>.png
│   └── shotter_report_paper.md
├── logs/
│   └── digger_execution.log
└── pipeline_results.tar.gz     # Bundled results
```

## Database Update

BtToxin_Digger's built-in database may be outdated. Use the latest from GitHub:

### Update Steps

```bash
mkdir -p external_dbs
rm -rf external_dbs/bt_toxin tmp_bttoxin_repo

git clone --filter=blob:none --no-checkout \
  https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo

cd tmp_bttoxin_repo
git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master
cd ..

cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
rm -rf tmp_bttoxin_repo
```

The pipeline automatically detects `external_dbs/bt_toxin` if present.

### Database Structure

```
external_dbs/bt_toxin/
├── db/                 # BLAST index files (required)
│   ├── bt_toxin.phr
│   ├── bt_toxin.pin
│   ├── bt_toxin.psq
│   └── ...
└── seq/                # Source sequences (optional, for reference)
    └── bt_toxin*.fas
```

## Input File Format

`.fna` files are FASTA-format nucleotide sequence files containing bacterial genome sequences:

```
>NZ_CP010088.1 Bacillus thuringiensis strain 97-27 chromosome, complete genome
TAATGTAACACCAGTAAATATTTCATTCATATATTCTTTTAACTGTATTTTATATTCTTTCTACTCTACAATTTCTTTTA
ACTGCCAATATGCATCTTCTAGCCAAGGGTGTAAAACTTTCAACGTGTCTTTTCTATCCCACAAATATGAAATATATGCA
...
```

## Result Interpretation

### Key Output Files

**All_Toxins.txt** - Complete toxin predictions with:

- Strain, Protein ID, coordinates
- SVM/BLAST/HMM predictions
- Hit ID, alignment length, identity, E-value

**strain_target_scores.tsv** - Strain-level target predictions:

- TopOrder: Most likely target insect order
- TopScore: Confidence score (0-1)
- Per-order scores for all target orders

**toxin_support.tsv** - Per-hit contribution details:

- Individual toxin weights and contributions
- Family classification and partner status

### Toxin Rankings

- **Rank1**: Highest confidence (identity ≥78%, coverage ≥80%)
- **Rank2-3**: Moderate confidence
- **Rank4**: Lowest confidence predictions

### Target Orders

Common insect orders in predictions:

- **Lepidoptera**: Moths and butterflies
- **Coleoptera**: Beetles
- **Diptera**: Flies and mosquitoes
- **Hemiptera**: True bugs
- **Nematoda**: Roundworms

## Development

### Python Development Environment

For development work outside pixi:

```bash
uv venv --managed-python -p 3.12 --seed .venv
source .venv/bin/activate
uv pip install -e .
```

### Running Tests

```bash
# Run property-based tests
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v
```

### Project Structure

```
bttoxin-pipeline/
├── pixi.toml                       # Pixi environment configuration
├── pyproject.toml                  # Python package configuration
├── scripts/                        # Core pipeline scripts
│   ├── run_single_fna_pipeline.py  # Main pipeline orchestrator
│   ├── run_digger_stage.py         # Digger-only stage
│   ├── bttoxin_shoter.py           # Toxin scoring module
│   ├── plot_shotter.py             # Visualization & reporting
│   └── pixi_runner.py              # PixiRunner class
├── bttoxin/                        # Python package (CLI entry point)
│   ├── __init__.py
│   ├── api.py
│   └── cli.py
├── Data/                           # Reference data
│   └── toxicity-data.csv           # BPPRC specificity data
├── external_dbs/                   # External databases (optional)
│   └── bt_toxin/                   # Updated BtToxin database
├── tests/                          # Test suite
│   ├── test_pixi_runner.py         # Property-based tests
│   └── test_data/                  # Test input files
├── docs/                           # Documentation
├── runs/                           # Pipeline outputs (gitignored)
├── backend/                        # FastAPI backend (optional web service)
└── frontend/                       # Vue.js frontend (optional web UI)
```

## Troubleshooting

### pixi not found

```bash
# Ensure pixi is in PATH
export PATH="$HOME/.pixi/bin:$PATH"

# Or reinstall
curl -fsSL https://pixi.sh/install.sh | bash
```

### Environment not found

```bash
# Reinstall environments
pixi install
```

### BtToxin_Digger not available

```bash
# Verify digger environment
pixi run -e digger BtToxin_Digger --help
```

### Permission errors

Ensure write permissions on output directories. The pipeline creates directories automatically.

## License

MIT License
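
## Appendix: Reading Shotter Output

The Result Interpretation section describes `strain_target_scores.tsv` as carrying a `TopOrder` and a `TopScore` (0-1) per strain. The sketch below shows one way such a file might be filtered for confident predictions using only the Python standard library. It is a minimal illustration, not part of the pipeline: the `Strain` column name, the sample rows, the `confident_targets` helper, and the 0.5 cutoff are all assumptions made for the example, so check the header of a real output file before relying on it.

```python
import csv
import io

# Hypothetical sample data. The "Strain" column name and all values are
# assumptions for illustration; only TopOrder and TopScore are named in
# this README.
SAMPLE_TSV = (
    "Strain\tTopOrder\tTopScore\tLepidoptera\tColeoptera\tDiptera\n"
    "C15\tLepidoptera\t0.82\t0.82\t0.10\t0.31\n"
    "HAN055\tDiptera\t0.44\t0.20\t0.05\t0.44\n"
)


def confident_targets(handle, min_score=0.5):
    """Return {strain: (top_order, top_score)} for rows with TopScore >= min_score."""
    reader = csv.DictReader(handle, delimiter="\t")
    result = {}
    for row in reader:
        score = float(row["TopScore"])
        if score >= min_score:
            result[row["Strain"]] = (row["TopOrder"], score)
    return result


if __name__ == "__main__":
    # Parse the in-memory sample; for a real run, pass an open file handle.
    print(confident_targets(io.StringIO(SAMPLE_TSV)))
```

For a real run, the same function could be given an open handle such as `open("runs/C15_run/shotter/strain_target_scores.tsv", newline="")`, assuming the column names match.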