# BtToxin Pipeline

Automated Bacillus thuringiensis toxin mining system using pixi-managed environments.

## Quick Start

### Prerequisites

- [pixi](https://pixi.sh) - Modern package manager for conda environments
- Linux x86_64 (linux-64 platform)

### Installation

1. Install pixi (if not already installed):

```bash
# Linux/macOS
curl -fsSL https://pixi.sh/install.sh | bash

# Or via Homebrew
brew install pixi
```

2. Clone and setup the project:

```bash
git clone
cd bttoxin-pipeline

# Install all environments (digger + pipeline)
pixi install
```

This creates two isolated environments:

- `digger`: BtToxin_Digger with bioconda dependencies (perl, blast, etc.)
- `pipeline`: Python analysis tools (pandas, matplotlib, seaborn)

### Running the Pipeline

#### Full Pipeline (Recommended)

Run the complete analysis pipeline with a single command:

```bash
pixi run pipeline --fna tests/test_data/HAN055.fna
```

This executes three stages:

1. **Digger**: BtToxin_Digger toxin mining
2. **Shotter**: Toxin scoring and target prediction
3. **Plot**: Heatmap generation and report creation

#### CLI Options

```bash
pixi run pipeline --fna [options]

Options:
  --fna PATH                    Input .fna file (required)
  --out_root PATH               Output directory (default: runs/_run)
  --toxicity_csv PATH           Toxicity data CSV (default: Data/toxicity-data.csv)
  --min_identity FLOAT          Minimum identity threshold 0-1 (default: 0.0)
  --min_coverage FLOAT          Minimum coverage threshold 0-1 (default: 0.0)
  --disallow_unknown_families   Exclude unknown toxin families
  --require_index_hit           Keep only hits with known specificity
  --lang {zh,en}                Report language (default: zh)
  --bttoxin_db_dir PATH         Custom bt_toxin database directory
  --threads INT                 Number of threads (default: 4)
```

#### Examples

```bash
# Basic run with default settings
pixi run pipeline --fna tests/test_data/C15.fna

# Strict filtering for high-confidence results
pixi run pipeline --fna tests/test_data/HAN055.fna \
  --min_identity 0.50 --min_coverage 0.60 \
  --disallow_unknown_families --require_index_hit

# English report with custom output directory
pixi run pipeline --fna tests/test_data/HAN055.fna \
  --out_root runs/HAN055_strict --lang en

# Use custom database
pixi run pipeline --fna tests/test_data/HAN055.fna \
  --bttoxin_db_dir /path/to/custom/bt_toxin
```

### Individual Stage Commands

Run stages separately when needed:

#### Digger Only

```bash
pixi run digger-only --fna [options]

Options:
  --fna PATH              Input .fna file (required)
  --out_dir PATH          Output directory (default: runs/_digger_only)
  --bttoxin_db_dir PATH   Custom database directory
  --threads INT           Number of threads (default: 4)
  --sequence_type         Sequence type: nucl/orfs/prot/reads (default: nucl)
```

Example:

```bash
pixi run digger-only --fna tests/test_data/C15.fna --threads 8
```

#### Shotter (Scoring)

```bash
pixi run shotter [options]

Options:
  --toxicity_csv PATH     Toxicity data CSV
  --all_toxins PATH       All_Toxins.txt from Digger
  --output_dir PATH       Output directory
  --min_identity FLOAT    Minimum identity threshold
  --min_coverage FLOAT    Minimum coverage threshold
  --allow_unknown_families / --disallow_unknown_families
  --require_index_hit     Keep only indexed hits
```

Example:

```bash
pixi run shotter \
  --all_toxins runs/C15_run/digger/Results/Toxins/All_Toxins.txt \
  --output_dir runs/C15_run/shotter
```

#### Plot (Visualization)

```bash
pixi run plot [options]

Options:
  --strain_scores PATH            strain_target_scores.tsv from Shotter
  --toxin_support PATH            toxin_support.tsv (optional)
  --species_scores PATH           strain_target_species_scores.tsv (optional)
  --out_dir PATH                  Output directory
  --cmap STRING                   Colormap (default: viridis)
  --per_hit_strain NAME           Generate per-hit heatmap for specific strain
  --merge_unresolved              Merge other/unknown into unresolved
  --report_mode {summary,paper}   Report style (default: paper)
  --lang {zh,en}                  Report language (default: zh)
```

Example:

```bash
pixi run plot \
  --strain_scores runs/C15_run/shotter/strain_target_scores.tsv \
  --toxin_support runs/C15_run/shotter/toxin_support.tsv \
  --out_dir runs/C15_run/shotter \
  --per_hit_strain C15 --lang en
```

## Output Structure

After running the pipeline:

```
runs/_run/
├── stage/                        # Staged input file
│   └── .fna
├── digger/                       # BtToxin_Digger outputs
│   ├── Results/
│   │   └── Toxins/
│   │       ├── All_Toxins.txt
│   │       ├── .list
│   │       ├── .gbk
│   │       └── Bt_all_genes.table
│   └── BtToxin_Digger.log
├── shotter/                      # Shotter outputs
│   ├── strain_target_scores.tsv
│   ├── strain_scores.json
│   ├── toxin_support.tsv
│   ├── strain_target_species_scores.tsv
│   ├── strain_species_scores.json
│   ├── strain_target_scores.png
│   ├── strain_target_species_scores.png
│   ├── per_hit_.png
│   └── shotter_report_paper.md
├── logs/
│   └── digger_execution.log
└── pipeline_results.tar.gz       # Bundled results
```

## Database Update

BtToxin_Digger's built-in database may be outdated.
Use the latest from GitHub:

### Update Steps

```bash
mkdir -p external_dbs
rm -rf external_dbs/bt_toxin tmp_bttoxin_repo
git clone --filter=blob:none --no-checkout https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo
git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
rm -rf tmp_bttoxin_repo
```

The pipeline automatically detects `external_dbs/bt_toxin` if present.

### Database Structure

```
external_dbs/bt_toxin/
├── db/                  # BLAST index files (required)
│   ├── bt_toxin.phr
│   ├── bt_toxin.pin
│   ├── bt_toxin.ps
│   └── ...
└── seq/                 # Source sequences (optional, for reference)
    └── bt_toxin*.fas
```

## Input File Format

`.fna` files are FASTA-format nucleotide sequence files containing bacterial genome sequences:

```
>NZ_CP010088.1 Bacillus thuringiensis strain 97-27 chromosome, complete genome
TAATGTAACACCAGTAAATATTTCATTCATATATTCTTTTAACTGTATTTTATATTCTTTCTACTCTACAATTTCTTTTA
ACTGCCAATATGCATCTTCTAGCCAAGGGTGTAAAACTTTCAACGTGTCTTTTCTATCCCACAAATATGAAATATATGCA
...
```

## Result Interpretation

### Key Output Files

**All_Toxins.txt** - Complete toxin predictions with:

- Strain, Protein ID, coordinates
- SVM/BLAST/HMM predictions
- Hit ID, alignment length, identity, E-value

**strain_target_scores.tsv** - Strain-level target predictions:

- TopOrder: Most likely target insect order
- TopScore: Confidence score (0-1)
- Per-order scores for all target orders

**toxin_support.tsv** - Per-hit contribution details:

- Individual toxin weights and contributions
- Family classification and partner status

### Toxin Rankings

- **Rank1**: Highest confidence (identity ≥78%, coverage ≥80%)
- **Rank2-3**: Moderate confidence
- **Rank4**: Lowest confidence predictions

### Target Orders

Common insect orders in predictions:

- **Lepidoptera**: Moths and butterflies
- **Coleoptera**: Beetles
- **Diptera**: Flies and mosquitoes
- **Hemiptera**: True bugs
- **Nematoda**: Roundworms

## Web Interface (Optional)

BtToxin Pipeline provides an optional web interface for easy task submission and monitoring.

### Quick Start

```bash
# Start both frontend and backend services (recommended)
pixi run web-start

# Frontend: http://localhost:5173
# Backend: http://localhost:8000
```

Or start services separately:

```bash
# Terminal 1: Start backend
pixi run api-dev

# Terminal 2: Start frontend
pixi run fe-dev
```

### Using the Web Interface

1. Open http://localhost:5173 in your browser
2. Upload a .fna genome file
3. Configure analysis parameters (optional)
4. Click "Submit Task"
5. You'll be redirected to `/` page
6. The page polls for status every 3 seconds
7. When complete, download your results as `.tar.gz`

### Task URL

After submission, your task URL will be:

```
http://localhost:5173/
```

Save this URL to check results later. Results are available for **30 days**.
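
The 3-second status polling described above can also be reproduced from a script. The following is a minimal sketch, not the project's client: the function name `wait_for_task`, the status strings in `TERMINAL_STATES`, and the `max_polls` limit are all assumptions made for illustration, and the backend's actual status route and values are not documented in this README. The status lookup is passed in as a callable so the loop itself stays independent of any particular endpoint.

```python
import time
from typing import Callable

# Hypothetical terminal states; the real backend may use different values.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_task(
    fetch_status: Callable[[], str],
    interval: float = 3.0,
    max_polls: int = 200,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status every `interval` seconds until a terminal state is seen.

    Returns the terminal status, or raises TimeoutError after `max_polls`
    attempts without reaching one.
    """
    for attempt in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        # Skip the wait after the final attempt; there is nothing left to poll.
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"task did not finish after {max_polls} polls")


if __name__ == "__main__":
    # Stand-in for a real status lookup: replays a fixed sequence of states.
    fake_states = iter(["pending", "running", "completed"])
    result = wait_for_task(lambda: next(fake_states), interval=0.0)
    print(result)
```

In practice `fetch_status` would wrap an HTTP request to the backend at `http://localhost:8000`; the exact route has to be taken from the backend code, since it is not given here.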
### Result Storage

Task results are stored in `/data/jobs/{task_id}/`:

- `input.fna` - Your uploaded file
- `params.json` - Task parameters
- `output/` - Pipeline output files
- `pipeline_results.tar.gz` - Downloadable bundle

### Result Retention

Results are automatically deleted after **30 days** to free up storage space. Download important results before they expire.

### Python Development Environment

For development work outside pixi:

```bash
uv venv --managed-python -p 3.12 --seed .venv
source .venv/bin/activate
uv pip install -e .
```

### Running Tests

```bash
# Run property-based tests for pipeline
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v

# Run frontend tests
pixi run fe-test

# Run backend tests
pixi run api-test
```

### Project Structure

```
bttoxin-pipeline/
├── pixi.toml                         # Pixi environment configuration
├── pyproject.toml                    # Python package configuration
├── scripts/                          # Core pipeline scripts
│   ├── run_single_fna_pipeline.py    # Main pipeline orchestrator
│   ├── run_digger_stage.py           # Digger-only stage
│   ├── bttoxin_shoter.py             # Toxin scoring module
│   ├── plot_shotter.py               # Visualization & reporting
│   └── pixi_runner.py                # PixiRunner class
├── bttoxin/                          # Python package (CLI entry point)
│   ├── __init__.py
│   ├── api.py
│   └── cli.py
├── Data/                             # Reference data
│   └── toxicity-data.csv             # BPPRC specificity data
├── external_dbs/                     # External databases (optional)
│   └── bt_toxin/                     # Updated BtToxin database
├── tools/                            # Utility tools and environments
│   └── reproduction/                 # BtToxin_Digger reproduction env
│       └── bttoxin_digger/
├── tests/                            # Test suite
│   ├── test_pixi_runner.py           # Property-based tests
│   └── test_data/                    # Test input files
├── docs/                             # Documentation
├── runs/                             # Pipeline outputs (gitignored)
├── backend/                          # FastAPI backend (optional web service)
├── frontend/                         # Vue.js frontend (optional web UI)
└── crispr_cas/                       # CRISPR-Cas analysis module
```

## Docker Deployment

For production deployment:

```bash
# Build and start the service with Traefik integration
docker compose -f docker/compose/docker-compose.traefik.yml -p compose up -d --build

# Access: https://bttiaw.hzau.edu.cn
```

The setup uses Traefik for SSL termination and routing. The backend API and frontend assets are served by the `bttoxin-pipeline` container.

**Available Docker configurations:**

- `docker/compose/docker-compose.traefik.yml` - Production deployment (Recommended)
- `docker/compose/docker-compose.simple.yml` - Simple deployment (No Traefik)
- `docker/compose/docker-compose.test.yml` - Test configuration

**Volume Mounts:**

- `./jobs`: Persist task data
- `./Data`: Reference data

For detailed Docker deployment information, see [DOCKER_DEPLOYMENT.md](DOCKER_DEPLOYMENT.md)

### Building the Image Manually

To build the image manually, ensure you set the correct build context so that `pixi.toml` can be found.

```bash
# Option 1: From project root (specifying context)
docker build \
  --network=host \
  -f web/zly/docker/dockerfiles/Dockerfile.traefik \
  -t hotwa/bttoxin-app:latest \
  web/zly

# Option 2: Enter directory first
cd web/zly
docker build \
  --network=host \
  -f docker/dockerfiles/Dockerfile.traefik \
  -t hotwa/bttoxin-app:latest \
  .
```

## Troubleshooting

### pixi not found

```bash
# Ensure pixi is in PATH
export PATH="$HOME/.pixi/bin:$PATH"

# Or reinstall
curl -fsSL https://pixi.sh/install.sh | bash
```

### Environment not found

```bash
# Reinstall environments
pixi install
```

### BtToxin_Digger not available

```bash
# Verify digger environment
pixi run -e digger BtToxin_Digger --help
```

### Permission errors

Ensure write permissions on output directories. The pipeline creates directories automatically.

## License

MIT License