# BtToxin Pipeline

Automated Bacillus thuringiensis toxin mining system using pixi-managed environments.

## Quick Start

### Prerequisites

- [pixi](https://pixi.sh) - Modern package manager for conda environments
- Linux x86_64 (linux-64 platform)

### Installation

1. Install pixi (if not already installed):

```bash
# Linux/macOS
curl -fsSL https://pixi.sh/install.sh | bash

# Or via Homebrew
brew install pixi
```

2. Clone and set up the project:

```bash
git clone <your-repo>
cd bttoxin-pipeline

# Install all environments (digger + pipeline)
pixi install
```

This creates two isolated environments:
- `digger`: BtToxin_Digger with bioconda dependencies (perl, blast, etc.)
- `pipeline`: Python analysis tools (pandas, matplotlib, seaborn)

### Running the Pipeline

#### Full Pipeline (Recommended)

Run the complete analysis pipeline with a single command:

```bash
pixi run pipeline --fna tests/test_data/HAN055.fna
```

This executes three stages:
1. **Digger**: BtToxin_Digger toxin mining
2. **Shotter**: Toxin scoring and target prediction
3. **Plot**: Heatmap generation and report creation

#### CLI Options

```bash
pixi run pipeline --fna <file> [options]

Options:
  --fna PATH                    Input .fna file (required)
  --out_root PATH               Output directory (default: runs/<stem>_run)
  --toxicity_csv PATH           Toxicity data CSV (default: Data/toxicity-data.csv)
  --min_identity FLOAT          Minimum identity threshold 0-1 (default: 0.0)
  --min_coverage FLOAT          Minimum coverage threshold 0-1 (default: 0.0)
  --disallow_unknown_families   Exclude unknown toxin families
  --require_index_hit           Keep only hits with known specificity
  --lang {zh,en}                Report language (default: zh)
  --bttoxin_db_dir PATH         Custom bt_toxin database directory
  --threads INT                 Number of threads (default: 4)
```

#### Examples

```bash
# Basic run with default settings
pixi run pipeline --fna tests/test_data/C15.fna

# Strict filtering for high-confidence results
pixi run pipeline --fna tests/test_data/HAN055.fna \
  --min_identity 0.50 --min_coverage 0.60 \
  --disallow_unknown_families --require_index_hit

# English report with custom output directory
pixi run pipeline --fna tests/test_data/HAN055.fna \
  --out_root runs/HAN055_strict --lang en

# Use custom database
pixi run pipeline --fna tests/test_data/HAN055.fna \
  --bttoxin_db_dir /path/to/custom/bt_toxin
```

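When several genomes need the same settings, it can help to build the command lines programmatically before running them. The sketch below is a hypothetical helper and not part of the project: the names `build_pipeline_command` and `build_batch` are assumptions, while every flag is taken from the CLI options listed above. It only prints a command; nothing is executed.

```python
from pathlib import Path


def build_pipeline_command(fna, out_root=None, min_identity=0.0,
                           min_coverage=0.0, lang="zh", strict=False):
    """Return the argv list for one `pixi run pipeline` invocation."""
    # The CLI documents both thresholds as fractions in the range 0-1.
    for name, value in (("min_identity", min_identity),
                        ("min_coverage", min_coverage)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    cmd = ["pixi", "run", "pipeline", "--fna", str(fna),
           "--min_identity", str(min_identity),
           "--min_coverage", str(min_coverage),
           "--lang", lang]
    if out_root is not None:
        cmd += ["--out_root", str(out_root)]
    if strict:
        # Same flag pair as the "strict filtering" example above.
        cmd += ["--disallow_unknown_families", "--require_index_hit"]
    return cmd


def build_batch(fna_dir, **options):
    """Build one command per .fna file found directly inside fna_dir."""
    return [build_pipeline_command(path, **options)
            for path in sorted(Path(fna_dir).glob("*.fna"))]


# Show the command without running it.
cmd = build_pipeline_command("tests/test_data/HAN055.fna",
                             min_identity=0.5, min_coverage=0.6, strict=True)
print(" ".join(cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would run the jobs one after another.
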
### Individual Stage Commands

Run stages separately when needed:

#### Digger Only

```bash
pixi run digger-only --fna <file> [options]

Options:
  --fna PATH              Input .fna file (required)
  --out_dir PATH          Output directory (default: runs/<stem>_digger_only)
  --bttoxin_db_dir PATH   Custom database directory
  --threads INT           Number of threads (default: 4)
  --sequence_type         Sequence type: nucl/orfs/prot/reads (default: nucl)
```

Example:
```bash
pixi run digger-only --fna tests/test_data/C15.fna --threads 8
```

#### Shotter (Scoring)

```bash
pixi run shotter [options]

Options:
  --toxicity_csv PATH     Toxicity data CSV
  --all_toxins PATH       All_Toxins.txt from Digger
  --output_dir PATH       Output directory
  --min_identity FLOAT    Minimum identity threshold
  --min_coverage FLOAT    Minimum coverage threshold
  --allow_unknown_families / --disallow_unknown_families
  --require_index_hit     Keep only indexed hits
```

Example:
```bash
pixi run shotter \
  --all_toxins runs/C15_run/digger/Results/Toxins/All_Toxins.txt \
  --output_dir runs/C15_run/shotter
```

#### Plot (Visualization)

```bash
pixi run plot [options]

Options:
  --strain_scores PATH            strain_target_scores.tsv from Shotter
  --toxin_support PATH            toxin_support.tsv (optional)
  --species_scores PATH           strain_target_species_scores.tsv (optional)
  --out_dir PATH                  Output directory
  --cmap STRING                   Colormap (default: viridis)
  --per_hit_strain NAME           Generate per-hit heatmap for specific strain
  --merge_unresolved              Merge other/unknown into unresolved
  --report_mode {summary,paper}   Report style (default: paper)
  --lang {zh,en}                  Report language (default: zh)
```

Example:
```bash
pixi run plot \
  --strain_scores runs/C15_run/shotter/strain_target_scores.tsv \
  --toxin_support runs/C15_run/shotter/toxin_support.tsv \
  --out_dir runs/C15_run/shotter \
  --per_hit_strain C15 --lang en
```

## Output Structure

After running the pipeline:

```
runs/<strain>_run/
├── stage/                                  # Staged input file
│   └── <strain>.fna
├── digger/                                 # BtToxin_Digger outputs
│   ├── Results/
│   │   └── Toxins/
│   │       ├── All_Toxins.txt
│   │       ├── <strain>.list
│   │       ├── <strain>.gbk
│   │       └── Bt_all_genes.table
│   └── BtToxin_Digger.log
├── shotter/                                # Shotter outputs
│   ├── strain_target_scores.tsv
│   ├── strain_scores.json
│   ├── toxin_support.tsv
│   ├── strain_target_species_scores.tsv
│   ├── strain_species_scores.json
│   ├── strain_target_scores.png
│   ├── strain_target_species_scores.png
│   ├── per_hit_<strain>.png
│   └── shotter_report_paper.md
├── logs/
│   └── digger_execution.log
└── pipeline_results.tar.gz                 # Bundled results
```

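A quick way to confirm that a run finished is to look for a few of the files in the layout above. The following is a minimal sketch, assuming the layout is exactly as listed; `missing_outputs` and the choice of which files to check are illustrative and not part of the pipeline.

```python
from pathlib import Path

# A subset of the files shown in the tree above, relative to runs/<strain>_run/.
EXPECTED_OUTPUTS = (
    "digger/Results/Toxins/All_Toxins.txt",
    "shotter/strain_target_scores.tsv",
    "shotter/toxin_support.tsv",
    "pipeline_results.tar.gz",
)


def missing_outputs(run_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected relative paths that are not regular files in run_dir."""
    run_dir = Path(run_dir)
    return [rel for rel in expected if not (run_dir / rel).is_file()]


# An empty list means every checked file is present.
print(missing_outputs("runs/C15_run"))
```
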
## Database Update

BtToxin_Digger's built-in database may be outdated. Use the latest from GitHub:

### Update Steps

```bash
mkdir -p external_dbs
rm -rf external_dbs/bt_toxin tmp_bttoxin_repo

git clone --filter=blob:none --no-checkout https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo

git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master

cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
rm -rf tmp_bttoxin_repo
```

The pipeline automatically detects `external_dbs/bt_toxin` if present.

### Database Structure

```
external_dbs/bt_toxin/
├── db/                    # BLAST index files (required)
│   ├── bt_toxin.phr
│   ├── bt_toxin.pin
│   ├── bt_toxin.ps
│   └── ...
└── seq/                   # Source sequences (optional, for reference)
    └── bt_toxin*.fas
```

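After an update it may be worth checking that the required `db/` directory actually contains index files. This is a rough sketch, not the pipeline's own detection logic: `check_bt_toxin_db` is a hypothetical helper, and it only looks for the `.phr` and `.pin` suffixes named in the listing above.

```python
from pathlib import Path

# Index suffixes taken from the listing above; the real set may be larger.
REQUIRED_SUFFIXES = (".phr", ".pin")


def check_bt_toxin_db(db_root="external_dbs/bt_toxin"):
    """Return a list of problems found; an empty list means the check passed."""
    db_dir = Path(db_root) / "db"
    if not db_dir.is_dir():
        return [f"missing directory: {db_dir}"]
    problems = []
    for suffix in REQUIRED_SUFFIXES:
        if not any(db_dir.glob(f"*{suffix}")):
            problems.append(f"no *{suffix} file in {db_dir}")
    return problems


print(check_bt_toxin_db() or "database layout looks plausible")
```
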
## Input File Format

`.fna` files are FASTA-format nucleotide sequence files containing bacterial genome sequences:

```
>NZ_CP010088.1 Bacillus thuringiensis strain 97-27 chromosome, complete genome
TAATGTAACACCAGTAAATATTTCATTCATATATTCTTTTAACTGTATTTTATATTCTTTCTACTCTACAATTTCTTTTA
ACTGCCAATATGCATCTTCTAGCCAAGGGTGTAAAACTTTCAACGTGTCTTTTCTATCCCACAAATATGAAATATATGCA
...
```

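Before submitting a large genome, a short sanity check of the FASTA structure can catch truncated or mislabeled files. The function below is an illustrative sketch (the name `summarize_fasta` is an assumption, and the pipeline may validate input differently): it reports each record's identifier and sequence length, and rejects sequence data that appears before any `>` header.

```python
def summarize_fasta(lines):
    """Return [(record_id, sequence_length), ...] for FASTA-formatted lines."""
    records = []
    current_id, length = None, 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines carry no sequence data
        if line.startswith(">"):
            if current_id is not None:
                records.append((current_id, length))
            # The identifier is the first whitespace-delimited token after ">".
            parts = line[1:].split()
            current_id, length = (parts[0] if parts else ""), 0
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            length += len(line)
    if current_id is not None:
        records.append((current_id, length))
    return records


example = [">seq1 example record", "ACGT", "AC", ">seq2", "GG"]
print(summarize_fasta(example))
```

To check a real file, pass an open handle, for example `summarize_fasta(open("tests/test_data/C15.fna"))`.
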
## Result Interpretation

### Key Output Files

**All_Toxins.txt** - Complete toxin predictions with:
- Strain, Protein ID, coordinates
- SVM/BLAST/HMM predictions
- Hit ID, alignment length, identity, E-value

**strain_target_scores.tsv** - Strain-level target predictions:
- TopOrder: Most likely target insect order
- TopScore: Confidence score (0-1)
- Per-order scores for all target orders

**toxin_support.tsv** - Per-hit contribution details:
- Individual toxin weights and contributions
- Family classification and partner status

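The strain-level table can be read with the standard `csv` module. This is a minimal sketch under stated assumptions: `TopOrder` and `TopScore` are the column names given above, but the strain column name (`Strain` here) is a guess, and the sample row contains made-up values purely to make the example self-contained.

```python
import csv
import io


def top_targets(tsv_file, strain_col="Strain"):
    """Map each strain to its (TopOrder, TopScore) pair from a tab-separated table."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return {row[strain_col]: (row["TopOrder"], float(row["TopScore"]))
            for row in reader}


# Made-up sample standing in for runs/<strain>_run/shotter/strain_target_scores.tsv.
sample = io.StringIO(
    "Strain\tTopOrder\tTopScore\tLepidoptera\tDiptera\n"
    "C15\tLepidoptera\t0.82\t0.82\t0.10\n"
)
print(top_targets(sample))
```

For a real run, open the file with `newline=""` and pass the handle in place of `sample`; adjust `strain_col` if the header differs.
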
### Toxin Rankings

- **Rank1**: Highest confidence (identity ≥78%, coverage ≥80%)
- **Rank2-3**: Moderate confidence
- **Rank4**: Lowest confidence predictions

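The Rank1 thresholds can be written as a simple predicate. This sketch only encodes the two numbers stated above, with both values in percent; the cut-offs for Rank2 through Rank4 are not given here, so they are deliberately left out, and `is_rank1` is a hypothetical name rather than a function from BtToxin_Digger.

```python
RANK1_MIN_IDENTITY = 78.0  # percent, from the list above
RANK1_MIN_COVERAGE = 80.0  # percent, from the list above


def is_rank1(identity_pct, coverage_pct):
    """Return True when a hit meets both Rank1 thresholds (values in percent)."""
    return (identity_pct >= RANK1_MIN_IDENTITY
            and coverage_pct >= RANK1_MIN_COVERAGE)


print(is_rank1(95.2, 100.0))
print(is_rank1(78.0, 79.9))
```

Note that the pipeline's `--min_identity` and `--min_coverage` options take fractions between 0 and 1, not percentages.
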
### Target Orders

Common insect orders in predictions:
- **Lepidoptera**: Moths and butterflies
- **Coleoptera**: Beetles
- **Diptera**: Flies and mosquitoes
- **Hemiptera**: True bugs
- **Nematoda**: Roundworms

## Web Interface (Optional)

BtToxin Pipeline provides an optional web interface for easy task submission and monitoring.

### Quick Start

```bash
# Start both frontend and backend services (recommended)
pixi run web-start

# Frontend: http://localhost:5173
# Backend: http://localhost:8000
```

Or start services separately:

```bash
# Terminal 1: Start backend
pixi run api-dev

# Terminal 2: Start frontend
pixi run fe-dev
```

### Using the Web Interface

1. Open http://localhost:5173 in your browser
2. Upload a .fna genome file
3. Configure analysis parameters (optional)
4. Click "Submit Task"
5. You'll be redirected to the `/<task_id>` page
6. The page polls for status every 3 seconds
7. When complete, download your results as `.tar.gz`

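The polling behaviour in step 6 can be mirrored in a script. The backend's endpoints and status values are not documented in this README, so the sketch below takes a caller-supplied `fetch_status` function and treats the strings `"completed"` and `"failed"` as terminal; all of these names are assumptions. Only the 3-second interval comes from the text above.

```python
import time


def wait_for_task(fetch_status, interval=3.0, timeout=3600.0,
                  sleep=time.sleep, done_states=("completed", "failed")):
    """Call fetch_status() every `interval` seconds until a terminal state appears."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status in done_states:
            return status
        if waited >= timeout:
            raise TimeoutError(f"task still '{status}' after {waited:.0f} s")
        sleep(interval)
        waited += interval


# Demonstration with a fake status source and no real waiting.
fake_states = iter(["pending", "running", "completed"])
print(wait_for_task(lambda: next(fake_states), sleep=lambda seconds: None))
```

In practice `fetch_status` would wrap an HTTP request to the backend and return the status field from its response.
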
### Task URL

After submission, your task URL will be:
```
http://localhost:5173/<task_id>
```

Save this URL to check results later. Results are available for **30 days**.

### Result Storage

Task results are stored in `/data/jobs/{task_id}/`:
- `input.fna` - Your uploaded file
- `params.json` - Task parameters
- `output/` - Pipeline output files
- `pipeline_results.tar.gz` - Downloadable bundle

### Result Retention

Results are automatically deleted after **30 days** to free up storage space. Download important results before they expire.

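To keep track of when a result will disappear, the 30-day window can be computed from a reference timestamp. This is an illustrative sketch: the README does not say whether the window starts at submission or at completion, so the `created` argument is an assumption, as are the function names.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # retention period stated above


def expires_at(created):
    """Return the moment a task created at `created` becomes eligible for deletion."""
    return created + RETENTION


def days_left(created, now):
    """Return the whole days remaining before expiry, never below zero."""
    remaining = expires_at(created) - now
    return max(remaining.days, 0)


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(expires_at(created).date())
print(days_left(created, datetime(2024, 1, 21, tzinfo=timezone.utc)))
```
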
### Python Development Environment

For development work outside pixi:

```bash
uv venv --managed-python -p 3.12 --seed .venv
source .venv/bin/activate
uv pip install -e .
```

### Running Tests

```bash
# Run property-based tests for pipeline
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v

# Run frontend tests
pixi run fe-test

# Run backend tests
pixi run api-test
```

### Project Structure

```
bttoxin-pipeline/
├── pixi.toml                       # Pixi environment configuration
├── pyproject.toml                  # Python package configuration
├── scripts/                        # Core pipeline scripts
│   ├── run_single_fna_pipeline.py  # Main pipeline orchestrator
│   ├── run_digger_stage.py         # Digger-only stage
│   ├── bttoxin_shoter.py           # Toxin scoring module
│   ├── plot_shotter.py             # Visualization & reporting
│   └── pixi_runner.py              # PixiRunner class
├── bttoxin/                        # Python package (CLI entry point)
│   ├── __init__.py
│   ├── api.py
│   └── cli.py
├── Data/                           # Reference data
│   └── toxicity-data.csv           # BPPRC specificity data
├── external_dbs/                   # External databases (optional)
│   └── bt_toxin/                   # Updated BtToxin database
├── tools/                          # Utility tools and environments
│   └── reproduction/               # BtToxin_Digger reproduction env
│       └── bttoxin_digger/
├── tests/                          # Test suite
│   ├── test_pixi_runner.py         # Property-based tests
│   └── test_data/                  # Test input files
├── docs/                           # Documentation
├── runs/                           # Pipeline outputs (gitignored)
├── backend/                        # FastAPI backend (optional web service)
├── frontend/                       # Vue.js frontend (optional web UI)
└── crispr_cas/                     # CRISPR-Cas analysis module
```

## Docker Deployment

For production deployment:

```bash
# Build and start the service with Traefik integration
docker compose -f docker/compose/docker-compose.traefik.yml -p compose up -d --build

# Access: https://bttiaw.hzau.edu.cn
```

The setup uses Traefik for SSL termination and routing. The backend API and frontend assets are served by the `bttoxin-pipeline` container.

**Available Docker configurations:**
- `docker/compose/docker-compose.traefik.yml` - Production deployment (Recommended)
- `docker/compose/docker-compose.simple.yml` - Simple deployment (No Traefik)
- `docker/compose/docker-compose.test.yml` - Test configuration

**Volume Mounts:**
- `./jobs`: Persist task data
- `./Data`: Reference data

For detailed Docker deployment information, see [DOCKER_DEPLOYMENT.md](DOCKER_DEPLOYMENT.md).

### Building the Image Manually

To build the image manually, set the build context correctly so that `pixi.toml` can be found.

```bash
# Option 1: From project root (specifying context)
docker build \
  --network=host \
  -f web/zly/docker/dockerfiles/Dockerfile.traefik \
  -t hotwa/bttoxin-app:latest \
  web/zly

# Option 2: Enter directory first
cd web/zly
docker build \
  --network=host \
  -f docker/dockerfiles/Dockerfile.traefik \
  -t hotwa/bttoxin-app:latest \
  .
```

## Troubleshooting

### pixi not found

```bash
# Ensure pixi is in PATH
export PATH="$HOME/.pixi/bin:$PATH"

# Or reinstall
curl -fsSL https://pixi.sh/install.sh | bash
```

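A wrapper script can detect this situation before launching the pipeline. The following is a small sketch, assuming the `$HOME/.pixi/bin` location used in the export line above; `find_pixi` is a hypothetical helper and not part of the project.

```python
import os
import shutil
from pathlib import Path


def find_pixi():
    """Return the path to the pixi executable, or None if it cannot be located."""
    found = shutil.which("pixi")  # searches the current PATH
    if found:
        return found
    # Fall back to the install directory referenced in the export line above.
    candidate = Path.home() / ".pixi" / "bin" / "pixi"
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return str(candidate)
    return None


location = find_pixi()
print(location if location else "pixi not found; add $HOME/.pixi/bin to PATH or reinstall")
```
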
### Environment not found

```bash
# Reinstall environments
pixi install
```

### BtToxin_Digger not available

```bash
# Verify digger environment
pixi run -e digger BtToxin_Digger --help
```

### Permission errors

Ensure write permissions on output directories. The pipeline creates directories automatically.

## License

MIT License