# BtToxin Pipeline Agent Guide

## Overview

BtToxin Pipeline is an automated Bacillus thuringiensis toxin-mining system. It identifies Cry toxin genes in bacterial genomes and predicts their target insects using a three-stage pipeline:

1. **Digger**: BtToxin_Digger toxin mining
2. **Shotter**: Toxin scoring and target prediction
3. **Plot**: Heatmap generation and report creation

## Tech Stack

| Layer | Technology |
|-------|------------|
| Package Manager | pixi (conda environments) |
| Pipeline | Python 3.9+ (pandas, matplotlib, seaborn) |
| Digger Tool | BtToxin_Digger (Perl, BLAST, HMMER) |
| Frontend | Vue 3 + Vite + Element Plus |
| Backend | FastAPI + Uvicorn |
| Result Storage | File system + 30-day retention |

## Quick Start

```bash
# 1. Clone and install dependencies
git clone
cd bttoxin-pipeline
pixi install

# 2. Start the frontend and backend together (recommended)
pixi run web-start
# Frontend: http://localhost:5173
# Backend API: http://localhost:8000

# Or start them separately
pixi run fe-dev   # frontend only
pixi run api-dev  # backend only

# 3. Submit a task through the web UI
# - Upload a .fna genome file
# - Configure the parameters
# - Click submit
# - The browser redirects to the /{task_id} page
# - The page refreshes every 3 seconds to show progress
# - When the task finishes, download the result archive
```

## Task Submission Flow

```
1. User uploads a .fna file on the home page
2. User clicks the "Submit task" button
3. Backend creates the task and returns a task_id
4. Frontend redirects to the /{task_id} page
5. The page polls the backend every 3 seconds for the latest status
6. It shows a progress bar, the current stage, and the estimated time remaining
7. On completion, a "Download results" button appears
8.
Results are retained for 30 days, then deleted automatically
```

## Project Structure

```
bttoxin-pipeline/
├── pixi.toml                         # Pixi environment configuration
├── pyproject.toml                    # Python package configuration
├── scripts/
│   ├── run_single_fna_pipeline.py    # Main orchestrator
│   ├── run_digger_stage.py           # Digger-only stage
│   ├── bttoxin_shoter.py             # Toxin scoring module
│   ├── plot_shotter.py               # Visualization & reporting
│   ├── start_web.sh                  # Start both frontend + backend
│   └── pixi_runner.py                # PixiRunner abstraction
├── bttoxin/                          # Python CLI package
│   ├── api.py                        # Python API
│   ├── cli.py                        # CLI entry point
│   └── __init__.py
├── web/backend/                      # FastAPI backend
│   ├── main.py                       # FastAPI app entry + API endpoints
│   ├── config.py                     # Configuration
│   ├── models.py                     # Data models
│   ├── storage.py                    # Redis + file storage
│   ├── tasks.py                      # Task execution logic
│   └── AGENTS.md                     # Backend-specific guide
├── frontend/                         # Vue 3 frontend
│   ├── src/
│   │   ├── api/task.ts               # Task API client
│   │   ├── views/
│   │   │   ├── TaskSubmitView.vue    # Task submission page
│   │   │   └── TaskMonitorView.vue   # Task status page (polling)
│   │   ├── types/task.ts             # Task types
│   │   └── ...
│   └── AGENTS.md                     # Frontend-specific guide
├── Data/                             # Reference data
│   └── toxicity-data.csv             # BPPRC specificity data
├── external_dbs/                     # Optional external database
│   └── bt_toxin/                     # Updated BtToxin database
├── tests/                            # Test suite
│   ├── test_pixi_runner.py           # Property-based tests
│   └── test_data/                    # Test input files (.fna)
└── docs/                             # Documentation
```

## Web API Endpoints

### Create Task

```bash
POST /api/tasks
Content-Type: multipart/form-data

Parameters:
- file: .fna file
- min_identity: float (0-1, default: 0.8)
- min_coverage: float (0-1, default: 0.6)
- allow_unknown_families: boolean (default: false)
- require_index_hit: boolean (default: true)
- lang: "zh" | "en" (default: "zh")

Response:
{
  "task_id": "uuid",
  "token": "access_token",
  "status": "pending",              // pending | running | completed | failed
  "created_at": "2024-01-01T00:00:00",
  "expires_at": "2024-01-31T00:00:00",
  "estimated_duration_seconds": 120
}
```

### Get Task Status

```bash
GET /api/tasks/{task_id}

Response:
{
  "task_id": "uuid",
  "status": "running",
  "progress": 45,                   // percent complete
  "current_stage": "shoter",        // digger | shoter | plots | bundle
  "submission_time": "2024-01-01T00:00:00",
  "start_time": "2024-01-01T00:00:10",
  "filename": "sample.fna",
  "error": null,                    // error message on failure
  "estimated_remaining_seconds": 60
}
```

### Download Result

```bash
GET /api/tasks/{task_id}/download

Response: a .tar.gz archive containing:
- results/digger/    # Digger analysis results
- results/shotter/   # Shotter scoring results
- results/logs/      # Execution logs
- input.fna          # Original input file
```

### Delete Task

```bash
DELETE /api/tasks/{task_id}
```

## Development Commands

### Via pixi (Recommended)

```bash
# Full pipeline (uses the pipeline environment)
pixi run -e pipeline pipeline --fna

# Individual stages
pixi run -e pipeline digger-only --fna
pixi run -e pipeline shotter --all_toxins
pixi run -e pipeline plot --strain_scores

# Frontend
pixi run fe-install
pixi run fe-dev    # http://localhost:5173
pixi run fe-build

# Backend
pixi run api-dev   # http://localhost:8000
pixi run api-test

# Combined (both frontend + backend)
pixi run web-start

# Tests
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v
```

### Direct Commands

```bash
# Frontend (in the frontend/ directory)
pnpm install
pnpm dev --host

# Backend (from the project root)
uvicorn web.backend.main:app --reload --host 0.0.0.0 --port 8000

# Pipeline (requires activating the pipeline environment)
~/.pixi/bin/pixi shell-hook -e pipeline > /tmp/activate.sh
source /tmp/activate.sh
python scripts/run_single_fna_pipeline.py --fna
```

## Docker Deployment

```bash
# Build and run with docker-compose
docker-compose -f docker-compose.simple.yml up -d

# Access at http://localhost:80
# API health check: http://localhost:80/api/health
```

### Docker Architecture

```
bttoxin-pipeline (single container)
├── nginx (port 80)      # Reverse proxy + static files
├── uvicorn (port 8000)  # FastAPI backend
└── pixi environments    # digger, pipeline, webbackend
```

### Docker Volume Mounts

| Host Path | Container Path | Purpose |
|-----------|----------------|---------|
| `./jobs` | `/app/jobs` | Task results |
| `./frontend/dist` | `/var/www/html` | Frontend static files |
| `./web` | `/app/web` | Backend code |
| `./Data` | `/app/Data` | Reference data |
| `./scripts` | `/app/scripts` | Pipeline scripts |
| `./pixi.toml` | `/app/pixi.toml` | Pixi configuration |

## Task Flow

```
1. User uploads a .fna file via the web UI
2. Backend creates the task directory: /data/jobs/{task_id}/ (or ./jobs/ in dev)
3. Backend saves the input file and parameters
4. Backend starts `pixi run -e pipeline pipeline` in the background (asyncio subprocess)
5. Frontend polls GET /api/tasks/{task_id} every 3 seconds
6. On completion, a download URL is provided
7. Results stay available for 30 days, then are cleaned up automatically
```

## Result Storage

```
./jobs/{task_id}/                    # Or /app/jobs/ in Docker
├── input.fna                        # Uploaded file
├── params.json                      # Task parameters
├── task_meta.json                   # Task metadata (status, progress, etc.)
├── output/                          # Pipeline output
│   ├── digger/                      # BtToxin_Digger results
│   │   ├── Results/Toxins/
│   │   │   └── All_Toxins.txt       # Toxin hits (input to shotter)
│   │   └── ...
│   ├── shotter/                     # Shotter scoring results
│   │   ├── toxin_support.tsv
│   │   ├── strain_target_scores.tsv
│   │   └── strain_scores.json
│   └── logs/
└── pipeline_results.tar.gz          # Downloadable bundle
```

## Common Tasks

### Adding a New Pipeline Stage

1. Create a script in `scripts/`
2. Add it to the `run_single_fna_pipeline.py` orchestration
3. Register a task in `pixi.toml` if standalone execution is needed
4. Add the stage definition to `frontend/src/types/task.ts`

### Modifying Task Parameters

1. Update the `TaskFormData` interface in `frontend/src/components/task/TaskSubmitForm.vue`
2. Update the API endpoint in `web/backend/main.py`
3. Update the task execution in `web/backend/tasks.py`

### Configuring Storage Location

```bash
# Set a custom jobs directory
export JOBS_DIR=/path/to/jobs

# Or modify the [feature.webbackend.env] section in pixi.toml
```

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `VITE_API_BASE_URL` | Frontend API URL (production) | `""` (uses relative path) |
| `JOBS_DIR` | Task storage directory | `./jobs` |
| `DEBUG` | Enable debug mode | `false` |

### Key Files Modified (Recent Fixes)

| File | Change |
|------|--------|
| `scripts/bttoxin_shoter.py` | Added `engine="python"` for pandas 2.x compatibility; added empty-DataFrame handling |
| `scripts/run_single_fna_pipeline.py` | Fixed the `pixi_runner` import with `sys.path.insert()` |
| `web/backend/tasks.py` | Changed to the `pixi run -e pipeline pipeline` command |
| `entrypoint.sh` | Fixed the nginx `proxy_pass` to preserve the `/api/` prefix |
| `docker-compose.simple.yml` | Docker deployment configuration |

### Constraints

Defined in `web/backend/config.py`:

- Max upload size: 50 MB
- Result retention: 30 days
- Task timeout: 6 hours
- Allowed extensions: .fna, .fa, .fasta

## Testing

### Python Tests

```bash
# Property-based tests for the pipeline
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v

# Backend tests
pixi run api-test
```

### Frontend Tests

```bash
pixi run fe-test
# or
cd frontend && pnpm test:unit
```

## Database Update

```bash
mkdir -p external_dbs
git clone --filter=blob:none --no-checkout \
  https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo
git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
rm -rf tmp_bttoxin_repo
```

## Subproject Guides

- **Frontend**: See [`frontend/AGENTS.md`](frontend/AGENTS.md)
- **Backend**: See [`web/backend/AGENTS.md`](web/backend/AGENTS.md)
- **API Documentation**: http://localhost:8000/api/docs (when DEBUG=true)

## Troubleshooting

### Common Issues

| Issue | Solution |
|-------|----------|
| pixi not found | `export PATH="$HOME/.pixi/bin:$PATH"` |
| Environment not found | `pixi install` |
| BtToxin_Digger unavailable | `pixi run -e digger BtToxin_Digger --help` |
| Permission denied | Ensure write access to `/data/jobs` |
| Task not found | Check the task_id in the URL and the response |
| Results expired | Results auto-delete after 30 days |
| Nginx 404 on API | Check `proxy_pass http://127.0.0.1:8000/api/` (note the trailing `/api/`) |
| KeyError: 'Strain' | Empty DataFrame after filtering; shotter now handles this gracefully |
| Pandas engine error | Use `engine="python"` in `pd.read_csv()` for pandas 2.x |

### Debugging Pipeline Issues

```bash
# Check whether the task was created
curl http://localhost:80/api/tasks/{task_id}

# View the task logs
cat jobs/{task_id}/output/logs/digger_execution.log

# Check the All_Toxins.txt format
head -1 jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt

# Test shotter independently
pixi run -e pipeline python scripts/bttoxin_shoter.py \
  --all_toxins jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt \
  --output_dir /tmp/test_output
```

### Docker-Specific Issues
```bash
# Check container health
docker ps
docker logs bttoxin-pipeline

# Check nginx config
docker exec bttoxin-pipeline nginx -T

# Verify backend is running
docker exec bttoxin-pipeline curl http://127.0.0.1:8000/api/health
```
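The 3-second status-polling loop the frontend runs against `GET /api/tasks/{task_id}` can be sketched in Python. This is an illustrative sketch only: `poll_task` and the canned status source are hypothetical names, and the real client lives in `frontend/src/api/task.ts`.

```python
import time
from typing import Callable, Dict

def poll_task(fetch_status: Callable[[], Dict], interval: float = 3.0,
              timeout: float = 6 * 3600) -> Dict:
    """Poll a status source until the task reaches a terminal state.

    `fetch_status` stands in for GET /api/tasks/{task_id}; terminal
    states follow the API docs: completed | failed.
    """
    deadline = time.monotonic() + timeout  # mirrors the 6-hour task timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("task did not finish within the timeout")

# Usage with a canned status sequence (no server needed):
responses = iter([
    {"status": "running", "progress": 45, "current_stage": "shoter"},
    {"status": "completed", "progress": 100, "current_stage": "bundle"},
])
final = poll_task(lambda: next(responses), interval=0.0)
print(final["status"])  # completed
```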
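The upload constraints listed under Configuration (50 MB limit; `.fna`, `.fa`, `.fasta` extensions) can be pre-checked before submitting a task. A minimal sketch, assuming a hypothetical `validate_upload` helper; the backend's real checks live in `web/backend/config.py` and `main.py`:

```python
from pathlib import Path

# Constraints documented in web/backend/config.py:
# 50 MB max upload, .fna / .fa / .fasta extensions only.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024
ALLOWED_EXTENSIONS = {".fna", ".fa", ".fasta"}

def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file would be rejected. Hypothetical helper."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file too large: {size_bytes} > {MAX_UPLOAD_BYTES} bytes")

validate_upload("sample.fna", 1024)      # passes silently
try:
    validate_upload("genome.gbk", 1024)  # wrong extension
except ValueError as err:
    print(err)  # unsupported extension: .gbk
```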
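The 30-day retention rule (note that `expires_at` in the create-task example is 30 days after `created_at`) reduces to a single timestamp comparison. A sketch with a hypothetical `is_expired` helper; the actual cleanup logic lives in the backend:

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # matches the documented retention period

def is_expired(created_at: datetime, now: datetime,
               retention: timedelta = RETENTION) -> bool:
    """True once a task's results are past the retention window."""
    return now - created_at >= retention

# Consistent with the API example: created 2024-01-01, expires 2024-01-31
created = datetime(2024, 1, 1)
assert not is_expired(created, datetime(2024, 1, 15))
assert is_expired(created, datetime(2024, 1, 31))
```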
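One simple way a value like `estimated_remaining_seconds` could be derived is linear extrapolation from elapsed time and progress. This is an assumption for illustration only, not necessarily the backend's actual formula:

```python
def estimated_remaining_seconds(elapsed_s: float, progress_pct: float) -> float:
    """Linear extrapolation: total time ~ elapsed / fraction complete."""
    if progress_pct <= 0:
        raise ValueError("no progress yet; cannot extrapolate")
    total = elapsed_s * 100.0 / progress_pct
    return max(total - elapsed_s, 0.0)

# A task 45% done after 49 s would have roughly 60 s left
print(round(estimated_remaining_seconds(49, 45)))  # 60
```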