# BtToxin Pipeline Agent Guide

## Overview

BtToxin Pipeline is an automated Bacillus thuringiensis toxin-mining system. It identifies Cry toxin genes in bacterial genomes and predicts their target insects using a three-stage pipeline:

1. **Digger**: BtToxin_Digger toxin mining
2. **Shotter**: Toxin scoring and target prediction
3. **Plot**: Heatmap generation and report creation

## Tech Stack

| Layer | Technology |
|-------|------------|
| Package Manager | pixi (conda environments) |
| Pipeline | Python 3.9+ (pandas, matplotlib, seaborn) |
| Digger Tool | BtToxin_Digger (Perl, BLAST, HMMER) |
| Frontend | Vue 3 + Vite + Element Plus + vue-i18n |
| Backend | FastAPI + Uvicorn + SQLAlchemy |
| Database | PostgreSQL 15 (metadata) + Redis 7 (queue) |
| Result Storage | File system, 30-day retention |

## Quick Start

```bash
# 1. Clone and install dependencies
git clone <repository-url>
cd bttoxin-pipeline
pixi install

# 2. Start services (production, with Traefik)
# Using the unified production configuration
docker compose -f docker/compose/docker-compose.traefik.yml -p compose up -d --build

# Access:
#   Frontend:    https://bttiaw.hzau.edu.cn (via Traefik)
#   Backend API: http://bttoxin-pipeline:8000 (internal)

# 3. Development mode (local)
pixi run web-start
```

## Task Submission Flow

```
1. User uploads a .fna file on the home page
2. User configures parameters and clicks the "Submit Task" button
3. Backend creates the task and returns a task_id
4. Frontend automatically redirects to the /{task_id} page
5. The page polls the backend every 3 seconds for the latest status
6. The page shows a progress bar, the current stage, and the estimated remaining time
7. On completion, a "Download Results" button is shown
8. Results are kept for 30 days and then deleted automatically
```
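The submission flow above can be sketched as a minimal API client. This is a sketch only: `BASE_URL` assumes a local dev backend, and the endpoint shapes follow the Web API section below.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local dev backend
POLL_INTERVAL_SECONDS = 3           # matches the web UI's polling interval

TERMINAL_STATUSES = {"completed", "failed"}


def is_terminal(status: str) -> bool:
    """A task stops being polled once it reaches a terminal status."""
    return status in TERMINAL_STATUSES


def get_status(task_id: str) -> dict:
    """Fetch the current task status (GET /api/tasks/{task_id})."""
    with urllib.request.urlopen(f"{BASE_URL}/api/tasks/{task_id}") as resp:
        return json.load(resp)


def wait_for_task(task_id: str) -> dict:
    """Poll every 3 seconds until the task completes or fails."""
    while True:
        info = get_status(task_id)
        if is_terminal(info["status"]):
            return info
        time.sleep(POLL_INTERVAL_SECONDS)


if __name__ == "__main__":
    print(wait_for_task("your-task-id")["status"])
```

The same loop is what `TaskMonitorView.vue` implements in the browser; a script like this is handy for batch submissions.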
## Project Structure

```
bttoxin-pipeline/
├── pixi.toml                         # Pixi environment configuration
├── pyproject.toml                    # Python package configuration
├── scripts/
│   ├── run_single_fna_pipeline.py    # Main orchestrator
│   ├── run_digger_stage.py           # Digger-only stage
│   ├── bttoxin_shoter.py             # Toxin scoring module
│   ├── plot_shotter.py               # Visualization & reporting
│   ├── start_web.sh                  # Start both frontend + backend
│   └── pixi_runner.py                # PixiRunner abstraction
├── bttoxin/                          # Python CLI package
│   ├── api.py                        # Python API
│   ├── cli.py                        # CLI entry point
│   └── __init__.py
├── web/backend/                      # FastAPI backend
│   ├── main.py                       # FastAPI app entry + API endpoints
│   ├── config.py                     # Configuration
│   ├── models.py                     # Data models
│   ├── storage.py                    # Redis + file storage
│   ├── tasks.py                      # Task execution logic
│   └── AGENTS.md                     # Backend-specific guide
├── frontend/                         # Vue 3 frontend
│   ├── src/
│   │   ├── api/task.ts               # Task API client
│   │   ├── views/
│   │   │   ├── TaskSubmitView.vue    # Task submission page
│   │   │   └── TaskMonitorView.vue   # Task status page (polling)
│   │   ├── types/task.ts             # Task types
│   │   └── ...
│   └── AGENTS.md                     # Frontend-specific guide
├── Data/                             # Reference data
│   └── toxicity-data.csv             # BPPRC specificity data
├── external_dbs/                     # Optional external database
│   └── bt_toxin/                     # Updated BtToxin database
├── tools/                            # Utility tools and environments
│   └── reproduction/                 # Reproduction environments
│       └── bttoxin_digger/           # BtToxin_Digger reproduction env
├── tests/                            # Test suite
│   ├── test_pixi_runner.py           # Property-based tests
│   └── test_data/                    # Test input files (.fna)
└── docs/                             # Documentation
```

## Web API Endpoints

### Create Task

```bash
POST /api/tasks
Content-Type: multipart/form-data

Parameters:
- file: .fna file
- min_identity: float (0-1, default: 0.8)
- min_coverage: float (0-1, default: 0.6)
- allow_unknown_families: boolean (default: false)
- require_index_hit: boolean (default: true)
- lang: "zh" | "en" (default: "zh"; now also supported via the Accept-Language header)

Response:
{
  "task_id": "uuid",
  "token": "access_token",
  "status": "pending",                 // pending | running | completed | failed
  "created_at": "2024-01-01T00:00:00",
  "expires_at": "2024-01-31T00:00:00",
  "estimated_duration_seconds": 120
}
```

### Get Task Status

```bash
GET /api/tasks/{task_id}

Response:
{
  "task_id": "uuid",
  "status": "running",
  "progress": 45,                      // percentage
  "current_stage": "shoter",           // digger | shoter | plots | bundle
  "submission_time": "2024-01-01T00:00:00",
  "start_time": "2024-01-01T00:00:10",
  "filename": "sample.fna",
  "error": null,                       // error message on failure
  "estimated_remaining_seconds": 60
}
```

### Download Result

```bash
GET /api/tasks/{task_id}/download

Response: a .tar.gz archive containing:
- results/digger/    # Digger analysis results
- results/shotter/   # Shotter scoring results
- results/logs/      # Execution logs
- input.fna          # Original input file
```

### Delete Task

```bash
DELETE /api/tasks/{task_id}
```

## Development Commands

### Via pixi (Recommended)

```bash
# Full pipeline (uses the pipeline environment)
pixi run -e pipeline pipeline --fna <genome.fna>

# Individual stages
pixi run -e pipeline digger-only --fna <genome.fna>
pixi run -e pipeline shotter --all_toxins <All_Toxins.txt>
pixi run -e pipeline plot --strain_scores <strain_scores_file>

# Frontend
pixi run fe-install
pixi run fe-dev      # http://localhost:5173
pixi run fe-build

# Backend
pixi run api-dev     # http://localhost:8000
pixi run api-test

# Combined (both frontend + backend)
pixi run web-start

# Tests
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v
```

### Direct Commands

```bash
# Frontend (in frontend/ directory)
pnpm install
pnpm dev --host

# Backend (in project root)
uvicorn web.backend.main:app --reload --host 0.0.0.0 --port 8000

# Pipeline (requires pipeline environment activation)
~/.pixi/bin/pixi shell-hook -e pipeline > /tmp/activate.sh
source /tmp/activate.sh
python scripts/run_single_fna_pipeline.py --fna <genome.fna>
```

## Docker Deployment

```bash
# Build and run with docker compose
docker compose -f docker-compose.simple.yml up -d

# Access at http://localhost:80
# API health check: http://localhost:80/api/health
```

### Docker Architecture

```
bttoxin-pipeline (stack)
├── traefik            (reverse proxy, ports 80/443)
├── bttoxin-pipeline   (FastAPI + static files, port 8000)
├── bttoxin-postgres   (database, port 5432)
└── bttoxin-redis      (task queue, port 6379)
```

### Docker Volume Mounts

| Host Path | Container Path | Purpose |
|-----------|----------------|---------|
| `./jobs` | `/app/jobs` | Task results |
| `postgres_data` | `/var/lib/postgresql/data` | Database persistence |
| ... | ... | Source code mounts (dev) |

## Task Flow

```
1. User uploads a .fna file via the web UI
2. Backend creates the task directory /data/jobs/{task_id}/ (or ./jobs/ in dev)
3. Backend saves the input file and parameters
4. Backend starts `pixi run -e pipeline pipeline` in the background (asyncio subprocess)
5. Frontend polls GET /api/tasks/{task_id} every 3 seconds
6. On completion, a download URL is provided
7. Results are available for 30 days, then auto-cleaned
```
## Result Storage

```
./jobs/{task_id}/                     # Or /app/jobs/ in Docker
├── input.fna                         # Uploaded file
├── params.json                       # Task parameters
├── task_meta.json                    # Task metadata (status, progress, etc.)
├── output/                           # Pipeline output
│   ├── digger/                       # BtToxin_Digger results
│   │   ├── Results/Toxins/
│   │   │   └── All_Toxins.txt        # Toxin hits (input to shotter)
│   │   └── ...
│   ├── shotter/                      # Shotter scoring results
│   │   ├── toxin_support.tsv
│   │   ├── strain_target_scores.tsv
│   │   └── strain_scores.json
│   └── logs/
└── pipeline_results.tar.gz           # Downloadable bundle
```

## Common Tasks

### Adding a New Pipeline Stage

1. Create the script in `scripts/`
2. Add it to the `run_single_fna_pipeline.py` orchestration
3. Register a task in `pixi.toml` if standalone execution is needed
4. Add the stage definition to `frontend/src/types/task.ts`

### Modifying Task Parameters

1. Update the `TaskFormData` interface in `frontend/src/components/task/TaskSubmitForm.vue`
2. Update the API endpoint in `web/backend/main.py`
3. Update the task execution in `web/backend/tasks.py`

### Configuring Storage Location

```bash
# Set a custom jobs directory
export JOBS_DIR=/path/to/jobs

# Or modify the [feature.webbackend.env] section in pixi.toml
```

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `VITE_API_BASE_URL` | Frontend API URL (production) | `""` (uses relative path) |
| `JOBS_DIR` | Task storage directory | `./jobs` |
| `DEBUG` | Enable debug mode | `false` |

### Key Files Modified (Recent Fixes)

| File | Change |
|------|--------|
| `scripts/bttoxin_shoter.py` | Added `engine="python"` for pandas 2.x compatibility; added empty-DataFrame handling |
| `scripts/run_single_fna_pipeline.py` | Fixed `pixi_runner` import with `sys.path.insert()` |
| `web/backend/tasks.py` | Changed to the `pixi run -e pipeline pipeline` command |
| `entrypoint.sh` | Fixed nginx `proxy_pass` to preserve the `/api/` prefix |
| `docker-compose.simple.yml` | Docker deployment configuration |

### Constraints

Defined in `web/backend/config.py`:

- Max upload size: 50 MB
- Result retention: 30 days
- Task timeout: 6 hours
- Allowed extensions: .fna, .fa, .fasta
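A minimal sketch of how these constraints might be expressed and enforced; the constant names are assumptions, and the real attribute names in `web/backend/config.py` may differ.

```python
# Illustrative constants mirroring the constraints in web/backend/config.py
# (the actual names in that module may differ).
MAX_UPLOAD_SIZE_MB = 50
RESULT_RETENTION_DAYS = 30
TASK_TIMEOUT_HOURS = 6
ALLOWED_EXTENSIONS = {".fna", ".fa", ".fasta"}


def is_allowed_upload(filename: str, size_mb: float) -> bool:
    """Reject uploads that are too large or have an unsupported extension."""
    suffix = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return suffix in ALLOWED_EXTENSIONS and size_mb <= MAX_UPLOAD_SIZE_MB
```

Centralizing these limits in one module keeps the upload endpoint, the cleanup job, and the task timeout in agreement.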
## Testing

### Python Tests

```bash
# Property-based tests for the pipeline
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v

# Backend tests
pixi run api-test
```

### Frontend Tests

```bash
pixi run fe-test
# or
cd frontend && pnpm test:unit
```

## Database Update

```bash
mkdir -p external_dbs
git clone --filter=blob:none --no-checkout \
  https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo
git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
rm -rf tmp_bttoxin_repo
```

## Subproject Guides

- **Frontend**: See [`frontend/AGENTS.md`](frontend/AGENTS.md)
- **Backend**: See [`web/backend/AGENTS.md`](web/backend/AGENTS.md)
- **API Documentation**: http://localhost:8000/api/docs (when `DEBUG=true`)

## Troubleshooting

### Common Issues

| Issue | Solution |
|-------|----------|
| pixi not found | `export PATH="$HOME/.pixi/bin:$PATH"` |
| Environment not found | `pixi install` |
| BtToxin_Digger unavailable | `pixi run -e digger BtToxin_Digger --help` |
| Permission denied | Ensure write access to `/data/jobs` |
| Task not found | Check the task_id in the URL and response |
| Results expired | Results auto-delete after 30 days |
| Nginx 404 on API | Check `proxy_pass http://127.0.0.1:8000/api/` (note the trailing `/api/`) |
| `KeyError: 'Strain'` | Empty DataFrame after filters; shotter now handles this gracefully |
| Pandas engine error | Use `engine="python"` in `pd.read_csv()` for pandas 2.x |

### Debugging Pipeline Issues

```bash
# Check whether the task was created
curl http://localhost:80/api/tasks/{task_id}

# View task logs
cat jobs/{task_id}/output/logs/digger_execution.log

# Check the All_Toxins.txt format
head -1 jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt

# Test shotter independently
pixi run -e pipeline python scripts/bttoxin_shoter.py \
  --all_toxins jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt \
  --output_dir /tmp/test_output
```
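When testing shotter independently, the two pandas pitfalls from the Common Issues table (`KeyError: 'Strain'` on an empty DataFrame, and the pandas 2.x engine error) can be guarded as below. This is a sketch under assumed column names (`Strain`, `Protein_ID`, `Identity`), not the actual `bttoxin_shoter.py` code.

```python
import io

import pandas as pd


def load_toxin_hits(path_or_buf, min_identity: float = 0.8) -> pd.DataFrame:
    """Defensively load an All_Toxins.txt-style table
    (illustrative; column names are assumptions)."""
    # engine="python" sidesteps C-parser quirks on irregular rows (pandas 2.x fix above)
    df = pd.read_csv(path_or_buf, sep="\t", engine="python")
    if "Identity" in df.columns:
        df = df[df["Identity"] >= min_identity * 100]
    # Guard the empty case instead of raising KeyError: 'Strain' downstream
    if df.empty or "Strain" not in df.columns:
        return pd.DataFrame(columns=["Strain", "Protein_ID", "Identity"])
    return df


# Tiny in-memory example (synthetic data)
sample = io.StringIO("Strain\tProtein_ID\tIdentity\nBt1\tCry1Aa\t95.0\nBt1\tCry2Ab\t70.0\n")
hits = load_toxin_hits(sample)
```

Returning an empty frame with the expected columns lets downstream grouping code run unchanged on strains with no hits.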
### Docker-Specific Issues

```bash
# Check container health
docker ps
docker logs bttoxin-pipeline

# Check the nginx config
docker exec bttoxin-pipeline nginx -T

# Verify the backend is running
docker exec bttoxin-pipeline curl http://127.0.0.1:8000/api/health
```

### Post-Mortem: Startup Failures & 404/403 Errors (2026-01 Update)

**Symptoms:**

- Website returns 404 Not Found or 403 Forbidden.
- Container stuck in a `Restarting` loop.
- Logs show `exec: "uvicorn": executable file not found`.

**Root Causes & Solutions:**

1. **Missing Environment Config**
   - **Cause**: `pixi.toml` and `pixi.lock` were missing in the final Docker image stage.
   - **Fix**: Ensure `COPY --from=builder /app/pixi.toml ...` is present in the Dockerfile.
2. **Port Conflict**
   - **Cause**: `docker-compose.yml` mapped `80:80` while Traefik already occupied port 80.
   - **Fix**: Remove the `ports` mapping in the compose file; rely on the Docker internal network (`frontend`) and Traefik labels.
3. **Frontend Permissions**
   - **Cause**: Built frontend files owned by root were not readable by the Nginx user.
   - **Fix**: Add `RUN chmod -R 755 /var/www/html` in the Dockerfile.
4. **Health Check Path**
   - **Cause**: Nginx routed `/health` to `/api/health` but the backend expected `/health`.
   - **Fix**: Update the Nginx config so `proxy_pass` targets the correct endpoint.

### Post-Mortem: Consistency Refactoring & Fixes (2026-01-20 Update)

**Summary:** Major refactoring to keep script execution and the web pipeline consistent, fix severe container startup failures, and simplify the user experience.

**1. Unified Pipeline Execution**

- **Problem**: The web backend orchestrated pipeline steps manually, causing discrepancies with the standalone script (e.g., missing plots, different file formats).
- **Fix**: Refactored `backend/app/workers/tasks.py` to subprocess `scripts/run_single_fna_pipeline.py` directly.
- **Result**: Web output is now guaranteed identical to manual script execution.
**2. Result Format & Cleanup**

- **Change**: Switched the output format from `.tar.gz` to `.zip`.
- **Feature**: Added automatic cleanup of intermediate directories (`digger/`, `shoter/`) to save disk space; only the final ZIP and logs are retained.
- **Frontend**: Updated the download logic to handle `.zip` files.

**3. Frontend Simplification**

- **Change**: Removed the CRISPR Fusion UI elements (beta feature) to reduce complexity.
- **Change**: Replaced the complex multi-stage status indicators with a simulated progress bar for better UX during black-box script execution.
- **Fix**: Restored the "one-click load" button and fixed TypeScript build errors caused by removed variables.

**4. Critical Docker Fixes**

- **Fix (restart loop)**: Removed the incorrect `image: postgres` directive in `docker-compose.yml` that made the web service run database software instead of the app.
- **Fix (env path)**: Updated `.dockerignore` to exclude the host `.pixi` directory, preventing "bad interpreter" errors caused by hardcoded host paths in the container.
- **Fix (404 error)**: Removed an erroneous `rm -rf /app/frontend` in the Dockerfile that was deleting built frontend assets.
- **Optimization**: Configured the `npmmirror` registry to resolve build timeouts in CN network environments.
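The unified execution model from item 1 (the worker subprocesses the same script a user would run manually) can be sketched with an asyncio subprocess runner. The function names and argv are illustrative, not the actual `backend/app/workers/tasks.py` code.

```python
import asyncio
import sys


async def run_command(*argv: str) -> tuple:
    """Run a command and capture its combined output
    (the asyncio-subprocess pattern the refactored worker uses)."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    out, _ = await proc.communicate()
    return proc.returncode, out.decode()


def run_pipeline(fna_path: str) -> int:
    """Invoke the same entry point as manual execution (illustrative argv):
    one code path for web and CLI keeps outputs identical."""
    code, _ = asyncio.run(
        run_command("pixi", "run", "-e", "pipeline", "pipeline", "--fna", fna_path)
    )
    return code


# Demo with the Python interpreter itself instead of the real pipeline
code, out = asyncio.run(run_command(sys.executable, "-c", "print('ok')"))
```

Because the subprocess is the single source of truth, any fix to `run_single_fna_pipeline.py` automatically applies to web-submitted tasks as well.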