BtToxin Pipeline Agent Guide
Overview
BtToxin Pipeline is an automated Bacillus thuringiensis toxin mining system. It identifies Cry toxin genes in bacterial genomes and predicts their target insects using a three-stage pipeline:
- Digger: BtToxin_Digger toxin mining
- Shotter: Toxin scoring and target prediction
- Plot: Heatmap generation and report creation
Tech Stack
| Layer | Technology |
|---|---|
| Package Manager | pixi (conda environments) |
| Pipeline | Python 3.9+ (pandas, matplotlib, seaborn) |
| Digger Tool | BtToxin_Digger (Perl, BLAST, HMMER) |
| Frontend | Vue 3 + Vite + Element Plus + vue-i18n |
| Backend | FastAPI + Uvicorn + SQLAlchemy |
| Database | PostgreSQL 15 (Metadata) + Redis 7 (Queue) |
| Result Storage | File system + 30-day retention |
Quick Start
# 1. Clone and install dependencies
git clone <repo>
cd bttoxin-pipeline
pixi install
# 2. Start services (Production with Traefik)
# Using the unified production configuration
docker compose -f docker/compose/docker-compose.traefik.yml -p compose up -d --build
# Access:
# Frontend: https://bttiaw.hzau.edu.cn (via Traefik)
# Backend API: http://bttoxin-pipeline:8000 (Internal)
# 3. Development Mode (Local)
pixi run web-start
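- Upload a .fna genome file
- Configure the parameters
- Click Submit
- The page redirects automatically to /{task_id}
- The page refreshes automatically every 3 seconds to show progress
- When the task completes, click to download the result archive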
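## Task Submission Flow
- The user uploads a .fna file on the home page
- The user clicks the "Submit Task" button
- The backend creates the task and returns a task_id
- The frontend redirects automatically to the /{task_id} page
- The page polls the backend every 3 seconds for the latest status
- The page shows a progress bar, the current stage, and the estimated time remaining
- When the task completes, a "Download Results" button appears
- Results are deleted automatically after 30 days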
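The polling step in this flow can be sketched in Python. This is an illustration only, not the frontend's actual code: `poll_until_done` and the injected `fetch_status` callable are hypothetical names, and only the 3-second interval and the status values (`pending`, `running`, `completed`, `failed`) come from this guide.

```python
import time
from typing import Callable, Dict

# Terminal status values as listed in the API description of this guide.
TERMINAL_STATES = {"completed", "failed"}


def poll_until_done(
    fetch_status: Callable[[], Dict],
    interval: float = 3.0,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status() until the task reaches a terminal state.

    fetch_status is a hypothetical callable that returns the parsed JSON
    body of GET /api/tasks/{task_id}. max_polls is an assumed safety limit.
    """
    for _ in range(max_polls):
        body = fetch_status()
        if body.get("status") in TERMINAL_STATES:
            return body
        sleep(interval)
    raise TimeoutError("task did not finish within the polling budget")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; a caller would normally leave it at the default.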
## Web API
### Create Task
```bash
POST /api/tasks
Content-Type: multipart/form-data
Parameters:
- file: .fna file
- min_identity: minimum identity (0-1, default: 0.8)
- min_coverage: minimum coverage (0-1, default: 0.6)
- allow_unknown_families: whether to allow unknown families (default: false)
- require_index_hit: whether an index hit is required (default: true)
- lang: language, "zh" | "en" (default: "zh")
Response:
{
"task_id": "uuid",
"token": "access token",
"status": "pending", // pending | running | completed | failed
"created_at": "creation time",
"expires_at": "expiry time",
"estimated_duration_seconds": <estimated duration in seconds>
}
```
### Get Task Status
GET /api/tasks/{task_id}
Response:
{
"task_id": "uuid",
"status": "running",
"progress": 45, // progress percentage
"current_stage": "shoter", // current stage: digger | shoter | plots | bundle
"submission_time": "submission time",
"start_time": "start time",
"filename": "original filename",
"error": null, // error message on failure
"estimated_remaining_seconds": 60 // estimated time remaining
}
### Download Result
GET /api/tasks/{task_id}/download
Response: .tar.gz archive
The archive contains:
- results/digger/ # Digger analysis results
- results/shotter/ # Shotter scoring results
- results/logs/ # Execution logs
- input.fna # Original input file
### Delete Task
DELETE /api/tasks/{task_id}
Project Structure
bttoxin-pipeline/
├── pixi.toml # Pixi environment configuration
├── pyproject.toml # Python package configuration
├── scripts/
│ ├── run_single_fna_pipeline.py # Main orchestrator
│ ├── run_digger_stage.py # Digger-only stage
│ ├── bttoxin_shoter.py # Toxin scoring module
│ ├── plot_shotter.py # Visualization & reporting
│ ├── start_web.sh # Start both frontend + backend
│ └── pixi_runner.py # PixiRunner abstraction
├── bttoxin/ # Python CLI package
│ ├── api.py # Python API
│ ├── cli.py # CLI entry point
│ └── __init__.py
├── web/backend/ # FastAPI backend
│ ├── main.py # FastAPI app entry + API endpoints
│ ├── config.py # Configuration
│ ├── models.py # Data models
│ ├── storage.py # Redis + file storage
│ ├── tasks.py # Task execution logic
│ └── AGENTS.md # Backend-specific guide
├── frontend/ # Vue 3 frontend
│ ├── src/
│ │ ├── api/task.ts # Task API client
│ │ ├── views/
│ │ │ ├── TaskSubmitView.vue # Task submission page
│ │ │ └── TaskMonitorView.vue # Task status page (polling)
│ │ ├── types/task.ts # Task types
│ │ └── ...
│ └── AGENTS.md # Frontend-specific guide
├── Data/ # Reference data
│ └── toxicity-data.csv # BPPRC specificity data
├── external_dbs/ # Optional external database
│ └── bt_toxin/ # Updated BtToxin database
├── tools/ # Utility tools and environments
│ └── reproduction/ # Reproduction environments
│ └── bttoxin_digger/ # BtToxin_Digger reproduction env
├── tests/ # Test suite
│ ├── test_pixi_runner.py # Property-based tests
│ └── test_data/ # Test input files (.fna)
└── docs/ # Documentation
Web API Endpoints
Create Task
POST /api/tasks
Content-Type: multipart/form-data
Parameters:
- file: .fna file
- min_identity: float (0-1, default: 0.8)
- min_coverage: float (0-1, default: 0.6)
- allow_unknown_families: boolean (default: false)
- require_index_hit: boolean (default: true)
- lang: "zh" | "en" (default: "zh") - *Now supported via Accept-Language header*
Response:
{
"task_id": "uuid",
"token": "access_token",
"status": "pending",
"created_at": "2024-01-01T00:00:00",
"expires_at": "2024-01-31T00:00:00",
"estimated_duration_seconds": 120
}
Get Task Status
GET /api/tasks/{task_id}
Response:
{
"task_id": "uuid",
"status": "running",
"progress": 45,
"current_stage": "shoter",
"submission_time": "2024-01-01T00:00:00",
"start_time": "2024-01-01T00:00:10",
"filename": "sample.fna",
"error": null,
"estimated_remaining_seconds": 60
}
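A client could turn this status response into a typed object. The sketch below is an assumption about how one might do that in Python: the `TaskStatus` class and its `is_finished` helper are hypothetical, while the field names are taken from the response shown above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskStatus:
    """Typed view of the GET /api/tasks/{task_id} response (hypothetical helper)."""

    task_id: str
    status: str
    progress: float
    current_stage: Optional[str]
    filename: Optional[str]
    error: Optional[str]
    estimated_remaining_seconds: Optional[int]

    @classmethod
    def from_json(cls, text: str) -> "TaskStatus":
        data = json.loads(text)
        return cls(
            task_id=data["task_id"],
            status=data["status"],
            # Assumed: a missing progress field is treated as 0.
            progress=float(data.get("progress", 0)),
            current_stage=data.get("current_stage"),
            filename=data.get("filename"),
            error=data.get("error"),
            estimated_remaining_seconds=data.get("estimated_remaining_seconds"),
        )

    @property
    def is_finished(self) -> bool:
        # "completed" and "failed" are the terminal states named in this guide.
        return self.status in ("completed", "failed")
```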
Download Result
GET /api/tasks/{task_id}/download
Response: .tar.gz file
Delete Task
DELETE /api/tasks/{task_id}
Development Commands
Via pixi (Recommended)
# Full pipeline (uses pipeline environment)
pixi run -e pipeline pipeline --fna <file.fna>
# Individual stages
pixi run -e pipeline digger-only --fna <file.fna>
pixi run -e pipeline shotter --all_toxins <path>
pixi run -e pipeline plot --strain_scores <path>
# Frontend
pixi run fe-install
pixi run fe-dev # http://localhost:5173
pixi run fe-build
# Backend
pixi run api-dev # http://localhost:8000
pixi run api-test
# Combined (both frontend + backend)
pixi run web-start
# Tests
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v
Direct Commands
# Frontend (in frontend/ directory)
pnpm install
pnpm dev --host
# Backend (in project root)
uvicorn web.backend.main:app --reload --host 0.0.0.0 --port 8000
# Pipeline (requires pipeline environment activation)
~/.pixi/bin/pixi shell-hook -e pipeline > /tmp/activate.sh
source /tmp/activate.sh
python scripts/run_single_fna_pipeline.py --fna <file>
Docker Deployment
# Build and run with docker-compose
docker-compose -f docker-compose.simple.yml up -d
# Access at http://localhost:80
# API health check: http://localhost:80/api/health
Docker Architecture
bttoxin-pipeline (Stack)
├── traefik (reverse proxy, port 80/443)
├── bttoxin-pipeline (FastAPI + Static Files, port 8000)
├── bttoxin-postgres (Database, port 5432)
└── bttoxin-redis (Task Queue, port 6379)
Docker Volume Mounts
| Host Path | Container Path | Purpose |
|---|---|---|
| `./jobs` | `/app/jobs` | Task results |
| `postgres_data` | `/var/lib/postgresql/data` | Database persistence |
| ... | ... | Source code mounts (dev) |
Task Flow
1. User uploads .fna file via web UI
2. Backend creates task directory: /data/jobs/{task_id}/ (or ./jobs/ in dev)
3. Backend saves input file and parameters
4. Backend starts `pixi run -e pipeline pipeline` in background (asyncio subprocess)
5. Frontend polls GET /api/tasks/{task_id} every 3 seconds
6. On completion, download URL is provided
7. Results available for 30 days, then auto-cleanup
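Step 4 of this flow (starting the pipeline as an asyncio subprocess) might look roughly like the sketch below. The helper names `build_pipeline_command` and `run_command` are hypothetical and the real code in the backend may differ; only the `pixi run -e pipeline pipeline --fna <file.fna>` command line comes from this guide.

```python
import asyncio
from typing import List, Tuple


def build_pipeline_command(fna_path: str) -> List[str]:
    # Command line as documented in this guide:
    #   pixi run -e pipeline pipeline --fna <file.fna>
    return ["pixi", "run", "-e", "pipeline", "pipeline", "--fna", fna_path]


async def run_command(cmd: List[str]) -> Tuple[int, str]:
    """Run cmd without blocking the event loop.

    Returns (exit code, combined stdout and stderr as text).
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    out, _ = await proc.communicate()
    return proc.returncode, out.decode(errors="replace")
```

Inside a request handler one would typically schedule `run_command(build_pipeline_command(path))` as a background task rather than await it directly, so the HTTP response is not held up for the duration of the run.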
Result Storage
./jobs/{task_id}/ # Or /app/jobs/ in Docker
├── input.fna # Uploaded file
├── params.json # Task parameters
├── task_meta.json # Task metadata (status, progress, etc.)
├── output/ # Pipeline output
│ ├── digger/ # BtToxin_Digger results
│ │ ├── Results/Toxins/
│ │ │ └── All_Toxins.txt # Toxin hits (input to shotter)
│ │ └── ...
│ ├── shotter/ # Shotter scoring results
│ │ ├── toxin_support.tsv
│ │ ├── strain_target_scores.tsv
│ │ └── strain_scores.json
│ └── logs/
└── pipeline_results.tar.gz # Downloadable bundle
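When debugging a task, it can help to check which of the files in this layout were actually produced. The following is a small sketch under that assumption: `check_job_outputs` is a hypothetical helper, and the relative paths are copied from the tree above.

```python
from pathlib import Path
from typing import Dict

# Relative paths copied from the result layout in this guide.
EXPECTED_OUTPUTS = {
    "all_toxins": "output/digger/Results/Toxins/All_Toxins.txt",
    "toxin_support": "output/shotter/toxin_support.tsv",
    "strain_target_scores": "output/shotter/strain_target_scores.tsv",
    "strain_scores": "output/shotter/strain_scores.json",
}


def check_job_outputs(jobs_dir: Path, task_id: str) -> Dict[str, bool]:
    """Report which expected result files exist for one task directory."""
    job_dir = Path(jobs_dir) / task_id
    return {
        name: (job_dir / rel).is_file()
        for name, rel in EXPECTED_OUTPUTS.items()
    }
```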
Common Tasks
Adding a New Pipeline Stage
- Create script in `scripts/`
- Add to `run_single_fna_pipeline.py` orchestration
- Register task in `pixi.toml` if standalone execution needed
- Add stage definition to `frontend/src/types/task.ts`
Modifying Task Parameters
- Update `TaskFormData` interface in `frontend/src/components/task/TaskSubmitForm.vue`
- Update API endpoint in `web/backend/main.py`
- Update task execution in `web/backend/tasks.py`
Configuring Storage Location
# Set custom jobs directory
export JOBS_DIR=/path/to/jobs
# Or modify pixi.toml [feature.webbackend.env] section
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `VITE_API_BASE_URL` | Frontend API URL (production) | "" (uses relative path) |
| `JOBS_DIR` | Task storage directory | ./jobs |
| `DEBUG` | Enable debug mode | false |
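For illustration, the backend-side variables could be read as shown below. This is a sketch, not the contents of `web/backend/config.py`: the `Settings` class and `load_settings` function are hypothetical, and the set of strings accepted as "true" for `DEBUG` is an assumption.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    jobs_dir: Path
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read JOBS_DIR and DEBUG, falling back to the documented defaults."""
    debug_raw = env.get("DEBUG", "false").strip().lower()
    return Settings(
        jobs_dir=Path(env.get("JOBS_DIR", "./jobs")),
        # Assumed truthy spellings; the real parser may accept others.
        debug=debug_raw in ("1", "true", "yes"),
    )
```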
Key Files Modified (Recent Fixes)
| File | Change |
|---|---|
| `scripts/bttoxin_shoter.py` | Added engine="python" for pandas 2.x compatibility; added empty DataFrame handling |
| `scripts/run_single_fna_pipeline.py` | Fixed pixi_runner import with sys.path.insert() |
| `web/backend/tasks.py` | Changed to pixi run -e pipeline pipeline command |
| `entrypoint.sh` | Fixed nginx proxy_pass to preserve /api/ prefix |
| `docker-compose.simple.yml` | Docker deployment configuration |
Constraints
Defined in web/backend/config.py:
- Max upload size: 50 MB
- Result retention: 30 days
- Task timeout: 6 hours
- Allowed extensions: .fna, .fa, .fasta
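The upload limits above could be enforced with a check like the following. It is a minimal sketch rather than the code in `web/backend/config.py`: `validate_upload` is a hypothetical name, and treating "50 MB" as 50 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # 50 MB, assuming binary megabytes
ALLOWED_EXTENSIONS = {".fna", ".fa", ".fasta"}


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the upload violates the documented limits."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 50 MB upload limit")
```

The extension comparison is case-insensitive here, which is a design choice of this sketch and not something the guide specifies.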
Testing
Python Tests
# Property-based tests for pipeline
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v
# Backend tests
pixi run api-test
Frontend Tests
pixi run fe-test
# or
cd frontend && pnpm test:unit
Database Update
mkdir -p external_dbs
git clone --filter=blob:none --no-checkout \
https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo
git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
rm -rf tmp_bttoxin_repo
Subproject Guides
- Frontend: See `frontend/AGENTS.md`
- Backend: See `web/backend/AGENTS.md`
- API Documentation: http://localhost:8000/api/docs (when DEBUG=true)
Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| pixi not found | export PATH="$HOME/.pixi/bin:$PATH" |
| Environment not found | pixi install |
| BtToxin_Digger unavailable | pixi run -e digger BtToxin_Digger --help |
| Permission denied | Ensure write access to /data/jobs |
| Task not found | Check task_id in URL and response |
| Results expired | Results auto-delete after 30 days |
| Nginx 404 on API | Check proxy_pass http://127.0.0.1:8000/api/ (note trailing /api/) |
| KeyError: 'Strain' | Empty DataFrame after filters - shotter now handles this gracefully |
| Pandas engine error | Use engine="python" in pd.read_csv() for pandas 2.x |
Debugging Pipeline Issues
# Check if task was created
curl http://localhost:80/api/tasks/{task_id}
# View task logs
cat jobs/{task_id}/output/logs/digger_execution.log
# Check All_Toxins.txt format
head -1 jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt
# Test shotter independently
pixi run -e pipeline python scripts/bttoxin_shoter.py \
--all_toxins jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt \
--output_dir /tmp/test_output
Docker-Specific Issues
# Check container health
docker ps
docker logs bttoxin-pipeline
# Check nginx config
docker exec bttoxin-pipeline nginx -T
# Verify backend is running
docker exec bttoxin-pipeline curl http://127.0.0.1:8000/api/health
Post-Mortem: Startup Failures & 404/403 Errors (2026-01 Update)
Symptoms:
- Website returns 404 Not Found or 403 Forbidden.
- Container stuck in `Restarting` loop.
- Logs show `exec: "uvicorn": executable file not found`.
Root Causes & Solutions:
- Missing Environment Config:
  - Cause: `pixi.toml` and `pixi.lock` were missing in the final Docker image phase.
  - Fix: Ensure `COPY --from=builder /app/pixi.toml ...` is present in Dockerfile.
- Port Conflict:
  - Cause: `docker-compose.yml` mapped `80:80` while Traefik already occupied port 80.
  - Fix: Remove `ports` mapping in compose file; rely on Docker internal network (frontend) and Traefik labels.
- Frontend Permissions:
  - Cause: Built frontend files owned by root were not readable by Nginx user.
  - Fix: Add `RUN chmod -R 755 /var/www/html` in Dockerfile.
- Health Check Path:
  - Cause: Nginx routed `/health` to `/api/health` but backend expected `/health`.
  - Fix: Update Nginx config to proxy pass to correct endpoint.
Post-Mortem: Consistency Refactoring & Fixes (2026-01-20 Update)
Summary: Major refactoring to ensure consistency between script execution and web pipeline, fix severe container startup failures, and simplify user experience.
1. Unified Pipeline Execution
- Problem: Web backend manually orchestrated pipeline steps, leading to discrepancies with the standalone script (e.g., missing plots, different file formats).
- Fix: Refactored `backend/app/workers/tasks.py` to run `scripts/run_single_fna_pipeline.py` directly as a subprocess.
- Result: Web output is now guaranteed identical to manual script execution.
2. Result Format & Cleanup
- Change: Switched output format from `.tar.gz` to `.zip`.
- Feature: Added automatic cleanup of intermediate directories (`digger/`, `shoter/`) to save disk space; only the final ZIP and logs are retained.
- Frontend: Updated download logic to handle `.zip` files.
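One way to produce such a `.zip` bundle with the Python standard library is sketched below. This is not the project's bundling code: `bundle_results` is a hypothetical helper, and removing a stale archive and skipping any existing `.zip` files are assumptions made here so that repeated runs never place one archive inside another.

```python
import zipfile
from pathlib import Path


def bundle_results(output_dir: Path, archive_path: Path) -> int:
    """Zip every file under output_dir; return the number of files written.

    A stale archive at archive_path is removed first, and existing .zip
    files are skipped, so the new archive never contains another archive.
    """
    output_dir, archive_path = Path(output_dir), Path(archive_path)
    if archive_path.exists():
        archive_path.unlink()
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(output_dir.rglob("*")):
            if path.is_file() and path.suffix.lower() != ".zip":
                # Store paths relative to output_dir, with forward slashes.
                zf.write(path, path.relative_to(output_dir).as_posix())
                count += 1
    return count
```

Cleanup of the intermediate directories would follow as a separate step once the archive has been written successfully.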
3. Frontend Simplification
- Change: Removed CRISPR Fusion UI elements (beta feature) to reduce complexity.
- Change: Replaced complex multi-stage status indicators with a "Simulated Progress Bar" for better UX during black-box script execution.
- Fix: Restored "One-click load" button and fixed TypeScript build errors caused by removed variables.
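The guide does not say how the simulated progress bar computes its value. As one possible approach, written in Python for illustration although the frontend itself is TypeScript, the bar can follow a saturating curve driven by elapsed time and the estimated duration. The function name, the exponential shape, the 99% cap, and the rounding to one decimal place are all assumptions of this sketch.

```python
import math


def simulated_progress(elapsed_s: float, estimated_s: float, cap: float = 99.0) -> float:
    """Return a percentage that rises quickly, then levels off below cap.

    Uses 1 - exp(-2 * elapsed / estimated), so the bar never reports
    completion on its own; the real status response decides that.
    """
    if estimated_s <= 0:
        raise ValueError("estimated_s must be positive")
    fraction = 1.0 - math.exp(-2.0 * max(elapsed_s, 0.0) / estimated_s)
    return round(min(cap, 100.0 * fraction), 1)
```

With these constants the bar shows about 86.5% when the elapsed time equals the estimate, and it stays at the cap until the backend reports `completed`.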
4. Critical Docker Fixes
- Fix (Restart Loop): Removed incorrect `image: postgres` directive in `docker-compose.yml` that caused the web service to run database software instead of the app.
- Fix (Env Path): Updated `.dockerignore` to exclude host `.pixi` directory, preventing "bad interpreter" errors caused by hardcoded host paths in the container.
- Fix (404 Error): Removed erroneous `rm -rf /app/frontend` in Dockerfile that was accidentally deleting built frontend assets.
- Optimization: Configured `npmmirror` registry to resolve build timeouts in CN network environments.