# BtToxin Pipeline Agent Guide
## Overview
BtToxin Pipeline is an automated Bacillus thuringiensis toxin-mining system. It identifies Cry toxin genes in bacterial genomes and predicts their target insects using a three-stage pipeline:
- Digger: BtToxin_Digger toxin mining
- Shotter: toxin scoring and target-insect prediction
- Plot: heatmap generation and report creation
## Tech Stack
| Layer | Technology |
|---|---|
| Package Manager | pixi (conda environments) |
| Pipeline | Python 3.9+ (pandas, matplotlib, seaborn) |
| Digger Tool | BtToxin_Digger (Perl, BLAST, HMMER) |
| Frontend | Vue 3 + Vite + Element Plus + vue-i18n |
| Backend | FastAPI + Uvicorn + SQLAlchemy |
| Database | PostgreSQL 15 (Metadata) + Redis 7 (Queue) |
| Result Storage | File system + 30-day retention |
## Quick Start
```bash
# 1. Clone and install dependencies
git clone <repo>
cd bttoxin-pipeline
pixi install

# 2. Start services (Docker recommended for the full stack)
# Using DaoCloud mirrors for faster builds in CN
docker compose -f docker/compose/docker-compose.traefik.yml up -d --build

# Access:
# Frontend: https://bttiaw.hzau.edu.cn (via Traefik)
# Backend API: http://localhost:8000 (internal)
# Traefik Dashboard: http://localhost:8080

# 3. Development mode (local)
pixi run web-start
```
Web UI workflow (development mode):
- Upload a .fna genome file
- Configure the parameters
- Click submit
- The browser redirects to the /{task_id} page automatically
- The page refreshes every 3 seconds to show progress
- When the task finishes, click to download the result archive

## Task Submission Flow
- The user uploads a .fna file on the home page
- Clicks the "Submit Task" button
- The backend creates the task and returns a `task_id`
- The frontend redirects to the `/{task_id}` page
- The page polls the backend every 3 seconds for the latest status
- It displays a progress bar, the current stage, and the estimated time remaining
- On completion, a "Download Results" button appears
- Results are kept for 30 days, then deleted automatically
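The 3-second polling loop described above can be sketched as follows. This is a minimal client-side sketch; `fetch_status` is a hypothetical stand-in for the real `GET /api/tasks/{task_id}` call:

```python
import time
from typing import Callable, Dict

def poll_task(fetch_status: Callable[[], Dict], interval: float = 3.0,
              timeout: float = 6 * 3600) -> Dict:
    """Poll until the task reaches a terminal state (completed or failed)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("task did not finish before the timeout")
```

The timeout default mirrors the 6-hour task timeout documented under Constraints.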
## Project Structure
```
bttoxin-pipeline/
├── pixi.toml                         # Pixi environment configuration
├── pyproject.toml                    # Python package configuration
├── scripts/
│   ├── run_single_fna_pipeline.py    # Main orchestrator
│   ├── run_digger_stage.py           # Digger-only stage
│   ├── bttoxin_shoter.py             # Toxin scoring module
│   ├── plot_shotter.py               # Visualization & reporting
│   ├── start_web.sh                  # Start both frontend + backend
│   └── pixi_runner.py                # PixiRunner abstraction
├── bttoxin/                          # Python CLI package
│   ├── api.py                        # Python API
│   ├── cli.py                        # CLI entry point
│   └── __init__.py
├── web/backend/                      # FastAPI backend
│   ├── main.py                       # FastAPI app entry + API endpoints
│   ├── config.py                     # Configuration
│   ├── models.py                     # Data models
│   ├── storage.py                    # Redis + file storage
│   ├── tasks.py                      # Task execution logic
│   └── AGENTS.md                     # Backend-specific guide
├── frontend/                         # Vue 3 frontend
│   ├── src/
│   │   ├── api/task.ts               # Task API client
│   │   ├── views/
│   │   │   ├── TaskSubmitView.vue    # Task submission page
│   │   │   └── TaskMonitorView.vue   # Task status page (polling)
│   │   ├── types/task.ts             # Task types
│   │   └── ...
│   └── AGENTS.md                     # Frontend-specific guide
├── Data/                             # Reference data
│   └── toxicity-data.csv             # BPPRC specificity data
├── external_dbs/                     # Optional external database
│   └── bt_toxin/                     # Updated BtToxin database
├── tests/                            # Test suite
│   ├── test_pixi_runner.py           # Property-based tests
│   └── test_data/                    # Test input files (.fna)
└── docs/                             # Documentation
```
## Web API Endpoints
### Create Task
```
POST /api/tasks
Content-Type: multipart/form-data
```
Parameters:
- `file`: .fna file
- `min_identity`: float (0-1, default: 0.8)
- `min_coverage`: float (0-1, default: 0.6)
- `allow_unknown_families`: boolean (default: false)
- `require_index_hit`: boolean (default: true)
- `lang`: "zh" | "en" (default: "zh"); now also supported via the Accept-Language header

Response:
```json
{
  "task_id": "uuid",
  "token": "access_token",
  "status": "pending",
  "created_at": "2024-01-01T00:00:00",
  "expires_at": "2024-01-31T00:00:00",
  "estimated_duration_seconds": 120
}
```
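The parameter ranges and defaults above can be checked with a helper like this. The function is hypothetical, shown only to make the documented constraints concrete; the real validation is done by the FastAPI endpoint:

```python
def validate_task_params(min_identity: float = 0.8, min_coverage: float = 0.6,
                         allow_unknown_families: bool = False,
                         require_index_hit: bool = True, lang: str = "zh") -> dict:
    """Check submitted parameters against the documented ranges and defaults."""
    if not 0 <= min_identity <= 1:
        raise ValueError("min_identity must be in [0, 1]")
    if not 0 <= min_coverage <= 1:
        raise ValueError("min_coverage must be in [0, 1]")
    if lang not in ("zh", "en"):
        raise ValueError("lang must be 'zh' or 'en'")
    return {
        "min_identity": min_identity,
        "min_coverage": min_coverage,
        "allow_unknown_families": allow_unknown_families,
        "require_index_hit": require_index_hit,
        "lang": lang,
    }
```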
### Get Task Status
```
GET /api/tasks/{task_id}
```
Response:
```json
{
  "task_id": "uuid",
  "status": "running",
  "progress": 45,
  "current_stage": "shoter",
  "submission_time": "2024-01-01T00:00:00",
  "start_time": "2024-01-01T00:00:10",
  "filename": "sample.fna",
  "error": null,
  "estimated_remaining_seconds": 60
}
```
`status` is one of `pending | running | completed | failed`; `current_stage` is one of `digger | shoter | plots | bundle`. `progress` is a percentage, and `error` carries the failure message when `status` is `failed`.
### Download Result
```
GET /api/tasks/{task_id}/download
```
Response: a .tar.gz archive containing:
```
results/digger/    # Digger analysis results
results/shotter/   # Shotter scoring results
results/logs/      # Execution logs
input.fna          # Original input file
```
### Delete Task
```
DELETE /api/tasks/{task_id}
```
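The `lang` field can also be derived from the `Accept-Language` header. A minimal resolver sketch (hypothetical helper; the backend's actual i18n code may differ):

```python
from typing import Optional

def resolve_locale(accept_language: Optional[str], default: str = "zh") -> str:
    """Pick 'en' or 'zh' from an Accept-Language header, falling back to the default."""
    if not accept_language:
        return default
    for part in accept_language.split(","):
        tag = part.split(";")[0].strip().lower()  # drop any ;q= weight
        if tag.startswith("en"):
            return "en"
        if tag.startswith("zh"):
            return "zh"
    return default
```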
## Development Commands
### Via pixi (Recommended)
```bash
# Full pipeline (uses the pipeline environment)
pixi run -e pipeline pipeline --fna <file.fna>

# Individual stages
pixi run -e pipeline digger-only --fna <file.fna>
pixi run -e pipeline shotter --all_toxins <path>
pixi run -e pipeline plot --strain_scores <path>

# Frontend
pixi run fe-install
pixi run fe-dev      # http://localhost:5173
pixi run fe-build

# Backend
pixi run api-dev     # http://localhost:8000
pixi run api-test

# Combined (frontend + backend)
pixi run web-start

# Tests
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v
```
### Direct Commands
```bash
# Frontend (in the frontend/ directory)
pnpm install
pnpm dev --host

# Backend (from the project root)
uvicorn web.backend.main:app --reload --host 0.0.0.0 --port 8000

# Pipeline (requires pipeline environment activation)
~/.pixi/bin/pixi shell-hook -e pipeline > /tmp/activate.sh
source /tmp/activate.sh
python scripts/run_single_fna_pipeline.py --fna <file>
```
## Docker Deployment
```bash
# Build and run with docker-compose
docker-compose -f docker-compose.simple.yml up -d

# Access at http://localhost:80
# API health check: http://localhost:80/api/health
```
### Docker Architecture
```
bttoxin-pipeline (stack)
├── traefik            (reverse proxy, ports 80/443)
├── bttoxin-pipeline   (FastAPI + static files, port 8000)
├── bttoxin-postgres   (database, port 5432)
└── bttoxin-redis      (task queue, port 6379)
```
### Docker Volume Mounts
| Host Path | Container Path | Purpose |
|---|---|---|
| `./jobs` | `/app/jobs` | Task results |
| `postgres_data` | `/var/lib/postgresql/data` | Database persistence |
| ... | ... | Source code mounts (dev) |
## Task Flow
1. User uploads .fna file via web UI
2. Backend creates task directory: /data/jobs/{task_id}/ (or ./jobs/ in dev)
3. Backend saves input file and parameters
4. Backend starts `pixi run -e pipeline pipeline` in background (asyncio subprocess)
5. Frontend polls GET /api/tasks/{task_id} every 3 seconds
6. On completion, download URL is provided
7. Results available for 30 days, then auto-cleanup
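Step 4 above (running the pipeline as a background asyncio subprocess) can be sketched roughly like this; the function name and log path are illustrative, not the backend's actual code:

```python
import asyncio

async def run_pipeline(cmd: list, log_path: str) -> int:
    """Launch a pipeline command and stream its output to a log file.

    Returns the exit code; a non-zero code would mark the task failed.
    """
    with open(log_path, "wb") as log:
        proc = await asyncio.create_subprocess_exec(
            *cmd, stdout=log, stderr=asyncio.subprocess.STDOUT
        )
        return await proc.wait()

# Example with a harmless stand-in for `pixi run -e pipeline pipeline`:
# rc = asyncio.run(run_pipeline(["echo", "done"], "/tmp/digger_execution.log"))
```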
## Result Storage
```
./jobs/{task_id}/                # or /app/jobs/ in Docker
├── input.fna                    # Uploaded file
├── params.json                  # Task parameters
├── task_meta.json               # Task metadata (status, progress, etc.)
├── output/                      # Pipeline output
│   ├── digger/                  # BtToxin_Digger results
│   │   ├── Results/Toxins/
│   │   │   └── All_Toxins.txt   # Toxin hits (input to shotter)
│   │   └── ...
│   ├── shotter/                 # Shoter scoring results
│   │   ├── toxin_support.tsv
│   │   ├── strain_target_scores.tsv
│   │   └── strain_scores.json
│   └── logs/
└── pipeline_results.tar.gz      # Downloadable bundle
```
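A minimal sketch of how `task_meta.json` might be updated as the task progresses. The helper name and schema here are hypothetical; the real logic lives in `web/backend/`:

```python
import json
from pathlib import Path

def update_task_meta(task_dir: Path, **fields) -> dict:
    """Read task_meta.json, merge in new fields (e.g. status, progress), write it back."""
    meta_path = task_dir / "task_meta.json"
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    meta.update(fields)
    meta_path.write_text(json.dumps(meta, indent=2))
    return meta
```

Writing the whole file on each update keeps the metadata readable by the polling endpoint without any shared in-process state.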
## Common Tasks
### Adding a New Pipeline Stage
1. Create a script in `scripts/`
2. Add it to the `run_single_fna_pipeline.py` orchestration
3. Register a task in `pixi.toml` if standalone execution is needed
4. Add the stage definition to `frontend/src/types/task.ts`
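Under the steps above, a new stage might expose an entry point shaped like this. All names here are hypothetical, shown only to illustrate the contract between stages (consume the previous stage's output, write under an own subdirectory, return that path):

```python
from pathlib import Path

def run_my_stage(input_dir: Path, output_dir: Path) -> Path:
    """Hypothetical stage entry point for the orchestrator to call."""
    stage_out = output_dir / "my_stage"
    stage_out.mkdir(parents=True, exist_ok=True)
    # ... real work would read e.g. the previous stage's files from input_dir ...
    (stage_out / "summary.tsv").write_text("placeholder\n")
    return stage_out
```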
### Modifying Task Parameters
1. Update the `TaskFormData` interface in `frontend/src/components/task/TaskSubmitForm.vue`
2. Update the API endpoint in `web/backend/main.py`
3. Update the task execution in `web/backend/tasks.py`
### Configuring Storage Location
```bash
# Set a custom jobs directory
export JOBS_DIR=/path/to/jobs
# Or modify the [feature.webbackend.env] section in pixi.toml
```
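Resolving the storage location could look like this (a sketch; the real configuration lookup lives in `web/backend/config.py`):

```python
import os
from pathlib import Path

def resolve_jobs_dir(env=None) -> Path:
    """Resolve the task storage directory; JOBS_DIR overrides the ./jobs default."""
    env = os.environ if env is None else env
    return Path(env.get("JOBS_DIR", "./jobs"))
```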
## Configuration
### Environment Variables
| Variable | Description | Default |
|---|---|---|
| `VITE_API_BASE_URL` | Frontend API URL (production) | `""` (uses relative paths) |
| `JOBS_DIR` | Task storage directory | `./jobs` |
| `DEBUG` | Enable debug mode | `false` |
## Key Files Modified (Recent Fixes)
| File | Change |
|---|---|
| `scripts/bttoxin_shoter.py` | Added `engine="python"` for pandas 2.x compatibility; added empty-DataFrame handling |
| `scripts/run_single_fna_pipeline.py` | Fixed the `pixi_runner` import with `sys.path.insert()` |
| `web/backend/tasks.py` | Changed to the `pixi run -e pipeline pipeline` command |
| `entrypoint.sh` | Fixed nginx `proxy_pass` to preserve the `/api/` prefix |
| `docker-compose.simple.yml` | Docker deployment configuration |
## Constraints
Defined in web/backend/config.py:
- Max upload size: 50 MB
- Result retention: 30 days
- Task timeout: 6 hours
- Allowed extensions: .fna, .fa, .fasta
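A sketch of enforcing these limits at upload time. The helper is hypothetical; the actual values are defined in `web/backend/config.py` and checked by the upload endpoint:

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 50 * 1024 * 1024            # 50 MB limit
ALLOWED_EXTENSIONS = {".fna", ".fa", ".fasta"}  # documented extensions

def check_upload(filename: str, size_bytes: int) -> None:
    """Reject uploads that violate the documented constraints."""
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {filename}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 50 MB upload limit")
```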
## Testing
### Python Tests
```bash
# Property-based tests for the pipeline
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v

# Backend tests
pixi run api-test
```
### Frontend Tests
```bash
pixi run fe-test
# or
cd frontend && pnpm test:unit
```
## Database Update
```bash
mkdir -p external_dbs
git clone --filter=blob:none --no-checkout \
  https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo
git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
rm -rf tmp_bttoxin_repo
```
## Subproject Guides
- Frontend: see `frontend/AGENTS.md`
- Backend: see `web/backend/AGENTS.md`
- API documentation: http://localhost:8000/api/docs (when `DEBUG=true`)
## Troubleshooting
### Common Issues
| Issue | Solution |
|---|---|
| pixi not found | export PATH="$HOME/.pixi/bin:$PATH" |
| Environment not found | pixi install |
| BtToxin_Digger unavailable | pixi run -e digger BtToxin_Digger --help |
| Permission denied | Ensure write access to /data/jobs |
| Task not found | Check task_id in URL and response |
| Results expired | Results auto-delete after 30 days |
| Nginx 404 on API | Check proxy_pass http://127.0.0.1:8000/api/ (note trailing /api/) |
| KeyError: 'Strain' | Empty DataFrame after filters - shotter now handles this gracefully |
| Pandas engine error | Use engine="python" in pd.read_csv() for pandas 2.x |
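The last two rows can be combined into one defensive reader. A sketch, assuming All_Toxins.txt is tab-separated (the real handling lives in `scripts/bttoxin_shoter.py`):

```python
import pandas as pd

def load_all_toxins(source) -> pd.DataFrame:
    """Read an All_Toxins.txt table defensively.

    engine="python" sidesteps the pandas 2.x tokenizer incompatibility,
    and an empty or malformed table comes back as an empty frame instead
    of triggering KeyError: 'Strain' later in scoring.
    """
    df = pd.read_csv(source, sep="\t", engine="python")
    if df.empty or "Strain" not in df.columns:
        return pd.DataFrame(columns=["Strain"])
    return df
```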
### Debugging Pipeline Issues
```bash
# Check whether the task was created
curl http://localhost:80/api/tasks/{task_id}

# View task logs
cat jobs/{task_id}/output/logs/digger_execution.log

# Check the All_Toxins.txt format
head -1 jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt

# Test shotter independently
pixi run -e pipeline python scripts/bttoxin_shoter.py \
  --all_toxins jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt \
  --output_dir /tmp/test_output
```
### Docker-Specific Issues
```bash
# Check container health
docker ps
docker logs bttoxin-pipeline

# Check the nginx config
docker exec bttoxin-pipeline nginx -T

# Verify the backend is running
docker exec bttoxin-pipeline curl http://127.0.0.1:8000/api/health
```