# BtToxin Pipeline Agent Guide
## Overview
BtToxin Pipeline is an automated Bacillus thuringiensis toxin mining system. It identifies Cry toxin genes in bacterial genomes and predicts their target insects using a three-stage pipeline:
1. **Digger**: BtToxin_Digger toxin mining
2. **Shotter**: Toxin scoring and target prediction
3. **Plot**: Heatmap generation and report creation
## Tech Stack
| Layer | Technology |
|-------|------------|
| Package Manager | pixi (conda environments) |
| Pipeline | Python 3.9+ (pandas, matplotlib, seaborn) |
| Digger Tool | BtToxin_Digger (Perl, BLAST, HMMER) |
| Frontend | Vue 3 + Vite + Element Plus |
| Backend | FastAPI + Uvicorn |
| Result Storage | File system + 30-day retention |
## Quick Start
```bash
# 1. Clone and install dependencies
git clone <repo>
cd bttoxin-pipeline
pixi install

# 2. Start frontend + backend together (recommended)
pixi run web-start
# Frontend: http://localhost:5173
# Backend API: http://localhost:8000

# Or start them separately
pixi run fe-dev   # frontend only
pixi run api-dev  # backend only

# 3. Submit a task via the web UI
# - Upload a .fna genome file
# - Configure the parameters
# - Click submit
# - The browser redirects to the /{task_id} page
# - The page refreshes every 3 seconds to show progress
# - When finished, download the result archive
```
## Task Submission Flow
```
1. The user uploads a .fna file on the home page
2. The user clicks the "Submit Task" button
3. The backend creates a task and returns a task_id
4. The frontend redirects to the /{task_id} page
5. The page polls the backend every 3 seconds for the latest status
6. The page shows a progress bar, the current stage, and the estimated remaining time
7. On completion, a "Download Results" button appears
8. Results are kept for 30 days, then deleted automatically
```
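The polling in step 5 (the real UI does this in TypeScript) boils down to a loop with a terminal-state check and a timeout. A minimal sketch, where `get_status` stands in for the call to `GET /api/tasks/{task_id}`:

```python
import time

def poll_until_done(get_status, interval_s=3.0, timeout_s=30.0):
    """Poll `get_status` every `interval_s` seconds until the task reaches a
    terminal state ("completed" or "failed") or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status["status"] in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish in time")
        time.sleep(interval_s)

# Simulated task that completes on the third poll.
states = iter([{"status": "pending"}, {"status": "running"}, {"status": "completed"}])
result = poll_until_done(lambda: next(states), interval_s=0.0)
print(result["status"])  # completed
```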
## Web API Endpoints
### Create Task
```bash
POST /api/tasks
Content-Type: multipart/form-data

Parameters:
- file: .fna file
- min_identity: minimum identity (0-1, default: 0.8)
- min_coverage: minimum coverage (0-1, default: 0.6)
- allow_unknown_families: allow unknown toxin families (default: false)
- require_index_hit: require an index hit (default: true)
- lang: language, "zh" | "en" (default: "zh")

Response:
{
  "task_id": "uuid",
  "token": "access token",
  "status": "pending",               // pending | running | completed | failed
  "created_at": "2024-01-01T00:00:00",
  "expires_at": "2024-01-31T00:00:00",
  "estimated_duration_seconds": 120  // estimated duration
}
```
### Get Task Status
```bash
GET /api/tasks/{task_id}

Response:
{
  "task_id": "uuid",
  "status": "running",
  "progress": 45,                    // percent complete
  "current_stage": "shoter",         // stage: digger | shoter | plots | bundle
  "submission_time": "2024-01-01T00:00:00",
  "start_time": "2024-01-01T00:00:10",
  "filename": "sample.fna",
  "error": null,                     // error message on failure
  "estimated_remaining_seconds": 60  // estimated remaining time
}
```
### Download Result
```bash
GET /api/tasks/{task_id}/download

Response: a .tar.gz archive containing:
- results/digger/    # Digger analysis results
- results/shotter/   # Shotter scoring results
- results/logs/      # Execution logs
- input.fna          # Original input file
```
### Delete Task
```bash
DELETE /api/tasks/{task_id}
```
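As an illustration, the form fields for `POST /api/tasks` can be assembled and validated client-side before upload. The helper below is hypothetical (the field names mirror the API above; the validation rules are assumptions):

```python
def build_task_form(filename, min_identity=0.8, min_coverage=0.6,
                    allow_unknown_families=False, require_index_hit=True,
                    lang="zh"):
    """Assemble the multipart form fields for POST /api/tasks.
    Field names come from the API docs; the checks are illustrative."""
    if not filename.endswith((".fna", ".fa", ".fasta")):
        raise ValueError(f"unsupported extension: {filename}")
    for name, value in (("min_identity", min_identity),
                        ("min_coverage", min_coverage)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be in [0, 1]")
    if lang not in ("zh", "en"):
        raise ValueError("lang must be 'zh' or 'en'")
    return {
        "min_identity": str(min_identity),
        "min_coverage": str(min_coverage),
        "allow_unknown_families": "true" if allow_unknown_families else "false",
        "require_index_hit": "true" if require_index_hit else "false",
        "lang": lang,
    }

print(build_task_form("genome.fna"))
```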
## Project Structure
```
bttoxin-pipeline/
├── pixi.toml                        # Pixi environment configuration
├── pyproject.toml                   # Python package configuration
├── scripts/
│   ├── run_single_fna_pipeline.py   # Main orchestrator
│   ├── run_digger_stage.py          # Digger-only stage
│   ├── bttoxin_shoter.py            # Toxin scoring module
│   ├── plot_shotter.py              # Visualization & reporting
│   ├── start_web.sh                 # Start both frontend + backend
│   └── pixi_runner.py               # PixiRunner abstraction
├── bttoxin/                         # Python CLI package
│   ├── api.py                       # Python API
│   ├── cli.py                       # CLI entry point
│   └── __init__.py
├── web/backend/                     # FastAPI backend
│   ├── main.py                      # FastAPI app entry + API endpoints
│   ├── config.py                    # Configuration
│   ├── models.py                    # Data models
│   ├── storage.py                   # Redis + file storage
│   ├── tasks.py                     # Task execution logic
│   └── AGENTS.md                    # Backend-specific guide
├── frontend/                        # Vue 3 frontend
│   ├── src/
│   │   ├── api/task.ts              # Task API client
│   │   ├── views/
│   │   │   ├── TaskSubmitView.vue   # Task submission page
│   │   │   └── TaskMonitorView.vue  # Task status page (polling)
│   │   ├── types/task.ts            # Task types
│   │   └── ...
│   └── AGENTS.md                    # Frontend-specific guide
├── Data/                            # Reference data
│   └── toxicity-data.csv            # BPPRC specificity data
├── external_dbs/                    # Optional external database
│   └── bt_toxin/                    # Updated BtToxin database
├── tests/                           # Test suite
│   ├── test_pixi_runner.py          # Property-based tests
│   └── test_data/                   # Test input files (.fna)
└── docs/                            # Documentation
```
## Development Commands
### Via pixi (Recommended)
```bash
# Full pipeline (uses pipeline environment)
pixi run -e pipeline pipeline --fna <file.fna>
# Individual stages
pixi run -e pipeline digger-only --fna <file.fna>
pixi run -e pipeline shotter --all_toxins <path>
pixi run -e pipeline plot --strain_scores <path>
# Frontend
pixi run fe-install
pixi run fe-dev # http://localhost:5173
pixi run fe-build
# Backend
pixi run api-dev # http://localhost:8000
pixi run api-test
# Combined (both frontend + backend)
pixi run web-start
# Tests
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v
```
### Direct Commands
```bash
# Frontend (in frontend/ directory)
pnpm install
pnpm dev --host
# Backend (in project root)
uvicorn web.backend.main:app --reload --host 0.0.0.0 --port 8000
# Pipeline (requires pipeline environment activation)
~/.pixi/bin/pixi shell-hook -e pipeline > /tmp/activate.sh
source /tmp/activate.sh
python scripts/run_single_fna_pipeline.py --fna <file>
```
## Docker Deployment
```bash
# Build and run with docker-compose
docker-compose -f docker-compose.simple.yml up -d
# Access at http://localhost:80
# API health check: http://localhost:80/api/health
```
### Docker Architecture
```
bttoxin-pipeline (single container)
├── nginx (port 80) # Reverse proxy + static files
├── uvicorn (port 8000) # FastAPI backend
└── pixi environments # digger, pipeline, webbackend
```
### Docker Volume Mounts
| Host Path | Container Path | Purpose |
|-----------|----------------|---------|
| `./jobs` | `/app/jobs` | Task results |
| `./frontend/dist` | `/var/www/html` | Frontend static files |
| `./web` | `/app/web` | Backend code |
| `./Data` | `/app/Data` | Reference data |
| `./scripts` | `/app/scripts` | Pipeline scripts |
| `./pixi.toml` | `/app/pixi.toml` | Pixi configuration |
## Task Flow
```
1. User uploads .fna file via web UI
2. Backend creates task directory: /data/jobs/{task_id}/ (or ./jobs/ in dev)
3. Backend saves input file and parameters
4. Backend starts `pixi run -e pipeline pipeline` in background (asyncio subprocess)
5. Frontend polls GET /api/tasks/{task_id} every 3 seconds
6. On completion, download URL is provided
7. Results available for 30 days, then auto-cleanup
```
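Step 4 above can be sketched with an asyncio subprocess. To stay self-contained, this sketch runs a harmless `python -c` command instead of the real `pixi run -e pipeline pipeline`:

```python
import asyncio
import sys
import tempfile

async def launch_pipeline(task_dir: str) -> int:
    """Start the pipeline as a background subprocess inside the task
    directory and wait for it (the backend would instead update
    task_meta.json as the process progresses)."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('pipeline done')",
        cwd=task_dir,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    print(stdout.decode().strip())  # pipeline done
    return proc.returncode

with tempfile.TemporaryDirectory() as d:
    rc = asyncio.run(launch_pipeline(d))
print(rc)  # 0
```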
## Result Storage
```
./jobs/{task_id}/                  # Or /app/jobs/ in Docker
├── input.fna                      # Uploaded file
├── params.json                    # Task parameters
├── task_meta.json                 # Task metadata (status, progress, etc.)
├── output/                        # Pipeline output
│   ├── digger/                    # BtToxin_Digger results
│   │   ├── Results/Toxins/
│   │   │   └── All_Toxins.txt    # Toxin hits (input to shotter)
│   │   └── ...
│   ├── shotter/                   # Shotter scoring results
│   │   ├── toxin_support.tsv
│   │   ├── strain_target_scores.tsv
│   │   └── strain_scores.json
│   └── logs/
└── pipeline_results.tar.gz        # Downloadable bundle
```
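The downloadable bundle at the bottom of this layout can be produced with the standard `tarfile` module. A minimal sketch, assuming the `results/` arcname that the download endpoint exposes:

```python
import json
import tarfile
import tempfile
from pathlib import Path

def bundle_results(task_dir: Path) -> Path:
    """Pack output/ and input.fna into pipeline_results.tar.gz,
    mirroring the archive layout described for the download endpoint."""
    archive = task_dir / "pipeline_results.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(task_dir / "output", arcname="results")
        tar.add(task_dir / "input.fna", arcname="input.fna")
    return archive

# Demo on a synthetic task directory.
with tempfile.TemporaryDirectory() as d:
    task = Path(d)
    (task / "output" / "logs").mkdir(parents=True)
    (task / "input.fna").write_text(">contig1\nATGC\n")
    (task / "task_meta.json").write_text(json.dumps({"status": "completed"}))
    with tarfile.open(bundle_results(task)) as tar:
        names = sorted(tar.getnames())
print(names)  # ['input.fna', 'results', 'results/logs']
```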
## Common Tasks
### Adding a New Pipeline Stage
1. Create script in `scripts/`
2. Add to `run_single_fna_pipeline.py` orchestration
3. Register task in `pixi.toml` if standalone execution needed
4. Add stage definition to `frontend/src/types/task.ts`
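A new stage script typically mirrors the existing ones: an argparse CLI wrapping a callable entry point. The template below is purely illustrative (the flag names and the `run_stage` helper are hypothetical, not taken from the repo):

```python
#!/usr/bin/env python
"""Illustrative template for a new scripts/ stage."""
import argparse
from pathlib import Path

def run_stage(input_path: Path, output_dir: Path) -> Path:
    """Placeholder stage logic: write a marker file into output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    result = output_dir / "stage_result.txt"
    result.write_text(f"processed {input_path.name}\n")
    return result

def main(argv=None):
    parser = argparse.ArgumentParser(description="Example pipeline stage")
    parser.add_argument("--input", required=True, type=Path)
    parser.add_argument("--output_dir", required=True, type=Path)
    args = parser.parse_args(argv)
    print(run_stage(args.input, args.output_dir))

if __name__ == "__main__":
    main()
```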
### Modifying Task Parameters
1. Update `TaskFormData` interface in `frontend/src/components/task/TaskSubmitForm.vue`
2. Update API endpoint in `web/backend/main.py`
3. Update task execution in `web/backend/tasks.py`
### Configuring Storage Location
```bash
# Set custom jobs directory
export JOBS_DIR=/path/to/jobs
# Or modify pixi.toml [feature.webbackend.env] section
```
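A sketch of the lookup, assuming the backend falls back to `./jobs` when `JOBS_DIR` is unset (the exact logic lives in `web/backend/config.py` and may differ):

```python
import os
from pathlib import Path

def jobs_dir(env=os.environ) -> Path:
    """Resolve the task storage directory: JOBS_DIR if set, else ./jobs."""
    return Path(env.get("JOBS_DIR", "./jobs"))

print(jobs_dir({"JOBS_DIR": "/data/jobs"}))  # /data/jobs
print(jobs_dir({}))                          # jobs
```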
## Configuration
### Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `VITE_API_BASE_URL` | Frontend API URL (production) | "" (uses relative path) |
| `JOBS_DIR` | Task storage directory | ./jobs |
| `DEBUG` | Enable debug mode | false |
### Key Files Modified (Recent Fixes)
| File | Change |
|------|--------|
| `scripts/bttoxin_shoter.py` | Added `engine="python"` for pandas 2.x compatibility; Added empty DataFrame handling |
| `scripts/run_single_fna_pipeline.py` | Fixed `pixi_runner` import with `sys.path.insert()` |
| `web/backend/tasks.py` | Changed to `pixi run -e pipeline pipeline` command |
| `entrypoint.sh` | Fixed nginx `proxy_pass` to preserve `/api/` prefix |
| `docker-compose.simple.yml` | Docker deployment configuration |
### Constraints
Defined in `web/backend/config.py`:
- Max upload size: 50 MB
- Result retention: 30 days
- Task timeout: 6 hours
- Allowed extensions: .fna, .fa, .fasta
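A sketch of how these constraints might be enforced, using the limits listed above (the actual checks live in `web/backend/config.py` and may differ):

```python
from pathlib import Path

# Limits as documented above.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024
ALLOWED_EXTENSIONS = {".fna", ".fa", ".fasta"}

def validate_upload(filename: str, size_bytes: int) -> None:
    """Reject uploads that violate the documented constraints."""
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension not allowed: {filename}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file too large: {size_bytes} bytes")

validate_upload("sample.fna", 1024)  # passes silently
try:
    validate_upload("sample.txt", 1024)
except ValueError as e:
    print(e)  # extension not allowed: sample.txt
```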
## Testing
### Python Tests
```bash
# Property-based tests for pipeline
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v
# Backend tests
pixi run api-test
```
### Frontend Tests
```bash
pixi run fe-test
# or
cd frontend && pnpm test:unit
```
## Database Update
```bash
mkdir -p external_dbs
git clone --filter=blob:none --no-checkout \
https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo
git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
rm -rf tmp_bttoxin_repo
```
## Subproject Guides
- **Frontend**: See [`frontend/AGENTS.md`](frontend/AGENTS.md)
- **Backend**: See [`web/backend/AGENTS.md`](web/backend/AGENTS.md)
- **API Documentation**: http://localhost:8000/api/docs (when DEBUG=true)
## Troubleshooting
### Common Issues
| Issue | Solution |
|-------|----------|
| pixi not found | `export PATH="$HOME/.pixi/bin:$PATH"` |
| Environment not found | `pixi install` |
| BtToxin_Digger unavailable | `pixi run -e digger BtToxin_Digger --help` |
| Permission denied | Ensure write access to `/data/jobs` |
| Task not found | Check task_id in URL and response |
| Results expired | Results auto-delete after 30 days |
| Nginx 404 on API | Check `proxy_pass http://127.0.0.1:8000/api/` (note trailing `/api/`) |
| KeyError: 'Strain' | Empty DataFrame after filters - shotter now handles this gracefully |
| Pandas engine error | Use `engine="python"` in `pd.read_csv()` for pandas 2.x |
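The last row can be reproduced in isolation: `sep=None` asks pandas to sniff the delimiter, which only the Python engine supports (the C engine raises `ValueError`). A minimal example on `All_Toxins.txt`-style tab-separated data (the column names are illustrative):

```python
import io
import pandas as pd

# sep=None + engine="python" lets pandas detect the delimiter itself.
data = io.StringIO("Strain\tProtein_id\tRank\nBt-1\tCry1Aa\t1\n")
df = pd.read_csv(data, sep=None, engine="python")
print(df.columns.tolist())  # ['Strain', 'Protein_id', 'Rank']
print(len(df))              # 1
```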
### Debugging Pipeline Issues
```bash
# Check if task was created
curl http://localhost:80/api/tasks/{task_id}
# View task logs
cat jobs/{task_id}/output/logs/digger_execution.log
# Check All_Toxins.txt format
head -1 jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt
# Test shotter independently
pixi run -e pipeline python scripts/bttoxin_shoter.py \
--all_toxins jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt \
--output_dir /tmp/test_output
```
### Docker-Specific Issues
```bash
# Check container health
docker ps
docker logs bttoxin-pipeline
# Check nginx config
docker exec bttoxin-pipeline nginx -T
# Verify backend is running
docker exec bttoxin-pipeline curl http://127.0.0.1:8000/api/health
```