# BtToxin Pipeline Agent Guide

## Overview

BtToxin Pipeline is an automated Bacillus thuringiensis toxin mining system. It identifies Cry toxin genes in bacterial genomes and predicts their target insects using a three-stage pipeline:

1. **Digger**: BtToxin_Digger toxin mining
2. **Shotter**: Toxin scoring and target prediction
3. **Plot**: Heatmap generation and report creation

## Tech Stack

| Layer | Technology |
|-------|------------|
| Package Manager | pixi (conda environments) |
| Pipeline | Python 3.9+ (pandas, matplotlib, seaborn) |
| Digger Tool | BtToxin_Digger (Perl, BLAST, HMMER) |
| Frontend | Vue 3 + Vite + Element Plus + vue-i18n |
| Backend | FastAPI + Uvicorn + SQLAlchemy |
| Database | PostgreSQL 15 (Metadata) + Redis 7 (Queue) |
| Result Storage | File system + 30-day retention |

## Quick Start

```bash
# 1. Clone and install dependencies
git clone <repo>
cd bttoxin-pipeline
pixi install

# 2. Start services (Production with Traefik)
# Using the unified production configuration
docker compose -f docker/compose/docker-compose.traefik.yml -p compose up -d --build

# Access:
# Frontend: https://bttiaw.hzau.edu.cn (via Traefik)
# Backend API: http://bttoxin-pipeline:8000 (Internal)

# 3. Development Mode (Local)
pixi run web-start
# - Upload a .fna genome file
# - Configure parameters
# - Click Submit
# - The page redirects automatically to /{task_id}
# - The page refreshes every 3 seconds to show progress
# - When the task finishes, click to download the result archive
```

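## Task Submission Flow

```
1. The user uploads a .fna file on the home page
2. The user clicks the "Submit Task" button
3. The backend creates the task and returns a task_id
4. The frontend redirects automatically to /{task_id}
5. The page polls the backend every 3 seconds for the latest status
6. The page shows a progress bar, the current stage, and the estimated time remaining
7. On completion, a "Download Results" button appears
8. Results are kept for 30 days and then deleted automatically
```
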
## Web API Interface

### Create Task

```bash
POST /api/tasks
Content-Type: multipart/form-data

Parameters:
- file: .fna file
- min_identity: minimum identity (0-1, default: 0.8)
- min_coverage: minimum coverage (0-1, default: 0.6)
- allow_unknown_families: whether to allow unknown families (default: false)
- require_index_hit: whether an index hit is required (default: true)
- lang: language, "zh" | "en" (default: "zh")

Response:
{
  "task_id": "uuid",
  "token": "access token",
  "status": "pending",  // pending | running | completed | failed
  "created_at": "creation time",
  "expires_at": "expiry time",
  "estimated_duration_seconds": <estimated duration in seconds>
}
```

### Get Task Status

```bash
GET /api/tasks/{task_id}

Response:
{
  "task_id": "uuid",
  "status": "running",
  "progress": 45,  // progress percentage
  "current_stage": "shoter",  // current stage: digger | shoter | plots | bundle
  "submission_time": "submission time",
  "start_time": "start time",
  "filename": "original filename",
  "error": null,  // error message on failure
  "estimated_remaining_seconds": 60  // estimated time remaining
}
```

### Download Result

```bash
GET /api/tasks/{task_id}/download
Response: .tar.gz archive

The archive contains:
- results/digger/    # Digger analysis results
- results/shotter/   # Shotter scoring results
- results/logs/      # Execution logs
- input.fna          # Original input file
```

### Delete Task

```bash
DELETE /api/tasks/{task_id}
```

## Project Structure

```
bttoxin-pipeline/
├── pixi.toml                       # Pixi environment configuration
├── pyproject.toml                  # Python package configuration
├── scripts/
│   ├── run_single_fna_pipeline.py  # Main orchestrator
│   ├── run_digger_stage.py         # Digger-only stage
│   ├── bttoxin_shoter.py           # Toxin scoring module
│   ├── plot_shotter.py             # Visualization & reporting
│   ├── start_web.sh                # Start both frontend + backend
│   └── pixi_runner.py              # PixiRunner abstraction
├── bttoxin/                        # Python CLI package
│   ├── api.py                      # Python API
│   ├── cli.py                      # CLI entry point
│   └── __init__.py
├── web/backend/                    # FastAPI backend
│   ├── main.py                     # FastAPI app entry + API endpoints
│   ├── config.py                   # Configuration
│   ├── models.py                   # Data models
│   ├── storage.py                  # Redis + file storage
│   ├── tasks.py                    # Task execution logic
│   └── AGENTS.md                   # Backend-specific guide
├── frontend/                       # Vue 3 frontend
│   ├── src/
│   │   ├── api/task.ts             # Task API client
│   │   ├── views/
│   │   │   ├── TaskSubmitView.vue  # Task submission page
│   │   │   └── TaskMonitorView.vue # Task status page (polling)
│   │   ├── types/task.ts           # Task types
│   │   └── ...
│   └── AGENTS.md                   # Frontend-specific guide
├── Data/                           # Reference data
│   └── toxicity-data.csv           # BPPRC specificity data
├── external_dbs/                   # Optional external database
│   └── bt_toxin/                   # Updated BtToxin database
├── tools/                          # Utility tools and environments
│   └── reproduction/               # Reproduction environments
│       └── bttoxin_digger/         # BtToxin_Digger reproduction env
├── tests/                          # Test suite
│   ├── test_pixi_runner.py         # Property-based tests
│   └── test_data/                  # Test input files (.fna)
└── docs/                           # Documentation
```

## Web API Endpoints

### Create Task

```bash
POST /api/tasks
Content-Type: multipart/form-data

Parameters:
- file: .fna file
- min_identity: float (0-1, default: 0.8)
- min_coverage: float (0-1, default: 0.6)
- allow_unknown_families: boolean (default: false)
- require_index_hit: boolean (default: true)
- lang: "zh" | "en" (default: "zh") - *Now supported via Accept-Language header*

Response:
{
  "task_id": "uuid",
  "token": "access_token",
  "status": "pending",
  "created_at": "2024-01-01T00:00:00",
  "expires_at": "2024-01-31T00:00:00",
  "estimated_duration_seconds": 120
}
```

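The documented parameter ranges and defaults can be checked on the client before a request is sent. The following is a minimal Python sketch, not the backend's actual validation: `TaskParams` and `validate_task_params` are hypothetical names introduced here, and only the ranges and defaults are taken from the parameter list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskParams:
    """Form fields documented for POST /api/tasks, with their documented defaults."""
    min_identity: float = 0.8
    min_coverage: float = 0.6
    allow_unknown_families: bool = False
    require_index_hit: bool = True
    lang: str = "zh"


def validate_task_params(params: TaskParams) -> TaskParams:
    """Raise ValueError if a field falls outside its documented range (hypothetical helper)."""
    for name in ("min_identity", "min_coverage"):
        value = getattr(params, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if params.lang not in ("zh", "en"):
        raise ValueError(f"lang must be 'zh' or 'en', got {params.lang!r}")
    return params
```

Calling `validate_task_params(TaskParams(min_identity=0.9))` returns the same object unchanged, while an out-of-range value such as `min_identity=1.5` raises `ValueError`.
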
### Get Task Status

```bash
GET /api/tasks/{task_id}

Response:
{
  "task_id": "uuid",
  "status": "running",
  "progress": 45,
  "current_stage": "shoter",
  "submission_time": "2024-01-01T00:00:00",
  "start_time": "2024-01-01T00:00:10",
  "filename": "sample.fna",
  "error": null,
  "estimated_remaining_seconds": 60
}
```

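The status endpoint can also be polled from a script in the same way the frontend does. This is a sketch under assumptions: it treats `completed` and `failed` as the terminal statuses, uses the 3-second interval described for the frontend, and takes the `api-dev` address `http://localhost:8000` as the base URL. `poll_task` and `fetch_status_http` are hypothetical helpers, not part of the project.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}


def fetch_status_http(task_id: str, base_url: str = "http://localhost:8000") -> Dict:
    """GET /api/tasks/{task_id} and decode the JSON body (base URL is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/api/tasks/{task_id}") as resp:
        return json.load(resp)


def poll_task(
    fetch_status: Callable[[str], Dict],
    task_id: str,
    interval_seconds: float = 3.0,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status(task_id) until the task reaches a terminal status."""
    for attempt in range(max_polls):
        body = fetch_status(task_id)
        if body["status"] in TERMINAL_STATUSES:
            return body
        if attempt < max_polls - 1:
            sleep(interval_seconds)  # wait before the next poll
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")
```

A typical call would be `poll_task(fetch_status_http, task_id)`. Passing the fetch and sleep functions as arguments keeps the loop testable without a running backend.
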
### Download Result

```bash
GET /api/tasks/{task_id}/download
Response: .tar.gz file
```

### Delete Task

```bash
DELETE /api/tasks/{task_id}
```

## Development Commands

### Via pixi (Recommended)

```bash
# Full pipeline (uses pipeline environment)
pixi run -e pipeline pipeline --fna <file.fna>

# Individual stages
pixi run -e pipeline digger-only --fna <file.fna>
pixi run -e pipeline shotter --all_toxins <path>
pixi run -e pipeline plot --strain_scores <path>

# Frontend
pixi run fe-install
pixi run fe-dev          # http://localhost:5173
pixi run fe-build

# Backend
pixi run api-dev         # http://localhost:8000
pixi run api-test

# Combined (both frontend + backend)
pixi run web-start

# Tests
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v
```

### Direct Commands

```bash
# Frontend (in frontend/ directory)
pnpm install
pnpm dev --host

# Backend (in project root)
uvicorn web.backend.main:app --reload --host 0.0.0.0 --port 8000

# Pipeline (requires pipeline environment activation)
# Write the activation script with pixi, then source the script (not the pixi binary)
~/.pixi/bin/pixi shell-hook -e pipeline > /tmp/activate.sh
source /tmp/activate.sh
python scripts/run_single_fna_pipeline.py --fna <file>
```

## Docker Deployment

```bash
# Build and run with docker-compose
docker-compose -f docker-compose.simple.yml up -d

# Access at http://localhost:80
# API health check: http://localhost:80/api/health
```

### Docker Architecture

```
bttoxin-pipeline (Stack)
├── traefik (reverse proxy, port 80/443)
├── bttoxin-pipeline (FastAPI + Static Files, port 8000)
├── bttoxin-postgres (Database, port 5432)
└── bttoxin-redis (Task Queue, port 6379)
```

### Docker Volume Mounts

| Host Path | Container Path | Purpose |
|-----------|----------------|---------|
| `./jobs` | `/app/jobs` | Task results |
| `postgres_data` | `/var/lib/postgresql/data` | Database persistence |
| ... | ... | Source code mounts (dev) |

## Task Flow

```
1. User uploads .fna file via web UI
2. Backend creates task directory: /data/jobs/{task_id}/ (or ./jobs/ in dev)
3. Backend saves input file and parameters
4. Backend starts `pixi run -e pipeline pipeline` in background (asyncio subprocess)
5. Frontend polls GET /api/tasks/{task_id} every 3 seconds
6. On completion, download URL is provided
7. Results available for 30 days, then auto-cleanup
```

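Step 4, launching the pipeline as an asyncio subprocess, might look roughly like the sketch below. It is not the code in `web/backend/tasks.py`: `run_in_background` is a hypothetical helper, and a harmless Python one-liner stands in for the real command so the example runs without pixi installed.

```python
import asyncio
import sys
from typing import List, Tuple


async def run_in_background(command: List[str]) -> Tuple[int, str]:
    """Run a command as an asyncio subprocess; return (exit code, combined output)."""
    process = await asyncio.create_subprocess_exec(
        *command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into stdout for logging
    )
    output, _ = await process.communicate()
    return process.returncode, output.decode(errors="replace")


# Stand-in command (assumption). The backend would instead pass something like
# ["pixi", "run", "-e", "pipeline", "pipeline", "--fna", "<file.fna>"].
DEMO_COMMAND = [sys.executable, "-c", "print('pipeline finished')"]
```

Running `asyncio.run(run_in_background(DEMO_COMMAND))` returns the exit code and captured output. Inside FastAPI the coroutine would more likely be scheduled with `asyncio.create_task` so the request handler can return the `task_id` immediately.
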
## Result Storage

```
./jobs/{task_id}/                   # Or /app/jobs/ in Docker
├── input.fna                       # Uploaded file
├── params.json                     # Task parameters
├── task_meta.json                  # Task metadata (status, progress, etc.)
├── output/                         # Pipeline output
│   ├── digger/                     # BtToxin_Digger results
│   │   ├── Results/Toxins/
│   │   │   └── All_Toxins.txt      # Toxin hits (input to shotter)
│   │   └── ...
│   ├── shotter/                    # Shotter scoring results
│   │   ├── toxin_support.tsv
│   │   ├── strain_target_scores.tsv
│   │   └── strain_scores.json
│   └── logs/
└── pipeline_results.tar.gz         # Downloadable bundle
```

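When scripting against a job directory, it can help to keep the layout above in one place. The helper below is a hypothetical convenience, not part of the codebase. It only mirrors the file names shown in the tree.

```python
from pathlib import Path
from typing import Dict


def job_paths(jobs_dir: str, task_id: str) -> Dict[str, Path]:
    """Return the well-known paths inside one job directory (hypothetical helper)."""
    root = Path(jobs_dir) / task_id
    output = root / "output"
    return {
        "input": root / "input.fna",
        "params": root / "params.json",
        "meta": root / "task_meta.json",
        "all_toxins": output / "digger" / "Results" / "Toxins" / "All_Toxins.txt",
        "strain_scores": output / "shotter" / "strain_scores.json",
        "logs": output / "logs",
        "bundle": root / "pipeline_results.tar.gz",
    }
```

For example, `job_paths("jobs", task_id)["all_toxins"]` points at the file that the shotter stage takes as input.
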
## Common Tasks

### Adding a New Pipeline Stage

1. Create script in `scripts/`
2. Add to `run_single_fna_pipeline.py` orchestration
3. Register task in `pixi.toml` if standalone execution needed
4. Add stage definition to `frontend/src/types/task.ts`

### Modifying Task Parameters

1. Update `TaskFormData` interface in `frontend/src/components/task/TaskSubmitForm.vue`
2. Update API endpoint in `web/backend/main.py`
3. Update task execution in `web/backend/tasks.py`

### Configuring Storage Location

```bash
# Set custom jobs directory
export JOBS_DIR=/path/to/jobs

# Or modify pixi.toml [feature.webbackend.env] section
```

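On the Python side, reading this setting could be as simple as the sketch below. It assumes the backend falls back to `./jobs` when `JOBS_DIR` is unset, as the environment variable table states. `resolve_jobs_dir` is a hypothetical name, and the real lookup lives in `web/backend/config.py`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_jobs_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the jobs directory from JOBS_DIR, defaulting to ./jobs (hypothetical helper)."""
    env = os.environ if env is None else env
    return Path(env.get("JOBS_DIR", "./jobs"))
```

Accepting the environment as an argument makes the default easy to check without touching the process environment.
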
## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `VITE_API_BASE_URL` | Frontend API URL (production) | "" (uses relative path) |
| `JOBS_DIR` | Task storage directory | ./jobs |
| `DEBUG` | Enable debug mode | false |

### Key Files Modified (Recent Fixes)

| File | Change |
|------|--------|
| `scripts/bttoxin_shoter.py` | Added `engine="python"` for pandas 2.x compatibility; Added empty DataFrame handling |
| `scripts/run_single_fna_pipeline.py` | Fixed `pixi_runner` import with `sys.path.insert()` |
| `web/backend/tasks.py` | Changed to `pixi run -e pipeline pipeline` command |
| `entrypoint.sh` | Fixed nginx `proxy_pass` to preserve `/api/` prefix |
| `docker-compose.simple.yml` | Docker deployment configuration |

### Constraints

Defined in `web/backend/config.py`:

- Max upload size: 50 MB
- Result retention: 30 days
- Task timeout: 6 hours
- Allowed extensions: .fna, .fa, .fasta

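The upload limits above translate into a small pre-flight check. This is a sketch only: the constant and function names are hypothetical, 50 MB is taken to mean 50 × 1024 × 1024 bytes, and extension matching is assumed to be case-insensitive. The authoritative values are the ones in `web/backend/config.py`.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # 50 MB, assuming binary megabytes
ALLOWED_EXTENSIONS = {".fna", ".fa", ".fasta"}


def check_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file breaks the documented upload constraints."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match is an assumption
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension {suffix!r}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {size_bytes} bytes, limit is {MAX_UPLOAD_BYTES}")
```

`check_upload("sample.fna", 1024)` passes silently, while `check_upload("notes.txt", 10)` raises `ValueError`.
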
## Testing

### Python Tests

```bash
# Property-based tests for pipeline
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v

# Backend tests
pixi run api-test
```

### Frontend Tests

```bash
pixi run fe-test
# or
cd frontend && pnpm test:unit
```

## Database Update

```bash
mkdir -p external_dbs
git clone --filter=blob:none --no-checkout \
  https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo
git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
rm -rf tmp_bttoxin_repo
```

## Subproject Guides

- **Frontend**: See [`frontend/AGENTS.md`](frontend/AGENTS.md)
- **Backend**: See [`web/backend/AGENTS.md`](web/backend/AGENTS.md)
- **API Documentation**: http://localhost:8000/api/docs (when DEBUG=true)

## Troubleshooting

### Common Issues

| Issue | Solution |
|-------|----------|
| pixi not found | `export PATH="$HOME/.pixi/bin:$PATH"` |
| Environment not found | `pixi install` |
| BtToxin_Digger unavailable | `pixi run -e digger BtToxin_Digger --help` |
| Permission denied | Ensure write access to `/data/jobs` |
| Task not found | Check task_id in URL and response |
| Results expired | Results auto-delete after 30 days |
| Nginx 404 on API | Check `proxy_pass http://127.0.0.1:8000/api/` (note trailing `/api/`) |
| KeyError: 'Strain' | Empty DataFrame after filters - shotter now handles this gracefully |
| Pandas engine error | Use `engine="python"` in `pd.read_csv()` for pandas 2.x |

### Debugging Pipeline Issues

```bash
# Check if task was created
curl http://localhost:80/api/tasks/{task_id}

# View task logs
cat jobs/{task_id}/output/logs/digger_execution.log

# Check All_Toxins.txt format
head -1 jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt

# Test shotter independently
pixi run -e pipeline python scripts/bttoxin_shoter.py \
  --all_toxins jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt \
  --output_dir /tmp/test_output
```

### Docker-Specific Issues

```bash
# Check container health
docker ps
docker logs bttoxin-pipeline

# Check nginx config
docker exec bttoxin-pipeline nginx -T

# Verify backend is running
docker exec bttoxin-pipeline curl http://127.0.0.1:8000/api/health
```

### Post-Mortem: Startup Failures & 404/403 Errors (2026-01 Update)

**Symptoms:**
- Website returns 404 Not Found or 403 Forbidden.
- Container stuck in `Restarting` loop.
- Logs show `exec: "uvicorn": executable file not found`.

**Root Causes & Solutions:**

1. **Missing Environment Config**:
   - **Cause**: `pixi.toml` and `pixi.lock` were missing in the final Docker image phase.
   - **Fix**: Ensure `COPY --from=builder /app/pixi.toml ...` is present in Dockerfile.

2. **Port Conflict**:
   - **Cause**: `docker-compose.yml` mapped `80:80` while Traefik already occupied port 80.
   - **Fix**: Remove `ports` mapping in compose file; rely on Docker internal network (`frontend`) and Traefik labels.

3. **Frontend Permissions**:
   - **Cause**: Built frontend files owned by root were not readable by Nginx user.
   - **Fix**: Add `RUN chmod -R 755 /var/www/html` in Dockerfile.

4. **Health Check Path**:
   - **Cause**: Nginx routed `/health` to `/api/health` but backend expected `/health`.
   - **Fix**: Update Nginx config to proxy pass to correct endpoint.

### Post-Mortem: Consistency Refactoring & Fixes (2026-01-20 Update)

**Summary:**
Major refactoring to make script execution and the web pipeline consistent, fix severe container startup failures, and simplify the user experience.

**1. Unified Pipeline Execution**
- **Problem**: The web backend orchestrated pipeline steps manually, which led to discrepancies with the standalone script (e.g., missing plots, different file formats).
- **Fix**: Refactored `backend/app/workers/tasks.py` to run `scripts/run_single_fna_pipeline.py` directly as a subprocess.
- **Result**: Web output is now guaranteed identical to manual script execution.

**2. Result Format & Cleanup**
- **Change**: Switched output format from `.tar.gz` to `.zip`.
- **Feature**: Added automatic cleanup of intermediate directories (`digger/`, `shoter/`) to save disk space; only the final ZIP and logs are retained.
- **Frontend**: Updated download logic to handle `.zip` files.

**3. Frontend Simplification**
- **Change**: Removed CRISPR Fusion UI elements (beta feature) to reduce complexity.
- **Change**: Replaced complex multi-stage status indicators with a "Simulated Progress Bar" for better UX during black-box script execution.
- **Fix**: Restored "One-click load" button and fixed TypeScript build errors caused by removed variables.

**4. Critical Docker Fixes**
- **Fix (Restart Loop)**: Removed incorrect `image: postgres` directive in `docker-compose.yml` that caused the web service to run database software instead of the app.
- **Fix (Env Path)**: Updated `.dockerignore` to exclude host `.pixi` directory, preventing "bad interpreter" errors caused by hardcoded host paths in the container.
- **Fix (404 Error)**: Removed erroneous `rm -rf /app/frontend` in Dockerfile that was accidentally deleting built frontend assets.
- **Optimization**: Configured `npmmirror` registry to resolve build timeouts in CN network environments.

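The ZIP bundling and cleanup described under "Result Format & Cleanup" could be approximated as follows. This is an illustrative sketch using the standard `zipfile` and `shutil` modules, not the project's implementation: `bundle_and_clean` is a hypothetical name, and the intermediate directory names are the ones quoted in the post-mortem.

```python
import shutil
import zipfile
from pathlib import Path
from typing import Iterable


def bundle_and_clean(
    output_dir: Path,
    zip_path: Path,
    intermediate: Iterable[str] = ("digger", "shoter"),
) -> None:
    """Zip every file under output_dir, then delete the intermediate stage directories."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(output_dir.rglob("*")):
            # Skip directories and the archive itself if it lives inside output_dir.
            if path.is_file() and path.resolve() != zip_path.resolve():
                archive.write(path, path.relative_to(output_dir).as_posix())
    for name in intermediate:
        shutil.rmtree(output_dir / name, ignore_errors=True)
```

Anything not named in `intermediate`, such as `logs/`, stays on disk next to the archive, which matches the statement that only the final ZIP and logs are retained.
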