# BtToxin Pipeline Agent Guide
## Overview
BtToxin Pipeline is an automated Bacillus thuringiensis toxin mining system. It identifies Cry toxin genes in bacterial genomes and predicts their target insects using a three-stage pipeline:
1. **Digger**: BtToxin_Digger toxin mining
2. **Shotter**: Toxin scoring and target prediction
3. **Plot**: Heatmap generation and report creation
## Tech Stack
| Layer | Technology |
|-------|------------|
| Package Manager | pixi (conda environments) |
| Pipeline | Python 3.9+ (pandas, matplotlib, seaborn) |
| Digger Tool | BtToxin_Digger (Perl, BLAST, HMMER) |
| Frontend | Vue 3 + Vite + Element Plus + vue-i18n |
| Backend | FastAPI + Uvicorn + SQLAlchemy |
| Database | PostgreSQL 15 (Metadata) + Redis 7 (Queue) |
| Result Storage | File system + 30-day retention |
## Quick Start
```bash
# 1. Clone and install dependencies
git clone <repo>
cd bttoxin-pipeline
pixi install
# 2. Start services (Production with Traefik)
# Using the unified production configuration
docker compose -f docker/compose/docker-compose.traefik.yml -p compose up -d --build
# Access:
# Frontend: https://bttiaw.hzau.edu.cn (via Traefik)
# Backend API: http://bttoxin-pipeline:8000 (Internal)
# 3. Development Mode (Local)
pixi run web-start
# - Upload a .fna genome file
# - Configure parameters
# - Click "Submit"
# - The browser auto-redirects to the /{task_id} page
# - The page refreshes every 3 seconds to show progress
# - When complete, click to download the result archive
```
## Task Submission Flow
```
1. User uploads a .fna file on the home page
2. User clicks the "Submit Task" button
3. Backend creates the task and returns a task_id
4. Frontend auto-redirects to the /{task_id} page
5. The page polls the backend every 3 seconds for the latest status
6. Displays: progress bar, current stage, estimated time remaining
7. On completion, a "Download Results" button is shown
8. Results are kept for 30 days, then auto-deleted
```
## Project Structure
```
bttoxin-pipeline/
├── pixi.toml                      # Pixi environment configuration
├── pyproject.toml                 # Python package configuration
├── scripts/
│   ├── run_single_fna_pipeline.py # Main orchestrator
│   ├── run_digger_stage.py        # Digger-only stage
│   ├── bttoxin_shoter.py          # Toxin scoring module
│   ├── plot_shotter.py            # Visualization & reporting
│   ├── start_web.sh               # Start both frontend + backend
│   └── pixi_runner.py             # PixiRunner abstraction
├── bttoxin/                       # Python CLI package
│   ├── api.py                     # Python API
│   ├── cli.py                     # CLI entry point
│   └── __init__.py
├── web/backend/                   # FastAPI backend
│   ├── main.py                    # FastAPI app entry + API endpoints
│   ├── config.py                  # Configuration
│   ├── models.py                  # Data models
│   ├── storage.py                 # Redis + file storage
│   ├── tasks.py                   # Task execution logic
│   └── AGENTS.md                  # Backend-specific guide
├── frontend/                      # Vue 3 frontend
│   ├── src/
│   │   ├── api/task.ts            # Task API client
│   │   ├── views/
│   │   │   ├── TaskSubmitView.vue  # Task submission page
│   │   │   └── TaskMonitorView.vue # Task status page (polling)
│   │   ├── types/task.ts          # Task types
│   │   └── ...
│   └── AGENTS.md                  # Frontend-specific guide
├── Data/                          # Reference data
│   └── toxicity-data.csv          # BPPRC specificity data
├── external_dbs/                  # Optional external database
│   └── bt_toxin/                  # Updated BtToxin database
├── tools/                         # Utility tools and environments
│   └── reproduction/              # Reproduction environments
│       └── bttoxin_digger/        # BtToxin_Digger reproduction env
├── tests/                         # Test suite
│   ├── test_pixi_runner.py        # Property-based tests
│   └── test_data/                 # Test input files (.fna)
└── docs/                          # Documentation
```
## Web API Endpoints
### Create Task
```bash
POST /api/tasks
Content-Type: multipart/form-data
Parameters:
- file: .fna file
- min_identity: float (0-1, default: 0.8)
- min_coverage: float (0-1, default: 0.6)
- allow_unknown_families: boolean (default: false)
- require_index_hit: boolean (default: true)
- lang: "zh" | "en" (default: "zh"); can also be negotiated via the Accept-Language header
Response:
{
  "task_id": "uuid",
  "token": "access_token",
  "status": "pending",                 // pending | running | completed | failed
  "created_at": "2024-01-01T00:00:00",
  "expires_at": "2024-01-31T00:00:00",
  "estimated_duration_seconds": 120
}
```
### Get Task Status
```bash
GET /api/tasks/{task_id}
Response:
{
  "task_id": "uuid",
  "status": "running",
  "progress": 45,                      // percent complete
  "current_stage": "shoter",           // digger | shoter | plots | bundle
  "submission_time": "2024-01-01T00:00:00",
  "start_time": "2024-01-01T00:00:10",
  "filename": "sample.fna",
  "error": null,                       // error message on failure
  "estimated_remaining_seconds": 60
}
```
### Download Result
```bash
GET /api/tasks/{task_id}/download
Response: .tar.gz file
```
### Delete Task
```bash
DELETE /api/tasks/{task_id}
```
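The create/poll cycle above can be sketched as a small client helper. This is a minimal sketch, not the project's actual client code: `fetch_status` stands in for an HTTP GET of `/api/tasks/{task_id}` (e.g. a wrapper around `requests.get(...).json()`), so the polling logic stays transport-agnostic.

```python
import time
from typing import Callable, Dict

TERMINAL_STATES = {"completed", "failed"}

def poll_until_done(
    fetch_status: Callable[[], Dict],
    interval: float = 3.0,       # the frontend polls every 3 seconds
    timeout: float = 6 * 3600,   # task timeout from config.py: 6 hours
    sleep=time.sleep,            # injectable for testing
) -> Dict:
    """Poll a task's status until it reaches a terminal state.

    `fetch_status` must return the JSON body shown above, with at
    least a "status" field. Raises TimeoutError past the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish within the timeout")
        sleep(interval)
```

With a live server, `fetch_status` would be something like `lambda: requests.get(f"{base}/api/tasks/{task_id}").json()`.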
## Development Commands
### Via pixi (Recommended)
```bash
# Full pipeline (uses pipeline environment)
pixi run -e pipeline pipeline --fna <file.fna>
# Individual stages
pixi run -e pipeline digger-only --fna <file.fna>
pixi run -e pipeline shotter --all_toxins <path>
pixi run -e pipeline plot --strain_scores <path>
# Frontend
pixi run fe-install
pixi run fe-dev # http://localhost:5173
pixi run fe-build
# Backend
pixi run api-dev # http://localhost:8000
pixi run api-test
# Combined (both frontend + backend)
pixi run web-start
# Tests
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v
```
### Direct Commands
```bash
# Frontend (in frontend/ directory)
pnpm install
pnpm dev --host
# Backend (in project root)
uvicorn web.backend.main:app --reload --host 0.0.0.0 --port 8000
# Pipeline (requires pipeline environment activation)
~/.pixi/bin/pixi shell-hook -e pipeline > /tmp/activate.sh
source /tmp/activate.sh
python scripts/run_single_fna_pipeline.py --fna <file>
```
## Docker Deployment
```bash
# Build and run with docker-compose
docker-compose -f docker-compose.simple.yml up -d
# Access at http://localhost:80
# API health check: http://localhost:80/api/health
```
### Docker Architecture
```
bttoxin-pipeline (Stack)
├── traefik (reverse proxy, port 80/443)
├── bttoxin-pipeline (FastAPI + Static Files, port 8000)
├── bttoxin-postgres (Database, port 5432)
└── bttoxin-redis (Task Queue, port 6379)
```
### Docker Volume Mounts
| Host Path | Container Path | Purpose |
|-----------|----------------|---------|
| `./jobs` | `/app/jobs` | Task results |
| `postgres_data` | `/var/lib/postgresql/data` | Database persistence |
| ... | ... | Source code mounts (dev) |
## Task Flow
```
1. User uploads .fna file via web UI
2. Backend creates task directory: /data/jobs/{task_id}/ (or ./jobs/ in dev)
3. Backend saves input file and parameters
4. Backend starts `pixi run -e pipeline pipeline` in background (asyncio subprocess)
5. Frontend polls GET /api/tasks/{task_id} every 3 seconds
6. On completion, download URL is provided
7. Results available for 30 days, then auto-cleanup
```
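Step 4 of the flow above (running the pipeline as a non-blocking background subprocess) can be sketched as follows. This is illustrative, not the actual `web/backend/tasks.py` code; the `cmd` parameter is injectable so the sketch can be exercised without a pixi installation.

```python
import asyncio
from pathlib import Path

async def launch_pipeline(task_dir: Path, cmd=None) -> int:
    """Run the pipeline for one task as an asyncio subprocess.

    By default this mirrors the backend's `pixi run -e pipeline pipeline`
    invocation described above. Combined stdout/stderr is captured to
    pipeline.log inside the task directory.
    """
    if cmd is None:  # the real invocation, per the task flow above
        cmd = ["pixi", "run", "-e", "pipeline", "pipeline",
               "--fna", str(task_dir / "input.fna")]
    with open(task_dir / "pipeline.log", "wb") as log:
        proc = await asyncio.create_subprocess_exec(
            *cmd, stdout=log, stderr=asyncio.subprocess.STDOUT,
        )
        return await proc.wait()  # exit code; 0 means success
```

The backend would schedule this with `asyncio.create_task(...)` so the API request returns immediately while the pipeline runs.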
## Result Storage
```
./jobs/{task_id}/ # Or /app/jobs/ in Docker
├── input.fna # Uploaded file
├── params.json # Task parameters
├── task_meta.json # Task metadata (status, progress, etc.)
├── output/ # Pipeline output
│ ├── digger/ # BtToxin_Digger results
│ │ ├── Results/Toxins/
│ │ │ └── All_Toxins.txt # Toxin hits (input to shotter)
│ │ └── ...
│ ├── shotter/ # Shoter scoring results
│ │ ├── toxin_support.tsv
│ │ ├── strain_target_scores.tsv
│ │ └── strain_scores.json
│ └── logs/
└── pipeline_results.tar.gz # Downloadable bundle
```
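The 30-day auto-cleanup mentioned above might be implemented as a periodic sweep of the jobs directory. A minimal sketch, assuming each task directory carries a `task_meta.json` with an `expires_at` ISO timestamp (field name as in the API responses; the actual cleanup code may differ):

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

def sweep_expired(jobs_dir: Path, now=None) -> list:
    """Delete task directories whose task_meta.json says they expired.

    Returns the removed task_ids. Directories without readable metadata
    are left alone rather than guessed at.
    """
    now = now or datetime.now(timezone.utc)
    removed = []
    for task_dir in jobs_dir.iterdir():
        if not task_dir.is_dir():
            continue
        meta_file = task_dir / "task_meta.json"
        if not meta_file.is_file():
            continue
        try:
            meta = json.loads(meta_file.read_text())
            expires = datetime.fromisoformat(meta["expires_at"])
        except (ValueError, KeyError):
            continue  # malformed metadata: skip, don't delete blindly
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)  # assume UTC
        if expires <= now:
            shutil.rmtree(task_dir)
            removed.append(task_dir.name)
    return removed
```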
## Common Tasks
### Adding a New Pipeline Stage
1. Create script in `scripts/`
2. Add to `run_single_fna_pipeline.py` orchestration
3. Register task in `pixi.toml` if standalone execution needed
4. Add stage definition to `frontend/src/types/task.ts`
### Modifying Task Parameters
1. Update `TaskFormData` interface in `frontend/src/components/task/TaskSubmitForm.vue`
2. Update API endpoint in `web/backend/main.py`
3. Update task execution in `web/backend/tasks.py`
### Configuring Storage Location
```bash
# Set custom jobs directory
export JOBS_DIR=/path/to/jobs
# Or modify pixi.toml [feature.webbackend.env] section
```
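The same resolution order (environment variable, else the `./jobs` default) can be expressed in Python. A sketch; the real `web/backend/config.py` may structure this differently:

```python
import os
from pathlib import Path

def resolve_jobs_dir(env=None) -> Path:
    """Resolve the task storage directory: JOBS_DIR env var, else ./jobs."""
    env = os.environ if env is None else env
    jobs_dir = Path(env.get("JOBS_DIR", "./jobs"))
    jobs_dir.mkdir(parents=True, exist_ok=True)  # ensure it exists before use
    return jobs_dir
```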
## Configuration
### Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `VITE_API_BASE_URL` | Frontend API URL (production) | "" (uses relative path) |
| `JOBS_DIR` | Task storage directory | ./jobs |
| `DEBUG` | Enable debug mode | false |
### Key Files Modified (Recent Fixes)
| File | Change |
|------|--------|
| `scripts/bttoxin_shoter.py` | Added `engine="python"` for pandas 2.x compatibility; Added empty DataFrame handling |
| `scripts/run_single_fna_pipeline.py` | Fixed `pixi_runner` import with `sys.path.insert()` |
| `web/backend/tasks.py` | Changed to `pixi run -e pipeline pipeline` command |
| `entrypoint.sh` | Fixed nginx `proxy_pass` to preserve `/api/` prefix |
| `docker-compose.simple.yml` | Docker deployment configuration |
### Constraints
Defined in `web/backend/config.py`:
- Max upload size: 50 MB
- Result retention: 30 days
- Task timeout: 6 hours
- Allowed extensions: .fna, .fa, .fasta
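A validator enforcing the upload constraints above might look like this. This is a sketch of the checks, not the actual code in `web/backend/config.py`:

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 50 * 1024 * 1024           # 50 MB limit
ALLOWED_EXTENSIONS = {".fna", ".fa", ".fasta"}

def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if an upload violates the constraints above."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension {ext!r}; allowed: .fna, .fa, .fasta")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file too large: {size_bytes} bytes exceeds the 50 MB limit")
```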
## Testing
### Python Tests
```bash
# Property-based tests for pipeline
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v
# Backend tests
pixi run api-test
```
### Frontend Tests
```bash
pixi run fe-test
# or
cd frontend && pnpm test:unit
```
## Database Update
```bash
mkdir -p external_dbs
git clone --filter=blob:none --no-checkout \
https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo
git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
rm -rf tmp_bttoxin_repo
```
## Subproject Guides
- **Frontend**: See [`frontend/AGENTS.md`](frontend/AGENTS.md)
- **Backend**: See [`web/backend/AGENTS.md`](web/backend/AGENTS.md)
- **API Documentation**: http://localhost:8000/api/docs (when DEBUG=true)
## Troubleshooting
### Common Issues
| Issue | Solution |
|-------|----------|
| pixi not found | `export PATH="$HOME/.pixi/bin:$PATH"` |
| Environment not found | `pixi install` |
| BtToxin_Digger unavailable | `pixi run -e digger BtToxin_Digger --help` |
| Permission denied | Ensure write access to `/data/jobs` |
| Task not found | Check task_id in URL and response |
| Results expired | Results auto-delete after 30 days |
| Nginx 404 on API | Check `proxy_pass http://127.0.0.1:8000/api/` (note trailing `/api/`) |
| KeyError: 'Strain' | Empty DataFrame after filters - shotter now handles this gracefully |
| Pandas engine error | Use `engine="python"` in `pd.read_csv()` for pandas 2.x |
### Debugging Pipeline Issues
```bash
# Check if task was created
curl http://localhost:80/api/tasks/{task_id}
# View task logs
cat jobs/{task_id}/output/logs/digger_execution.log
# Check All_Toxins.txt format
head -1 jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt
# Test shotter independently
pixi run -e pipeline python scripts/bttoxin_shoter.py \
--all_toxins jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt \
--output_dir /tmp/test_output
```
### Docker-Specific Issues
```bash
# Check container health
docker ps
docker logs bttoxin-pipeline
# Check nginx config
docker exec bttoxin-pipeline nginx -T
# Verify backend is running
docker exec bttoxin-pipeline curl http://127.0.0.1:8000/api/health
```
### Post-Mortem: Startup Failures & 404/403 Errors (2026-01 Update)
**Symptoms:**
- Website returns 404 Not Found or 403 Forbidden.
- Container stuck in `Restarting` loop.
- Logs show `exec: "uvicorn": executable file not found`.
**Root Causes & Solutions:**
1. **Missing Environment Config**:
- **Cause**: `pixi.toml` and `pixi.lock` were missing in the final Docker image phase.
- **Fix**: Ensure `COPY --from=builder /app/pixi.toml ...` is present in Dockerfile.
2. **Port Conflict**:
- **Cause**: `docker-compose.yml` mapped `80:80` while Traefik already occupied port 80.
- **Fix**: Remove `ports` mapping in compose file; rely on Docker internal network (`frontend`) and Traefik labels.
3. **Frontend Permissions**:
- **Cause**: Built frontend files owned by root were not readable by Nginx user.
- **Fix**: Add `RUN chmod -R 755 /var/www/html` in Dockerfile.
4. **Health Check Path**:
- **Cause**: Nginx routed `/health` to `/api/health` but backend expected `/health`.
- **Fix**: Update Nginx config to proxy pass to correct endpoint.
### Post-Mortem: Consistency Refactoring & Fixes (2026-01-20 Update)
**Summary:**
Major refactoring to ensure consistency between script execution and web pipeline, fix severe container startup failures, and simplify user experience.
**1. Unified Pipeline Execution**
- **Problem**: Web backend manually orchestrated pipeline steps, leading to discrepancies with the standalone script (e.g., missing plots, different file formats).
- **Fix**: Refactored `backend/app/workers/tasks.py` to directly subprocess `scripts/run_single_fna_pipeline.py`.
- **Result**: Web output is now guaranteed identical to manual script execution.
**2. Result Format & Cleanup**
- **Change**: Switched output format from `.tar.gz` to `.zip`.
- **Feature**: Added automatic cleanup of intermediate directories (`digger/`, `shoter/`) to save disk space; only the final ZIP and logs are retained.
- **Frontend**: Updated download logic to handle `.zip` files.
**3. Frontend Simplification**
- **Change**: Removed CRISPR Fusion UI elements (beta feature) to reduce complexity.
- **Change**: Replaced complex multi-stage status indicators with a "Simulated Progress Bar" for better UX during black-box script execution.
- **Fix**: Restored "One-click load" button and fixed TypeScript build errors caused by removed variables.
**4. Critical Docker Fixes**
- **Fix (Restart Loop)**: Removed incorrect `image: postgres` directive in `docker-compose.yml` that caused the web service to run database software instead of the app.
- **Fix (Env Path)**: Updated `.dockerignore` to exclude host `.pixi` directory, preventing "bad interpreter" errors caused by hardcoded host paths in the container.
- **Fix (404 Error)**: Removed erroneous `rm -rf /app/frontend` in Dockerfile that was accidentally deleting built frontend assets.
- **Optimization**: Configured `npmmirror` registry to resolve build timeouts in CN network environments.