
BtToxin Pipeline Agent Guide

Overview

BtToxin Pipeline is an automated Bacillus thuringiensis toxin mining system. It identifies Cry toxin genes in bacterial genomes and predicts their target insects using a three-stage pipeline:

  1. Digger: BtToxin_Digger toxin mining
  2. Shotter: Toxin scoring and target prediction
  3. Plot: Heatmap generation and report creation

Tech Stack

| Layer | Technology |
|---|---|
| Package Manager | pixi (conda environments) |
| Pipeline | Python 3.9+ (pandas, matplotlib, seaborn) |
| Digger Tool | BtToxin_Digger (Perl, BLAST, HMMER) |
| Frontend | Vue 3 + Vite + Element Plus + vue-i18n |
| Backend | FastAPI + Uvicorn + SQLAlchemy |
| Database | PostgreSQL 15 (metadata) + Redis 7 (queue) |
| Result Storage | File system, 30-day retention |

Quick Start

# 1. Clone and install dependencies
git clone <repo>
cd bttoxin-pipeline
pixi install

# 2. Start services (Production with Traefik)
# Using the unified production configuration
docker compose -f docker/compose/docker-compose.traefik.yml -p compose up -d --build

# Access:
# Frontend: https://bttiaw.hzau.edu.cn (via Traefik)
# Backend API: http://bttoxin-pipeline:8000 (Internal)

# 3. Development Mode (Local)
pixi run web-start

- Upload an .fna genome file
- Configure the parameters
- Click Submit
- The browser automatically redirects to the /{task_id} page
- The page refreshes every 3 seconds to show progress
- When the task completes, click to download the result archive


Task Submission Flow

  1. The user uploads an .fna file on the home page
  2. The user clicks the "Submit Task" button
  3. The backend creates a task and returns a task_id
  4. The frontend automatically redirects to the /{task_id} page
  5. The page polls the backend every 3 seconds for the latest status
  6. It displays the progress bar, current stage, and estimated remaining time
  7. On completion, a "Download Results" button appears
  8. Results are retained for 30 days and then deleted automatically

Project Structure

bttoxin-pipeline/
├── pixi.toml                    # Pixi environment configuration
├── pyproject.toml               # Python package configuration
├── scripts/
│   ├── run_single_fna_pipeline.py    # Main orchestrator
│   ├── run_digger_stage.py           # Digger-only stage
│   ├── bttoxin_shoter.py             # Toxin scoring module
│   ├── plot_shotter.py               # Visualization & reporting
│   ├── start_web.sh                  # Start both frontend + backend
│   └── pixi_runner.py                # PixiRunner abstraction
├── bttoxin/                     # Python CLI package
│   ├── api.py                   # Python API
│   ├── cli.py                   # CLI entry point
│   └── __init__.py
├── web/backend/                 # FastAPI backend
│   ├── main.py                  # FastAPI app entry + API endpoints
│   ├── config.py                # Configuration
│   ├── models.py                # Data models
│   ├── storage.py               # Redis + file storage
│   ├── tasks.py                 # Task execution logic
│   └── AGENTS.md                # Backend-specific guide
├── frontend/                    # Vue 3 frontend
│   ├── src/
│   │   ├── api/task.ts          # Task API client
│   │   ├── views/
│   │   │   ├── TaskSubmitView.vue    # Task submission page
│   │   │   └── TaskMonitorView.vue   # Task status page (polling)
│   │   ├── types/task.ts        # Task types
│   │   └── ...
│   └── AGENTS.md                # Frontend-specific guide
├── Data/                        # Reference data
│   └── toxicity-data.csv        # BPPRC specificity data
├── external_dbs/                # Optional external database
│   └── bt_toxin/                # Updated BtToxin database
├── tools/                       # Utility tools and environments
│   └── reproduction/            # Reproduction environments
│       └── bttoxin_digger/      # BtToxin_Digger reproduction env
├── tests/                       # Test suite
│   ├── test_pixi_runner.py      # Property-based tests
│   └── test_data/               # Test input files (.fna)
└── docs/                        # Documentation

Web API Endpoints

Create Task

POST /api/tasks
Content-Type: multipart/form-data

Parameters:
- file: .fna file
- min_identity: float (0-1, default: 0.8)
- min_coverage: float (0-1, default: 0.6)
- allow_unknown_families: boolean (default: false)
- require_index_hit: boolean (default: true)
- lang: "zh" | "en" (default: "zh"); can also be set via the Accept-Language header

Response:
{
  "task_id": "uuid",
  "token": "access_token",
  "status": "pending",
  "created_at": "2024-01-01T00:00:00",
  "expires_at": "2024-01-31T00:00:00",
  "estimated_duration_seconds": 120
}

Get Task Status

GET /api/tasks/{task_id}

Response:
{
  "task_id": "uuid",
  "status": "running",
  "progress": 45,
  "current_stage": "shoter",
  "submission_time": "2024-01-01T00:00:00",
  "start_time": "2024-01-01T00:00:10",
  "filename": "sample.fna",
  "error": null,
  "estimated_remaining_seconds": 60
}
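The frontend's overall percentage can be derived from current_stage plus an in-stage fraction. A minimal sketch, using the stage names from the response above; the weights are illustrative guesses, not the project's actual values:

```python
# Stage order as reported in current_stage; weights are illustrative.
STAGES = ["digger", "shoter", "plots", "bundle"]
WEIGHTS = {"digger": 0.60, "shoter": 0.20, "plots": 0.15, "bundle": 0.05}

def overall_progress(stage: str, stage_fraction: float) -> float:
    """Map (current stage, 0-1 fraction within it) to an overall percentage,
    rounded to one decimal place as the TaskMonitor progress bar displays."""
    done = sum(WEIGHTS[s] for s in STAGES[: STAGES.index(stage)])
    return round(100 * (done + WEIGHTS[stage] * stage_fraction), 1)
```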

Download Result

GET /api/tasks/{task_id}/download
Response: a .tar.gz archive containing:
- results/digger/    # Digger analysis results
- results/shotter/   # Shoter scoring results
- results/logs/      # Execution logs
- input.fna          # Original input file

Delete Task

DELETE /api/tasks/{task_id}
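A stdlib-only Python client for the endpoints above might look like the sketch below. The BASE URL and the hand-rolled multipart encoder are assumptions; a real client could simply use the requests library:

```python
import json
import time
import urllib.request

BASE = "http://localhost:80"  # assumes the Docker deployment from this guide

def encode_multipart(fields: dict, file_field: str, filename: str, file_bytes: bytes):
    """Build a multipart/form-data body by hand so only the stdlib is needed."""
    boundary = "----bttoxin-client"
    chunks = []
    for name, value in fields.items():
        chunks.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode())
    chunks.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode())
    chunks.append(file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"

def submit_task(fna_path: str) -> dict:
    """POST /api/tasks and return the task descriptor (task_id, token, ...)."""
    with open(fna_path, "rb") as fh:
        body, ctype = encode_multipart(
            {"min_identity": "0.8", "min_coverage": "0.6"},
            "file", "input.fna", fh.read())
    req = urllib.request.Request(f"{BASE}/api/tasks", data=body,
                                 headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for(task_id: str, interval: int = 3) -> dict:
    """Poll GET /api/tasks/{task_id} every 3 s, as the frontend does."""
    while True:
        with urllib.request.urlopen(f"{BASE}/api/tasks/{task_id}") as resp:
            status = json.load(resp)
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(interval)
```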

Development Commands

# Full pipeline (uses pipeline environment)
pixi run -e pipeline pipeline --fna <file.fna>

# Individual stages
pixi run -e pipeline digger-only --fna <file.fna>
pixi run -e pipeline shotter --all_toxins <path>
pixi run -e pipeline plot --strain_scores <path>

# Frontend
pixi run fe-install
pixi run fe-dev          # http://localhost:5173
pixi run fe-build

# Backend
pixi run api-dev         # http://localhost:8000
pixi run api-test

# Combined (both frontend + backend)
pixi run web-start

# Tests
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v

Direct Commands

# Frontend (in frontend/ directory)
pnpm install
pnpm dev --host

# Backend (in project root)
uvicorn web.backend.main:app --reload --host 0.0.0.0 --port 8000

# Pipeline (requires pipeline environment activation)
~/.pixi/bin/pixi shell-hook -e pipeline > /tmp/activate.sh
source /tmp/activate.sh
python scripts/run_single_fna_pipeline.py --fna <file>

Docker Deployment

# Build and run with docker-compose
docker-compose -f docker-compose.simple.yml up -d

# Access at http://localhost:80
# API health check: http://localhost:80/api/health

Docker Architecture

bttoxin-pipeline (Stack)
├── traefik (reverse proxy, port 80/443)
├── bttoxin-pipeline (FastAPI + Static Files, port 8000)
├── bttoxin-postgres (Database, port 5432)
└── bttoxin-redis (Task Queue, port 6379)

Docker Volume Mounts

| Host Path | Container Path | Purpose |
|---|---|---|
| ./jobs | /app/jobs | Task results |
| postgres_data | /var/lib/postgresql/data | Database persistence |
| ... | ... | Source code mounts (dev) |

Task Flow

1. User uploads .fna file via web UI
2. Backend creates task directory: /data/jobs/{task_id}/ (or ./jobs/ in dev)
3. Backend saves input file and parameters
4. Backend starts `pixi run -e pipeline pipeline` in background (asyncio subprocess)
5. Frontend polls GET /api/tasks/{task_id} every 3 seconds
6. On completion, download URL is provided
7. Results available for 30 days, then auto-cleanup
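Step 4 above can be sketched with an asyncio subprocess. This is a simplified sketch, not the actual implementation in web/backend/tasks.py, and the exact task_meta.json fields are assumptions:

```python
import asyncio
import json
from pathlib import Path

async def run_task(task_id: str, jobs_dir: Path = Path("./jobs")) -> int:
    """Run the pipeline for one task in the background and record the outcome."""
    task_dir = jobs_dir / task_id
    proc = await asyncio.create_subprocess_exec(
        "pixi", "run", "-e", "pipeline", "pipeline",
        "--fna", str(task_dir / "input.fna"),
        cwd=str(task_dir),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT)
    output, _ = await proc.communicate()
    # Write the final status where the polling endpoint can pick it up.
    status = "completed" if proc.returncode == 0 else "failed"
    (task_dir / "task_meta.json").write_text(json.dumps({"status": status}))
    return proc.returncode
```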

Result Storage

./jobs/{task_id}/                    # Or /app/jobs/ in Docker
├── input.fna                        # Uploaded file
├── params.json                      # Task parameters
├── task_meta.json                   # Task metadata (status, progress, etc.)
├── output/                          # Pipeline output
│   ├── digger/                      # BtToxin_Digger results
│   │   ├── Results/Toxins/
│   │   │   └── All_Toxins.txt       # Toxin hits (input to shotter)
│   │   └── ...
│   ├── shotter/                     # Shoter scoring results
│   │   ├── toxin_support.tsv
│   │   ├── strain_target_scores.tsv
│   │   └── strain_scores.json
│   └── logs/
└── pipeline_results.tar.gz          # Downloadable bundle
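The task_meta.json above can be read back along these lines (field names beyond status and progress are assumptions; the real schema lives in the backend):

```python
import json
from pathlib import Path

def read_task_meta(jobs_dir: Path, task_id: str) -> dict:
    """Load a task's metadata (status, progress, ...) from its job directory."""
    meta_path = Path(jobs_dir) / task_id / "task_meta.json"
    if not meta_path.exists():
        # Treat a missing file as an unknown/expired task rather than crashing.
        return {"status": "unknown"}
    return json.loads(meta_path.read_text())
```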

Common Tasks

Adding a New Pipeline Stage

  1. Create script in scripts/
  2. Add to run_single_fna_pipeline.py orchestration
  3. Register task in pixi.toml if standalone execution needed
  4. Add stage definition to frontend/src/types/task.ts
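A new stage script in scripts/ can follow the CLI shape of the existing stages. A sketch only; the flag names here are illustrative, not taken from the repository:

```python
import argparse
from pathlib import Path

def build_parser() -> argparse.ArgumentParser:
    """CLI skeleton in the style of the existing stage scripts."""
    parser = argparse.ArgumentParser(description="New pipeline stage")
    parser.add_argument("--input", required=True,
                        help="Output file of the previous stage")
    parser.add_argument("--output_dir", required=True,
                        help="Directory for this stage's results")
    return parser

def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    out = Path(args.output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # ... stage logic goes here ...

if __name__ == "__main__":
    main()
```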

Modifying Task Parameters

  1. Update TaskFormData interface in frontend/src/components/task/TaskSubmitForm.vue
  2. Update API endpoint in web/backend/main.py
  3. Update task execution in web/backend/tasks.py

Configuring Storage Location

# Set custom jobs directory
export JOBS_DIR=/path/to/jobs

# Or modify pixi.toml [feature.webbackend.env] section
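The resolution logic might look like this (a sketch; the actual code in web/backend/config.py may differ):

```python
import os
from pathlib import Path

def jobs_dir() -> Path:
    """Resolve the task storage directory from JOBS_DIR, defaulting to ./jobs,
    and make sure it exists."""
    d = Path(os.environ.get("JOBS_DIR", "./jobs")).resolve()
    d.mkdir(parents=True, exist_ok=True)
    return d
```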

Configuration

Environment Variables

| Variable | Description | Default |
|---|---|---|
| VITE_API_BASE_URL | Frontend API URL (production) | "" (uses relative path) |
| JOBS_DIR | Task storage directory | ./jobs |
| DEBUG | Enable debug mode | false |

Key Files Modified (Recent Fixes)

| File | Change |
|---|---|
| scripts/bttoxin_shoter.py | Added engine="python" for pandas 2.x compatibility; added empty-DataFrame handling |
| scripts/run_single_fna_pipeline.py | Fixed pixi_runner import with sys.path.insert() |
| web/backend/tasks.py | Switched to the pixi run -e pipeline pipeline command |
| entrypoint.sh | Fixed nginx proxy_pass to preserve the /api/ prefix |
| docker-compose.simple.yml | Docker deployment configuration |

Constraints

Defined in web/backend/config.py:

  • Max upload size: 50 MB
  • Result retention: 30 days
  • Task timeout: 6 hours
  • Allowed extensions: .fna, .fa, .fasta
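These limits can be enforced with a small validator; a sketch only, since the real checks live in web/backend/config.py and the upload endpoint:

```python
from pathlib import Path

# Limits as documented in web/backend/config.py
MAX_UPLOAD_BYTES = 50 * 1024 * 1024          # 50 MB
ALLOWED_EXTENSIONS = {".fna", ".fa", ".fasta"}

def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the upload violates the documented constraints."""
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {filename}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file too large: {size_bytes} bytes")
```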

Testing

Python Tests

# Property-based tests for pipeline
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v

# Backend tests
pixi run api-test

Frontend Tests

pixi run fe-test
# or
cd frontend && pnpm test:unit

Database Update

mkdir -p external_dbs
git clone --filter=blob:none --no-checkout \
  https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo
git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
rm -rf tmp_bttoxin_repo

Subproject Guides

  • web/backend/AGENTS.md: Backend-specific guide
  • frontend/AGENTS.md: Frontend-specific guide

Troubleshooting

Common Issues

| Issue | Solution |
|---|---|
| pixi not found | export PATH="$HOME/.pixi/bin:$PATH" |
| Environment not found | pixi install |
| BtToxin_Digger unavailable | pixi run -e digger BtToxin_Digger --help |
| Permission denied | Ensure write access to /data/jobs |
| Task not found | Check the task_id in the URL and response |
| Results expired | Results auto-delete after 30 days |
| Nginx 404 on API | Check proxy_pass http://127.0.0.1:8000/api/ (note the trailing /api/) |
| KeyError: 'Strain' | Empty DataFrame after filters; shotter now handles this gracefully |
| Pandas engine error | Use engine="python" in pd.read_csv() for pandas 2.x |
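The last two fixes combine into a defensive reader like the sketch below. The column name "Strain" comes from the KeyError noted above; the exact All_Toxins.txt layout is an assumption:

```python
import io
import pandas as pd  # available in the pipeline environment

def load_all_toxins(source) -> pd.DataFrame:
    """Read an All_Toxins.txt-style table with engine="python" (the pandas 2.x
    fix noted above) and guard against the empty-DataFrame case that used to
    raise KeyError: 'Strain'."""
    df = pd.read_csv(source, sep="\t", engine="python")
    if df.empty or "Strain" not in df.columns:
        # Return an empty frame with the expected column instead of crashing.
        return pd.DataFrame(columns=["Strain"])
    return df
```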

Debugging Pipeline Issues

# Check if task was created
curl http://localhost:80/api/tasks/{task_id}

# View task logs
cat jobs/{task_id}/output/logs/digger_execution.log

# Check All_Toxins.txt format
head -1 jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt

# Test shotter independently
pixi run -e pipeline python scripts/bttoxin_shoter.py \
  --all_toxins jobs/{task_id}/output/digger/Results/Toxins/All_Toxins.txt \
  --output_dir /tmp/test_output

Docker-Specific Issues

# Check container health
docker ps
docker logs bttoxin-pipeline

# Check nginx config
docker exec bttoxin-pipeline nginx -T

# Verify backend is running
docker exec bttoxin-pipeline curl http://127.0.0.1:8000/api/health

Post-Mortem: Startup Failures & 404/403 Errors (2026-01 Update)

Symptoms:

  • Website returns 404 Not Found or 403 Forbidden.
  • Container stuck in Restarting loop.
  • Logs show exec: "uvicorn": executable file not found.

Root Causes & Solutions:

  1. Missing Environment Config:

    • Cause: pixi.toml and pixi.lock were missing in the final Docker image phase.
    • Fix: Ensure COPY --from=builder /app/pixi.toml ... is present in Dockerfile.
  2. Port Conflict:

    • Cause: docker-compose.yml mapped 80:80 while Traefik already occupied port 80.
    • Fix: Remove ports mapping in compose file; rely on Docker internal network (frontend) and Traefik labels.
  3. Frontend Permissions:

    • Cause: Built frontend files owned by root were not readable by Nginx user.
    • Fix: Add RUN chmod -R 755 /var/www/html in Dockerfile.
  4. Health Check Path:

    • Cause: Nginx routed /health to /api/health but backend expected /health.
    • Fix: Update Nginx config to proxy pass to correct endpoint.

Post-Mortem: Consistency Refactoring & Fixes (2026-01-20 Update)

Summary: Major refactoring to ensure consistency between script execution and web pipeline, fix severe container startup failures, and simplify user experience.

1. Unified Pipeline Execution

  • Problem: Web backend manually orchestrated pipeline steps, leading to discrepancies with the standalone script (e.g., missing plots, different file formats).
  • Fix: Refactored backend/app/workers/tasks.py to directly subprocess scripts/run_single_fna_pipeline.py.
  • Result: Web output is now guaranteed identical to manual script execution.

2. Result Format & Cleanup

  • Change: Switched output format from .tar.gz to .zip.
  • Feature: Added automatic cleanup of intermediate directories (digger/, shoter/) to save disk space; only the final ZIP and logs are retained.
  • Frontend: Updated download logic to handle .zip files.

3. Frontend Simplification

  • Change: Removed CRISPR Fusion UI elements (beta feature) to reduce complexity.
  • Change: Replaced complex multi-stage status indicators with a "Simulated Progress Bar" for better UX during black-box script execution.
  • Fix: Restored "One-click load" button and fixed TypeScript build errors caused by removed variables.

4. Critical Docker Fixes

  • Fix (Restart Loop): Removed incorrect image: postgres directive in docker-compose.yml that caused the web service to run database software instead of the app.
  • Fix (Env Path): Updated .dockerignore to exclude host .pixi directory, preventing "bad interpreter" errors caused by hardcoded host paths in the container.
  • Fix (404 Error): Removed erroneous rm -rf /app/frontend in Dockerfile that was accidentally deleting built frontend assets.
  • Optimization: Configured npmmirror registry to resolve build timeouts in CN network environments.