docs: add Chinese documentation; add frontend and backend code

- Add README_CN.md (Chinese documentation)
- Add frontend/ Vue 3 frontend project
- Add web/ FastAPI backend project

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-08 22:31:06 +08:00
parent 4267bda227
commit 4c9a7d0978
59 changed files with 9314 additions and 0 deletions

web/backend/AGENTS.md Normal file

@@ -0,0 +1,155 @@
# Web Backend Agent Guide
## Overview
FastAPI backend for the BtToxin Pipeline. It provides a REST API for task submission, status monitoring, and result retrieval.
## Tech Stack
- **Framework**: FastAPI
- **Validation**: Pydantic + pydantic-settings
- **Storage**: Redis (cache) + File system (persistence)
- **Testing**: pytest
- **ASGI Server**: Uvicorn
## Project Structure
```
web/backend/
├── __init__.py # Package marker
├── main.py # FastAPI application entry point
├── config.py # Configuration and system constraints
├── models.py # Data models (TaskStatus, TaskMeta, etc.)
└── storage.py # Redis + file hybrid storage
```
## Development Commands
### Via pixi (recommended)
```bash
# Start development server with hot reload
pixi run api-dev
# Run tests
pixi run api-test
```
### Direct commands
```bash
# Start server
uvicorn web.backend.main:app --reload --host 0.0.0.0 --port 8000
# Run tests
pytest web/backend/ -v
```
## API Endpoints
### Health Check
```
GET /api/health
Response: {"status": "healthy"}
```
### API Documentation
When `DEBUG=true`:
- Swagger UI: http://localhost:8000/api/docs
- ReDoc: http://localhost:8000/api/redoc
## Data Models
### TaskStatus (Enum)
```python
PENDING = "pending" # Waiting to be processed
QUEUED = "queued" # In queue, waiting for slot
RUNNING = "running" # Currently executing
COMPLETED = "completed" # Successfully finished
FAILED = "failed" # Execution failed
```
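The status flow implied by these values (PENDING → QUEUED → RUNNING → COMPLETED/FAILED) can be sketched as a transition table. This is an illustration only: the transition set is an assumption, and the backend itself does not enforce it.

```python
from enum import Enum

class TaskStatus(str, Enum):
    PENDING = "pending"
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"

# Legal transitions implied by the documented status flow
# (an assumption for illustration, not enforced by the backend).
TRANSITIONS = {
    TaskStatus.PENDING: {TaskStatus.QUEUED, TaskStatus.FAILED},
    TaskStatus.QUEUED: {TaskStatus.RUNNING, TaskStatus.FAILED},
    TaskStatus.RUNNING: {TaskStatus.COMPLETED, TaskStatus.FAILED},
    TaskStatus.COMPLETED: set(),  # terminal
    TaskStatus.FAILED: set(),     # terminal
}

def can_transition(cur: TaskStatus, new: TaskStatus) -> bool:
    """Return True if moving from `cur` to `new` follows the documented flow."""
    return new in TRANSITIONS[cur]
```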
### PipelineStage (Enum)
```python
DIGGER = "digger" # Running BtToxin_Digger
SHOTER = "shoter" # Running Shoter scoring
PLOTS = "plots" # Generating heatmap plots
BUNDLE = "bundle" # Bundling results
```
### TaskMeta (Dataclass)
Task metadata stored in Redis and `task_meta.json`:
| Field | Type | Description |
|-------|------|-------------|
| task_id | str | Unique task identifier |
| status | TaskStatus | Current execution status |
| current_stage | PipelineStage | Current pipeline stage |
| progress | int | Progress percentage (0-100) |
| submission_time | str | ISO8601 timestamp |
| start_time | str | ISO8601 timestamp |
| completion_time | str | ISO8601 timestamp |
| filename | str | Original uploaded filename |
| file_size | int | File size in bytes |
| lang | str | Report language (en/zh) |
| error_message | str | Error description if failed |
| token_hash | str | SHA256 hash for auth |
## Storage Strategy
Hybrid Redis + file storage:
- **Primary**: `/data/jobs/<task_id>/task_meta.json` (persistent)
- **Cache**: Redis `task:{task_id}:meta` (fast access)
- **Write order**: File first, then Redis
- **Read order**: Redis first, file on cache miss
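The read/write ordering above can be sketched with a plain dict standing in for Redis (a simplification: the real backend uses the `redis` client with TTLs and serializes full `TaskMeta` objects).

```python
import json
from pathlib import Path
from typing import Optional

fake_redis: dict = {}  # stands in for the Redis cache in this sketch

def write_meta(jobs_dir: Path, task_id: str, meta: dict) -> None:
    """Write order: file first (durable), then the cache."""
    task_dir = jobs_dir / task_id
    task_dir.mkdir(parents=True, exist_ok=True)
    (task_dir / "task_meta.json").write_text(json.dumps(meta))
    fake_redis[f"task:{task_id}:meta"] = json.dumps(meta)

def read_meta(jobs_dir: Path, task_id: str) -> Optional[dict]:
    """Read order: cache first, fall back to the file and repopulate."""
    cached = fake_redis.get(f"task:{task_id}:meta")
    if cached is not None:
        return json.loads(cached)
    path = jobs_dir / task_id / "task_meta.json"
    if not path.exists():
        return None
    meta = json.loads(path.read_text())
    fake_redis[f"task:{task_id}:meta"] = json.dumps(meta)  # repopulate cache
    return meta
```

Writing the file first means a crash between the two steps leaves the durable copy correct and the cache merely stale, which the read path repairs on the next miss.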
## Configuration
Environment variables (see `config.py`):
| Variable | Default | Description |
|----------|---------|-------------|
| REDIS_URL | redis://localhost:6379/0 | Redis connection |
| JOBS_DIR | /data/jobs | Task storage directory |
| TASK_TIMEOUT | 21600 | Task timeout (6 hours) |
| MAX_ACTIVE_PIPELINES | 4 | Max concurrent tasks |
| TASK_THREADS | 4 | Threads per task |
| CORS_ORIGINS | * | Allowed CORS origins |
| DEBUG | false | Enable debug mode |
## System Constraints
Defined in `SystemConstraints` class:
- Max upload size: 50 MB
- Min free disk: 20 GB
- Task timeout: 6 hours
- Result retention: 30 days
- Allowed extensions: .fna, .fa, .fasta
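A minimal sketch of how an upload handler might enforce the extension and size limits above (the `validate_upload` helper is hypothetical, not part of the backend):

```python
from pathlib import Path
from typing import Optional

# Constraints mirrored from SystemConstraints in config.py
MAX_UPLOAD_SIZE = 50 * 1024 * 1024  # 50 MB
ALLOWED_EXTENSIONS = (".fna", ".fa", ".fasta")

def validate_upload(filename: str, size: int) -> Optional[str]:
    """Return an error message, or None if the upload passes both checks."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        return f"unsupported extension: {suffix!r}"
    if size > MAX_UPLOAD_SIZE:
        return f"file too large: {size} bytes (max {MAX_UPLOAD_SIZE})"
    return None
```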
## Testing
```bash
# Run all backend tests
pixi run api-test
# Run with verbose output
pytest web/backend/ -v
# Run specific test file
pytest web/backend/test_storage.py -v
```
## CORS Configuration
CORS is configured to allow frontend access. Set the `CORS_ORIGINS` environment variable for production:
```bash
CORS_ORIGINS=https://example.com,https://app.example.com
```
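The backend splits this variable on commas (see `config.py`); a sketch of that parsing, with whitespace stripping and empty-entry filtering added as defensive assumptions:

```python
from typing import List

def parse_origins(raw: str) -> List[str]:
    # Split on commas; stripping whitespace is a defensive choice so
    # "a.com, b.com" and "a.com,b.com" parse the same way.
    return [o.strip() for o in raw.split(",") if o.strip()]
```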

web/backend/__init__.py Normal file

@@ -0,0 +1,3 @@
"""BtToxin Pipeline Web Backend Package."""
__version__ = "0.1.0"

web/backend/config.py Normal file

@@ -0,0 +1,78 @@
"""Configuration module for BtToxin Pipeline Web Backend.
Defines System Constraints and environment variable configuration.
Requirements: 7.5
"""
import os
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SystemConstraints:
    """System constraints and limits for the BtToxin Pipeline Web application."""

    # Upload and storage limits
    MAX_UPLOAD_SIZE: int = 50 * 1024 * 1024  # 50 MB
    MIN_FREE_DISK_GB: int = 20  # Minimum free disk space in GB
    MAX_TOTAL_STORAGE_GB: int = 500  # Maximum total storage in GB

    # Task execution limits
    TASK_TIMEOUT: int = 6 * 60 * 60  # 6 hours in seconds (21600)
    TASK_THREADS: int = 4  # Default threads per task

    # Concurrency and queue limits
    WORKER_CONCURRENCY: int = 32  # Celery worker process concurrency
    MAX_ACTIVE_PIPELINES: int = 4  # Max concurrent Digger containers
    QUEUE_MAX_LENGTH: int = 1000  # Maximum queue length

    # Data retention
    RESULT_RETENTION_DAYS: int = 30  # Results kept for 30 days

    # Allowed file extensions
    ALLOWED_EXTENSIONS: tuple = (".fna", ".fa", ".fasta")

    # Valid nucleotide characters (uppercase and lowercase)
    VALID_NUCLEOTIDES: frozenset = frozenset("ATGCNatgcn")


class Settings:
    """Application settings loaded from environment variables."""

    def __init__(self):
        # Redis configuration
        self.redis_url: str = os.getenv("REDIS_URL", "redis://localhost:6379/0")
        # Jobs directory
        self.jobs_dir: str = os.getenv("JOBS_DIR", "/data/jobs")
        # Task execution settings (can override SystemConstraints defaults)
        self.task_timeout: int = int(
            os.getenv("TASK_TIMEOUT", str(SystemConstraints.TASK_TIMEOUT))
        )
        self.max_active_pipelines: int = int(
            os.getenv("MAX_ACTIVE_PIPELINES", str(SystemConstraints.MAX_ACTIVE_PIPELINES))
        )
        self.task_threads: int = int(
            os.getenv("TASK_THREADS", str(SystemConstraints.TASK_THREADS))
        )
        self.worker_concurrency: int = int(
            os.getenv("WORKER_CONCURRENCY", str(SystemConstraints.WORKER_CONCURRENCY))
        )
        # CORS settings (comma-separated; whitespace around entries is stripped)
        self.cors_origins: List[str] = [
            o.strip() for o in os.getenv("CORS_ORIGINS", "*").split(",")
        ]
        # API settings
        self.api_prefix: str = "/api/v1"
        self.debug: bool = os.getenv("DEBUG", "false").lower() == "true"


# Global settings instance
settings = Settings()

# Global constraints instance
constraints = SystemConstraints()

web/backend/main.py Normal file

@@ -0,0 +1,33 @@
"""FastAPI application entry point for BtToxin Pipeline Web Backend.
Requirements: 7.5
"""
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from .config import settings

# Create FastAPI application
app = FastAPI(
    title="BtToxin Pipeline Web API",
    description="Web API for BtToxin Pipeline - analyze genome files for insecticidal proteins",
    version="0.1.0",
    docs_url="/api/docs" if settings.debug else None,
    redoc_url="/api/redoc" if settings.debug else None,
)

# Configure CORS
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.cors_origins,
    allow_credentials=True,
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)


@app.get("/api/health")
async def health_check():
    """Health check endpoint."""
    return {"status": "healthy"}

web/backend/models.py Normal file

@@ -0,0 +1,99 @@
"""Data models for BtToxin Pipeline Web Backend.
Defines TaskStatus, PipelineStage enums and TaskMeta dataclass.
Requirements: 2.2
"""
from dataclasses import dataclass, asdict
from datetime import datetime
from enum import Enum
from typing import Optional
import json


class TaskStatus(str, Enum):
    """Task execution status.

    Status flow: PENDING -> QUEUED -> RUNNING -> COMPLETED/FAILED
    """

    PENDING = "pending"  # Waiting to be processed
    QUEUED = "queued"  # In queue, waiting for pipeline slot
    RUNNING = "running"  # Currently executing
    COMPLETED = "completed"  # Successfully finished
    FAILED = "failed"  # Execution failed


class PipelineStage(str, Enum):
    """Pipeline execution stages.

    Stage progression: DIGGER -> SHOTER -> PLOTS -> BUNDLE
    """

    DIGGER = "digger"  # Running BtToxin_Digger
    SHOTER = "shoter"  # Running Shoter scoring
    PLOTS = "plots"  # Generating heatmap plots
    BUNDLE = "bundle"  # Bundling results


@dataclass
class TaskMeta:
    """Task metadata stored in Redis and task_meta.json.

    Contains all information about a task's state and execution.
    """

    task_id: str
    status: TaskStatus = TaskStatus.PENDING
    current_stage: Optional[PipelineStage] = None
    progress: int = 0  # 0-100
    submission_time: str = ""  # ISO8601 format
    start_time: Optional[str] = None  # ISO8601 format
    completion_time: Optional[str] = None  # ISO8601 format
    filename: str = ""
    file_size: int = 0
    lang: str = "en"  # en or zh
    error_message: Optional[str] = None
    error_stage: Optional[str] = None
    token_hash: str = ""  # SHA256 hash of access_token

    def to_dict(self) -> dict:
        """Convert to dictionary for JSON serialization."""
        data = asdict(self)
        # Convert enums to their string values
        if self.status:
            data["status"] = self.status.value
        if self.current_stage:
            data["current_stage"] = self.current_stage.value
        return data

    def to_json(self) -> str:
        """Serialize to JSON string."""
        return json.dumps(self.to_dict())

    @classmethod
    def from_dict(cls, data: dict) -> "TaskMeta":
        """Create TaskMeta from dictionary."""
        # Convert string status to enum
        if "status" in data and isinstance(data["status"], str):
            data["status"] = TaskStatus(data["status"])
        # Convert string stage to enum
        if "current_stage" in data and data["current_stage"] is not None:
            if isinstance(data["current_stage"], str):
                data["current_stage"] = PipelineStage(data["current_stage"])
        return cls(**data)

    @classmethod
    def from_json(cls, json_str: str) -> "TaskMeta":
        """Create TaskMeta from JSON string."""
        return cls.from_dict(json.loads(json_str))

    def get_elapsed_seconds(self) -> Optional[float]:
        """Calculate elapsed time in seconds.

        Assumes all stored timestamps are naive UTC ISO8601 strings, so
        the fallback "now" is also naive UTC to keep the subtraction valid.
        """
        if not self.submission_time:
            return None
        start = datetime.fromisoformat(self.submission_time)
        if self.completion_time:
            end = datetime.fromisoformat(self.completion_time)
        else:
            end = datetime.utcnow()
        return (end - start).total_seconds()

web/backend/storage.py Normal file

@@ -0,0 +1,239 @@
"""Storage module for BtToxin Pipeline Web Backend.
Implements Redis + file hybrid storage strategy for task metadata.
- Primary storage: /data/jobs/<task_id>/task_meta.json (persistent)
- Cache: Redis task:{task_id}:meta (atomic operations, concurrent access)
- Write order: file first, then Redis
- Read order: Redis first, file on cache miss
Requirements: 2.1, 2.3
"""
import json
import shutil
from pathlib import Path
from typing import Optional

import redis

from .config import settings
from .models import TaskMeta, TaskStatus, PipelineStage


class TaskStorage:
    """Hybrid storage for task metadata using Redis cache + file persistence."""

    TASK_META_FILENAME = "task_meta.json"
    REDIS_META_PREFIX = "task:"
    REDIS_META_SUFFIX = ":meta"
    REDIS_TOKEN_SUFFIX = ":token_hash"
    REDIS_CACHE_TTL = 86400  # 24 hours cache TTL

    def __init__(self, redis_url: str = None, jobs_dir: str = None):
        """Initialize storage with Redis connection and jobs directory.

        Args:
            redis_url: Redis connection URL (defaults to settings.redis_url)
            jobs_dir: Base directory for job files (defaults to settings.jobs_dir)
        """
        self.redis_url = redis_url or settings.redis_url
        self.jobs_dir = Path(jobs_dir or settings.jobs_dir)
        self._redis_client: Optional[redis.Redis] = None

    @property
    def redis(self) -> redis.Redis:
        """Lazy Redis client initialization."""
        if self._redis_client is None:
            self._redis_client = redis.from_url(
                self.redis_url,
                decode_responses=True,
            )
        return self._redis_client

    def _get_task_dir(self, task_id: str) -> Path:
        """Get the directory path for a task."""
        return self.jobs_dir / task_id

    def _get_meta_file_path(self, task_id: str) -> Path:
        """Get the task_meta.json file path for a task."""
        return self._get_task_dir(task_id) / self.TASK_META_FILENAME

    def _get_redis_meta_key(self, task_id: str) -> str:
        """Get Redis key for task metadata."""
        return f"{self.REDIS_META_PREFIX}{task_id}{self.REDIS_META_SUFFIX}"

    def _get_redis_token_key(self, task_id: str) -> str:
        """Get Redis key for token hash."""
        return f"{self.REDIS_META_PREFIX}{task_id}{self.REDIS_TOKEN_SUFFIX}"

    # ==================== File Operations ====================

    def _write_meta_file(self, task_meta: TaskMeta) -> None:
        """Write task metadata to file (persistent storage)."""
        file_path = self._get_meta_file_path(task_meta.task_id)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        with open(file_path, "w", encoding="utf-8") as f:
            json.dump(task_meta.to_dict(), f, indent=2, ensure_ascii=False)

    def _read_meta_file(self, task_id: str) -> Optional[TaskMeta]:
        """Read task metadata from file."""
        file_path = self._get_meta_file_path(task_id)
        if not file_path.exists():
            return None
        try:
            with open(file_path, "r", encoding="utf-8") as f:
                data = json.load(f)
            return TaskMeta.from_dict(data)
        except (json.JSONDecodeError, KeyError, TypeError):
            return None

    # ==================== Redis Operations ====================

    def _write_redis_cache(self, task_meta: TaskMeta) -> None:
        """Write task metadata to Redis cache."""
        key = self._get_redis_meta_key(task_meta.task_id)
        self.redis.setex(key, self.REDIS_CACHE_TTL, task_meta.to_json())
        # Also cache token hash separately for faster auth checks
        if task_meta.token_hash:
            token_key = self._get_redis_token_key(task_meta.task_id)
            self.redis.setex(token_key, self.REDIS_CACHE_TTL, task_meta.token_hash)

    def _read_redis_cache(self, task_id: str) -> Optional[TaskMeta]:
        """Read task metadata from Redis cache."""
        meta_key = self._get_redis_meta_key(task_id)
        data = self.redis.get(meta_key)
        if data is None:
            return None
        try:
            return TaskMeta.from_json(data)
        except (json.JSONDecodeError, KeyError, TypeError):
            return None

    def _delete_redis_cache(self, task_id: str) -> None:
        """Delete task metadata from Redis cache."""
        meta_key = self._get_redis_meta_key(task_id)
        token_key = self._get_redis_token_key(task_id)
        self.redis.delete(meta_key, token_key)

    # ==================== Public API ====================

    def create_task(self, task_meta: TaskMeta) -> None:
        """Create a new task with metadata.

        Write order: file first (persistent), then Redis (cache).
        """
        self._write_meta_file(task_meta)
        self._write_redis_cache(task_meta)

    def get_task(self, task_id: str) -> Optional[TaskMeta]:
        """Get task metadata.

        Read order: Redis first (fast), file on cache miss.
        """
        # Try Redis cache first
        task_meta = self._read_redis_cache(task_id)
        if task_meta is not None:
            return task_meta
        # Cache miss - read from file
        task_meta = self._read_meta_file(task_id)
        if task_meta is not None:
            # Repopulate cache
            self._write_redis_cache(task_meta)
        return task_meta

    def update_task(self, task_meta: TaskMeta) -> None:
        """Update task metadata.

        Write order: file first (persistent), then Redis (cache).
        """
        self._write_meta_file(task_meta)
        self._write_redis_cache(task_meta)

    def delete_task(self, task_id: str) -> bool:
        """Delete task and all associated files.

        Returns True if task existed and was deleted.
        """
        task_dir = self._get_task_dir(task_id)
        existed = task_dir.exists()
        # Delete from Redis cache
        self._delete_redis_cache(task_id)
        # Delete task directory
        if existed:
            shutil.rmtree(task_dir, ignore_errors=True)
        return existed

    def update_status(
        self,
        task_id: str,
        status: TaskStatus,
        current_stage: Optional[PipelineStage] = None,
        progress: Optional[int] = None,
        error_message: Optional[str] = None,
        error_stage: Optional[str] = None,
    ) -> Optional[TaskMeta]:
        """Update task status fields.

        Reads the current metadata, applies the changes, then persists to
        file and Redis. Note that this is a read-modify-write sequence, not
        an atomic Redis operation, so concurrent updates to the same task
        can race; callers should serialize updates per task.
        Returns updated TaskMeta or None if task not found.
        """
        task_meta = self.get_task(task_id)
        if task_meta is None:
            return None
        # Update fields
        task_meta.status = status
        if current_stage is not None:
            task_meta.current_stage = current_stage
        if progress is not None:
            task_meta.progress = progress
        if error_message is not None:
            task_meta.error_message = error_message
        if error_stage is not None:
            task_meta.error_stage = error_stage
        # Persist changes
        self.update_task(task_meta)
        return task_meta

    def get_token_hash(self, task_id: str) -> Optional[str]:
        """Get token hash for a task (fast path via Redis).

        Used for authentication checks.
        """
        # Try Redis first
        token_key = self._get_redis_token_key(task_id)
        token_hash = self.redis.get(token_key)
        if token_hash is not None:
            return token_hash
        # Cache miss - get from task meta
        task_meta = self.get_task(task_id)
        if task_meta is not None:
            return task_meta.token_hash
        return None

    def task_exists(self, task_id: str) -> bool:
        """Check if a task exists."""
        # Check Redis first
        meta_key = self._get_redis_meta_key(task_id)
        if self.redis.exists(meta_key):
            return True
        # Check file
        return self._get_meta_file_path(task_id).exists()

    def get_task_dir(self, task_id: str) -> Path:
        """Get the directory path for a task (public accessor)."""
        return self._get_task_dir(task_id)


# Global storage instance
storage = TaskStorage()
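The `get_token_hash` fast path returns the stored SHA256 hash for authentication checks. A sketch of how a caller might produce and verify such a hash (`hash_token` and `verify_token` are hypothetical helpers, and the constant-time comparison is an added assumption, not shown in the backend):

```python
import hashlib
import hmac

def hash_token(token: str) -> str:
    """SHA256-hex a raw access token, matching the token_hash field format."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()

def verify_token(raw_token: str, stored_hash: str) -> bool:
    # compare_digest avoids leaking match length through timing.
    return hmac.compare_digest(hash_token(raw_token), stored_hash)
```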