docs: add Ralph project structure

- PROMPT.md: Ralph development instructions with BtToxin Pipeline specifics
- specs/requirements.md: Technical specifications (API, file formats, concurrency)
- @AGENT.md: Build, test, and deployment commands

Co-Authored-By: Claude <noreply@anthropic.com>
zly
2026-01-13 17:26:23 +08:00
parent 547328ad44
commit 75c7db8684
3 changed files with 588 additions and 0 deletions

specs/requirements.md Normal file

@@ -0,0 +1,287 @@
# Technical Specifications
## BtToxin Pipeline - Technical Requirements
### 1. System Architecture
#### 1.1 Overview
BtToxin Pipeline is a web-based genomic analysis platform consisting of:
- **Frontend**: Vue 3 SPA with Element Plus components
- **Backend**: FastAPI REST API with async task processing
- **Task Queue**: Redis-backed queue with semaphore-based concurrency control
- **Analysis Engine**: BtToxin_Digger and BtToxin_Shoter modules
#### 1.2 Component Architecture
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Vue 3 SPA    │────▶│   FastAPI API   │────▶│   Task Queue    │
│   (Frontend)    │     │    (Backend)    │     │     (Redis)     │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                │                        │
                                │                        ▼
                        ┌───────┴────────┐      ┌─────────────────┐
                        │   Pixi/Conda   │      │  Task Workers   │
                        │  Environments  │      │ (16 concurrent) │
                        └────────────────┘      └─────────────────┘
                                                ┌─────────────────┐
                                                │  BtToxin Tools  │
                                                │ (Digger/Shoter) │
                                                └─────────────────┘
```
### 2. Frontend Specifications
#### 2.1 Technology Stack
| Component | Version/Requirement |
|-----------|---------------------|
| Vue 3 | Composition API + script setup |
| Vite | Latest stable |
| Element Plus | Latest compatible |
| Pinia | Latest stable |
| Vue Router | v4 |
| vue-i18n | v9+ |
| HTTP Client | fetch API (no axios) |
#### 2.2 Page Structure
| Page | Route | Description |
|------|-------|-------------|
| Home | `/` | System introduction, quick start |
| About | `/about` | Features, usage, limitations |
| Submit | `/submit` | File upload, parameters, submit |
| Status | `/status` | Task progress, results |
| Tools | `/tools` | BtToxin_Shoter methodology |
#### 2.3 File Upload Component Requirements
- Drag and drop zone
- File type auto-detection
- Size limit: 100MB
- Pre-upload format validation
- Progress indicator during upload
#### 2.4 Internationalization (i18n)
- Languages: Chinese (zh), English (en)
- Language switcher in header/nav
- Persist selection via localStorage
- Refresh page on language change
### 3. Backend Specifications
#### 3.1 Technology Stack
| Component | Version/Requirement |
|-----------|---------------------|
| FastAPI | Latest stable |
| Uvicorn | Latest stable |
| Python | 3.9+ |
| Redis | Latest stable |
| pixi | Latest (conda alternative) |
#### 3.2 API Specifications
##### 3.2.1 Create Task
```
POST /api/tasks
Content-Type: multipart/form-data
Request Parameters:
| Name | Type | Required | Default | Description |
|-------------------------|---------|----------|---------|-------------|
| file | File | Yes | - | Uploaded file |
| file_type | string | Yes | - | genome/protein |
| min_identity | float | No | 0.8 | Min similarity (0-1) |
| min_coverage | float | No | 0.6 | Min coverage (0-1) |
| allow_unknown_families | boolean | No | false | Allow unknown families |
| require_index_hit | boolean | No | true | Require index hit |
| lang | string | No | zh | Report language (zh/en) |
Response:
{
"task_id": "uuid-string",
"status": "pending",
"created_at": "ISO-timestamp",
"expires_at": "ISO-timestamp"
}
```
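The response body above can be sketched in a few lines of Python. This is a minimal illustration, not the backend's actual code: the function name `new_task_record` is hypothetical, and deriving `expires_at` from the 30-day result retention period is an assumption based on the performance requirements in this spec.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Matches the "Result retention" requirement (30 days) stated in this spec.
RESULT_RETENTION_DAYS = 30


def new_task_record(now=None):
    """Build a JSON-serializable body shaped like the POST /api/tasks response."""
    now = now or datetime.now(timezone.utc)
    return {
        "task_id": str(uuid.uuid4()),
        "status": "pending",
        "created_at": now.isoformat(),
        # Assumption: expiry is creation time plus the retention period.
        "expires_at": (now + timedelta(days=RESULT_RETENTION_DAYS)).isoformat(),
    }
```

Passing `now` explicitly keeps the function deterministic in tests; production code would normally rely on the default.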
##### 3.2.2 Get Task Status
```
GET /api/tasks/{task_id}
Response:
{
"task_id": "uuid-string",
"status": "pending|queued|running|completed|failed",
"progress": 0-100,
"current_stage": "digger|shoter|plots|bundle",
"submission_time": "ISO-timestamp",
"start_time": "ISO-timestamp|null",
"completion_time": "ISO-timestamp|null",
"filename": "original-filename",
"error": "error-message|null",
"estimated_remaining_seconds": number|null,
"queue_position": number|null
}
```
##### 3.2.3 Download Results
```
GET /api/tasks/{task_id}/download
Response: .tar.gz file (Content-Disposition: attachment)
```
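One way to produce the `.tar.gz` bundle served by this endpoint is Python's standard `tarfile` module. The sketch below assumes each task keeps its outputs in a single directory; the function name `bundle_results` and the directory layout are illustrative, not taken from the project.

```python
import tarfile
from pathlib import Path


def bundle_results(task_dir: Path, bundle_path: Path) -> Path:
    """Pack everything under task_dir into a gzip-compressed tar archive."""
    with tarfile.open(bundle_path, "w:gz") as tar:
        # arcname makes paths inside the archive start at the task directory
        # name instead of the absolute path on the server.
        tar.add(task_dir, arcname=task_dir.name)
    return bundle_path
```

The archive should be written outside `task_dir`; otherwise the bundle would try to include itself.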
##### 3.2.4 Delete Task
```
DELETE /api/tasks/{task_id}
Response: 204 No Content
```
#### 3.3 Task Queue Specifications
##### Concurrency Control
- Maximum concurrent tasks: 16
- Implementation: asyncio.Semaphore(16)
- Queue overflow: Tasks wait in Redis queue
- Queue position: Track and display position for queued tasks
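The semaphore-based limit described above can be sketched with the standard library alone. The helper names `run_with_limit` and `fake_analysis` are hypothetical, and the short sleep stands in for a real pipeline run; the `peak` counter exists only to make the limit observable.

```python
import asyncio


async def run_with_limit(jobs, limit=16):
    """Run coroutine factories with at most `limit` executing at once.

    Returns (results, peak), where peak is the highest observed concurrency.
    """
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def guarded(job):
        nonlocal running, peak
        async with semaphore:  # waits here while `limit` jobs are active
            running += 1
            peak = max(peak, running)
            try:
                return await job()
            finally:
                running -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return list(results), peak


async def fake_analysis(task_number):
    # Stand-in for a real pipeline run (hypothetical).
    await asyncio.sleep(0.01)
    return task_number
```

Submitting 20 jobs with `limit=16`, as in the load test named in the success criteria, lets 16 start immediately while the remaining 4 wait for a free slot.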
##### Task Lifecycle
```
pending → queued → running → completed
                           → failed
```
##### Task Status Values
| Status | Description |
|--------|-------------|
| pending | Created, waiting to enter queue |
| queued | Waiting for available slot (has queue_position) |
| running | Currently processing (has progress, current_stage) |
| completed | Successfully finished (has download URL) |
| failed | Error occurred (has error message) |
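The lifecycle and status values can be encoded as a small state machine so that invalid transitions are rejected early. This is a sketch under one assumption that the spec does not state explicitly: `failed` is reachable only from `running`. The names `TaskStatus`, `ALLOWED_TRANSITIONS`, and `advance` are illustrative.

```python
from enum import Enum


class TaskStatus(str, Enum):
    PENDING = "pending"
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumption: "failed" branches off "running" only, as the lifecycle diagram suggests.
ALLOWED_TRANSITIONS = {
    TaskStatus.PENDING: {TaskStatus.QUEUED},
    TaskStatus.QUEUED: {TaskStatus.RUNNING},
    TaskStatus.RUNNING: {TaskStatus.COMPLETED, TaskStatus.FAILED},
    TaskStatus.COMPLETED: set(),  # terminal
    TaskStatus.FAILED: set(),     # terminal
}


def advance(current: TaskStatus, new: TaskStatus) -> TaskStatus:
    """Return the new status, or raise ValueError if the transition is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new
```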
##### Pipeline Stages
| Stage | Description |
|-------|-------------|
| digger | BtToxin_Digger gene identification |
| shoter | BtToxin_Shoter toxicity assessment |
| plots | Heatmap generation |
| bundle | Result packaging (.tar.gz) |
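The status endpoint reports both `current_stage` and a 0-100 `progress` value. The spec does not say how the two relate, so the sketch below simply assumes that the four stages carry equal weight; the function name `overall_progress` and the `stage_fraction` parameter are hypothetical.

```python
PIPELINE_STAGES = ("digger", "shoter", "plots", "bundle")


def overall_progress(current_stage: str, stage_fraction: float = 0.0) -> int:
    """Return 0-100 progress, assuming every stage has equal weight.

    stage_fraction is how far the current stage has advanced (0.0 to 1.0).
    """
    if not 0.0 <= stage_fraction <= 1.0:
        raise ValueError("stage_fraction must be within [0, 1]")
    # tuple.index raises ValueError for an unknown stage name.
    index = PIPELINE_STAGES.index(current_stage)
    return round((index + stage_fraction) / len(PIPELINE_STAGES) * 100)
```

A real deployment would likely weight `digger` more heavily if it dominates the runtime; that weighting would need to be measured rather than guessed.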
#### 3.4 Redis Data Structures
| Key Pattern | Type | Description |
|-------------|------|-------------|
| `task:{task_id}:status` | Hash | Task status and metadata |
| `task:{task_id}:result` | String | Result bundle path |
| `queue:waiting` | List | Waiting task IDs |
| `queue:running` | Set | Currently running task IDs |
| `queue:position:{task_id}` | String | Individual queue position |
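Centralizing the key patterns in small helper functions avoids typos spread across the codebase. The sketch below only builds key names and derives positions from an ordered list of waiting task IDs; it does not talk to Redis. The helper names and the 1-based position convention are assumptions.

```python
WAITING_QUEUE = "queue:waiting"
RUNNING_SET = "queue:running"


def status_key(task_id: str) -> str:
    return f"task:{task_id}:status"


def result_key(task_id: str) -> str:
    return f"task:{task_id}:result"


def position_key(task_id: str) -> str:
    return f"queue:position:{task_id}"


def queue_positions(waiting_ids):
    """Map each position key to a 1-based position (head of the list is 1)."""
    return {position_key(tid): pos for pos, tid in enumerate(waiting_ids, start=1)}
```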
### 4. File Format Support
| Extension | File Type | MIME Type | sequence_type |
|-----------|-----------|-----------|---------------|
| .fna | Genome (nucleotide) | application/fasta | nucl |
| .fa | Genome (nucleotide) | application/fasta | nucl |
| .fasta | Auto-detect | application/fasta | auto |
| .faa | Protein | application/fasta | prot |
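The extension table translates directly into a lookup, with extra handling for the `.fasta` auto-detect case. The spec does not define how auto-detection works, so the heuristic here (treat the sample as nucleotide when at least 90% of residues are A, C, G, T, U, or N) is an assumption, as are the function name and the threshold.

```python
from pathlib import Path

SEQUENCE_TYPE_BY_EXTENSION = {
    ".fna": "nucl",
    ".fa": "nucl",
    ".fasta": "auto",
    ".faa": "prot",
}


def guess_sequence_type(filename: str, sample: str = "") -> str:
    """Return "nucl" or "prot" for a supported file, resolving "auto" from a text sample."""
    ext = Path(filename).suffix.lower()
    if ext not in SEQUENCE_TYPE_BY_EXTENSION:
        raise ValueError(f"unsupported extension: {ext or '(none)'}")
    seq_type = SEQUENCE_TYPE_BY_EXTENSION[ext]
    if seq_type != "auto":
        return seq_type
    # Drop FASTA header lines (starting with ">") and join the sequence lines.
    residues = "".join(
        line.strip() for line in sample.splitlines() if not line.startswith(">")
    ).upper()
    if not residues:
        raise ValueError("cannot auto-detect sequence type without sequence data")
    # Assumed heuristic: mostly nucleotide letters means a nucleotide file.
    nucleotide_fraction = sum(ch in "ACGTUN" for ch in residues) / len(residues)
    return "nucl" if nucleotide_fraction >= 0.9 else "prot"
```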
### 5. Database Specifications
#### 5.1 BPPRC Specificity Database
- File: `toxicity-data.csv`
- Contains: Historical toxin-insect activity records
- Used by: BtToxin_Shoter for activity prediction
#### 5.2 BtToxin Database
- Directory: `external_dbs/bt_toxin`
- Contains: Known Bt toxin sequences
- Used by: BtToxin_Digger for gene identification
### 6. Analysis Pipeline Specifications
#### 6.1 BtToxin_Digger
- Environment: digger (pixi)
- Dependencies: BtToxin_Digger, BLAST, Perl
- Input: Genome (.fna/.fa/.fasta) or protein (.faa) file
- Output: Identified toxin genes with coordinates
#### 6.2 BtToxin_Shoter
- Environment: pipeline (pixi)
- Dependencies: Python 3.9+, pandas, matplotlib, seaborn
- Input: Digger output, scoring parameters
- Output: Toxin-target activity predictions
#### 6.3 Scoring Parameters
| Parameter | Type | Range | Default | Description |
|-----------|------|-------|---------|-------------|
| min_identity | float | 0-1 | 0.8 | Minimum sequence identity |
| min_coverage | float | 0-1 | 0.6 | Minimum coverage |
| allow_unknown_families | boolean | - | false | Allow unknown toxin families |
| require_index_hit | boolean | - | true | Require database index hit |
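The parameter table maps naturally onto a validated dataclass. This is a sketch rather than the project's model: the class name `ScoringParams` is hypothetical, while the field names, defaults, and the 0-1 range come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringParams:
    min_identity: float = 0.8
    min_coverage: float = 0.6
    allow_unknown_families: bool = False
    require_index_hit: bool = True

    def __post_init__(self):
        # Both thresholds are fractions, so they must stay within [0, 1].
        for name in ("min_identity", "min_coverage"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be within [0, 1], got {value}")
```

Rejecting out-of-range values at construction time means a bad request fails before any task is queued.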
### 7. Reserved Features
#### 7.1 CRISPR-Cas Analysis Module
- Directory: `crispr_cas/`
- Environment: Additional pixi environment
- Integration: Weighted scoring with Shoter
- Modes: Additive or subtractive weight adjustment
#### 7.2 Direct Protein Analysis
- Digger mode: sequence_type=prot
- Shoter: Process protein sequence hits normally
### 8. Performance Requirements
| Metric | Requirement |
|--------|-------------|
| Task timeout | 6 hours |
| API response time | < 1 second (excluding task execution) |
| Max concurrent tasks | 16 |
| Max file size | 100MB |
| Result retention | 30 days |
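The 6-hour task timeout could be enforced with `asyncio.wait_for`, which cancels the wrapped coroutine when the deadline passes. The wrapper name `run_with_timeout` and the shape of the returned dictionary are assumptions made for illustration.

```python
import asyncio

TASK_TIMEOUT_SECONDS = 6 * 60 * 60  # 6 hours, per the table above


async def run_with_timeout(coro, timeout=TASK_TIMEOUT_SECONDS):
    """Await coro, reporting a failed status if it exceeds the timeout."""
    try:
        result = await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the coroutine at this point.
        return {"status": "failed", "result": None, "error": "task timed out"}
    return {"status": "completed", "result": result, "error": None}
```

Cancelling the coroutine does not by itself stop an external process such as a BLAST run; a worker that launches subprocesses would also need to terminate them.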
### 9. Security Requirements
- **Task isolation**: Each task has independent working directory
- **Input validation**: File format and size validation
- **Result protection**: 30-day automatic cleanup
- **File permissions**: Restricted access to task directories
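Task isolation, size validation, and restricted permissions can be sketched together. The assumptions here are that task IDs are canonical UUID strings, that 100MB means 100 × 1024 × 1024 bytes, and that owner-only permissions (`0o700`) count as "restricted"; none of these details is fixed by the spec, and the function names are hypothetical.

```python
import re
from pathlib import Path

MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # assumption: 100MB counted in binary units
TASK_ID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def create_task_dir(base_dir: Path, task_id: str) -> Path:
    """Create an owner-only working directory for one task."""
    # Rejecting anything that is not a UUID also blocks path traversal ("../").
    if not TASK_ID_PATTERN.fullmatch(task_id):
        raise ValueError("task_id must be a lowercase UUID string")
    task_dir = base_dir / task_id
    # exist_ok=False guarantees two tasks never share a directory.
    task_dir.mkdir(mode=0o700, exist_ok=False)
    return task_dir


def check_upload_size(size_bytes: int) -> None:
    """Raise ValueError for empty or oversized uploads."""
    if size_bytes <= 0:
        raise ValueError("uploaded file is empty")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("uploaded file exceeds the 100MB limit")
```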
### 10. Deployment Specifications
#### 10.1 Docker Configuration
- Docker Compose for orchestration
- Services: frontend, backend, redis, traefik
- Volume mounts for data persistence
#### 10.2 Traefik Configuration
- Domain: bttiaw.hzau.edu.cn
- HTTP/HTTPS support
- Automatic certificate management (Let's Encrypt)
- Router rules for each service
### 11. Environment Variables
| Variable | Description | Required |
|----------|-------------|----------|
| REDIS_HOST | Redis server hostname | Yes |
| REDIS_PORT | Redis server port | Yes |
| PIXI_ENV_PATH | Path to pixi environments | Yes |
| API_BASE_URL | Backend API base URL | Yes |
| MAX_CONCURRENT_TASKS | Maximum concurrent tasks | No (default: 16) |
| TASK_TIMEOUT_HOURS | Task timeout in hours | No (default: 6) |
| RESULT_RETENTION_DAYS | Result retention days | No (default: 30) |
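A small loader can enforce the required variables and apply the documented defaults in one place. The class and function names below are illustrative; the variable names and defaults are taken from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    redis_host: str
    redis_port: int
    pixi_env_path: str
    api_base_url: str
    max_concurrent_tasks: int = 16
    task_timeout_hours: int = 6
    result_retention_days: int = 30


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    required = ("REDIS_HOST", "REDIS_PORT", "PIXI_ENV_PATH", "API_BASE_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return Settings(
        redis_host=env["REDIS_HOST"],
        redis_port=int(env["REDIS_PORT"]),
        pixi_env_path=env["PIXI_ENV_PATH"],
        api_base_url=env["API_BASE_URL"],
        max_concurrent_tasks=int(env.get("MAX_CONCURRENT_TASKS", 16)),
        task_timeout_hours=int(env.get("TASK_TIMEOUT_HOURS", 6)),
        result_retention_days=int(env.get("RESULT_RETENTION_DAYS", 30)),
    )
```

Accepting a plain mapping makes the loader easy to test without touching the real process environment.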
### 12. Success Criteria Validation
| Criterion | Validation Method |
|-----------|-------------------|
| Genome/protein upload | Test with .fna and .faa files |
| 16 concurrent tasks | Load test with 20 simultaneous requests |
| Language switching | Verify zh/en toggle works on all pages |
| Heatmap visualization | Compare output with expected results |
| Docker deployment | Access via bttiaw.hzau.edu.cn |