# Technical Specifications

## BtToxin Pipeline - Technical Requirements

### 1. System Architecture

#### 1.1 Overview
BtToxin Pipeline is a web-based genomic analysis platform consisting of:
- **Frontend**: Vue 3 SPA with Element Plus components
- **Backend**: FastAPI REST API with async task processing
- **Task Queue**: Redis-backed queue with semaphore-based concurrency control
- **Analysis Engine**: BtToxin_Digger and BtToxin_Shoter modules

#### 1.2 Component Architecture
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Vue 3 SPA    │────▶│   FastAPI API   │────▶│   Task Queue    │
│   (Frontend)    │     │    (Backend)    │     │     (Redis)     │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                 │                       │
                                 │                       ▼
                         ┌───────┴────────┐     ┌─────────────────┐
                         │   Pixi/Conda   │     │  Task Workers   │
                         │  Environments  │     │ (16 concurrent) │
                         └────────────────┘     └─────────────────┘
                                                         │
                                                         ▼
                                                ┌─────────────────┐
                                                │  BtToxin Tools  │
                                                │ (Digger/Shoter) │
                                                └─────────────────┘
```

### 2. Frontend Specifications

#### 2.1 Technology Stack
| Component | Version/Requirement |
|-----------|---------------------|
| Vue 3 | Composition API + script setup |
| Vite | Latest stable |
| Element Plus | Latest compatible |
| Pinia | Latest stable |
| Vue Router | v4 |
| vue-i18n | v9+ |
| HTTP Client | fetch API (no axios) |

#### 2.2 Page Structure
| Page | Route | Description |
|------|-------|-------------|
| Home | `/` | System introduction, quick start |
| About | `/about` | Features, usage, limitations |
| Submit | `/submit` | File upload, parameters, submit |
| Status | `/status` | Task progress, results |
| Tools | `/tools` | BtToxin_Shoter methodology |

#### 2.3 File Upload Component Requirements
- Drag and drop zone
- File type auto-detection
- Size limit: 100 MB
- Pre-upload format validation
- Progress indicator during upload

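
The size and format checks above are worth repeating on the backend, since client-side validation can be bypassed. The following is a minimal Python sketch of such a server-side mirror, not the project's actual code: `validate_upload` is a hypothetical helper, the 100 MB limit is assumed to mean 100 × 1024 × 1024 bytes, and the extension list is taken from the File Format Support section.

```python
from pathlib import Path

# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024
ALLOWED_EXTENSIONS = {".fna", ".fa", ".fasta", ".faa"}


def validate_upload(filename: str, size_bytes: int, first_bytes: bytes) -> list[str]:
    """Return a list of problems; an empty list means the upload looks acceptable."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported extension")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds 100 MB")
    # A FASTA file starts with a '>' header line (leading whitespace tolerated here).
    if not first_bytes.lstrip().startswith(b">"):
        problems.append("not FASTA: missing '>' header")
    return problems
```

Returning every problem at once, rather than raising on the first, lets the upload form show all issues in a single round trip.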
#### 2.4 Internationalization (i18n)
- Languages: Chinese (zh), English (en)
- Language switcher in header/nav
- Persist selection via localStorage
- Refresh page on language change

### 3. Backend Specifications

#### 3.1 Technology Stack
| Component | Version/Requirement |
|-----------|---------------------|
| FastAPI | Latest stable |
| Uvicorn | Latest stable |
| Python | 3.9+ |
| Redis | Latest stable |
| pixi | Latest (conda alternative) |

#### 3.2 API Specifications

##### 3.2.1 Create Task
```
POST /api/tasks
Content-Type: multipart/form-data

Request Parameters:
| Name                    | Type    | Required | Default | Description            |
|-------------------------|---------|----------|---------|------------------------|
| file                    | File    | Yes      | -       | Uploaded file          |
| file_type               | string  | Yes      | -       | genome/protein         |
| min_identity            | float   | No       | 0.8     | Min similarity (0-1)   |
| min_coverage            | float   | No       | 0.6     | Min coverage (0-1)     |
| allow_unknown_families  | boolean | No       | false   | Allow unknown families |
| require_index_hit       | boolean | No       | true    | Require index hit      |
| lang                    | string  | No       | zh      | Report language (zh/en)|

Response:
{
  "task_id": "uuid-string",
  "status": "pending",
  "created_at": "ISO-timestamp",
  "expires_at": "ISO-timestamp"
}
```

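
The non-file request parameters and their defaults can be captured in one validated structure. This is a standard-library sketch under stated assumptions, not the backend's real model: `TaskParams` is a hypothetical name, and a FastAPI implementation would more likely declare these as form fields or a Pydantic model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskParams:
    """Hypothetical container for the Create Task form fields (file excluded)."""
    file_type: str
    min_identity: float = 0.8
    min_coverage: float = 0.6
    allow_unknown_families: bool = False
    require_index_hit: bool = True
    lang: str = "zh"

    def __post_init__(self) -> None:
        # Enforce the value sets and ranges listed in the parameter table.
        if self.file_type not in ("genome", "protein"):
            raise ValueError("file_type must be 'genome' or 'protein'")
        for name in ("min_identity", "min_coverage"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be within [0, 1]")
        if self.lang not in ("zh", "en"):
            raise ValueError("lang must be 'zh' or 'en'")
```

For example, `TaskParams(file_type="genome")` picks up every documented default, while `TaskParams(file_type="genome", min_identity=1.5)` raises `ValueError`.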
##### 3.2.2 Get Task Status
```
GET /api/tasks/{task_id}

Response:
{
  "task_id": "uuid-string",
  "status": "pending|queued|running|completed|failed",
  "progress": 0-100,
  "current_stage": "digger|shoter|plots|bundle",
  "submission_time": "ISO-timestamp",
  "start_time": "ISO-timestamp|null",
  "completion_time": "ISO-timestamp|null",
  "filename": "original-filename",
  "error": "error-message|null",
  "estimated_remaining_seconds": number|null,
  "queue_position": number|null
}
```

##### 3.2.3 Download Results
```
GET /api/tasks/{task_id}/download

Response: .tar.gz file (Content-Disposition: attachment)
```

##### 3.2.4 Delete Task
```
DELETE /api/tasks/{task_id}

Response: 204 No Content
```

#### 3.3 Task Queue Specifications

##### Concurrency Control
- Maximum concurrent tasks: 16
- Implementation: `asyncio.Semaphore(16)`
- Queue overflow: Tasks wait in Redis queue
- Queue position: Track and display position for queued tasks

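
The semaphore-based limit described above can be illustrated with a small, self-contained sketch. It is an assumption-laden demo rather than the worker code: `run_all` is a hypothetical function, and `asyncio.sleep` stands in for a real pipeline run. It reports the peak number of jobs that ran at the same time.

```python
import asyncio

MAX_CONCURRENT_TASKS = 16


async def run_all(n_tasks: int, limit: int = MAX_CONCURRENT_TASKS) -> int:
    """Run n_tasks placeholder jobs and return the peak number running at once."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def worker() -> None:
        nonlocal running, peak
        async with semaphore:  # blocks while `limit` jobs are already running
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # placeholder for the real pipeline run
            running -= 1

    await asyncio.gather(*(worker() for _ in range(n_tasks)))
    return peak
```

Submitting 20 jobs, as in the load test listed under the success criteria, should never show more than 16 running at once; the remaining jobs wait until a slot is released.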
##### Task Lifecycle
```
pending → queued → running → completed
                           → failed
```

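
The lifecycle can be enforced with a small transition table. The sketch below is one reading of the diagram, in which `failed` is reachable only from `running`; whether a task may also fail while `pending` or `queued` is not stated in this specification. `ALLOWED_TRANSITIONS` and `advance` are hypothetical names.

```python
# Assumption: "failed" branches off "running" only, as the diagram suggests.
ALLOWED_TRANSITIONS = {
    "pending": {"queued"},
    "queued": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),  # terminal state
    "failed": set(),     # terminal state
}


def advance(current: str, new: str) -> str:
    """Return the new status, or raise ValueError if the transition is not allowed."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown status: {current!r}")
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```

Routing every status update through one function like this keeps invalid jumps, such as `completed` back to `running`, from reaching the stored task state.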
##### Task Status Values
| Status | Description |
|--------|-------------|
| pending | Created, waiting to enter queue |
| queued | Waiting for available slot (has queue_position) |
| running | Currently processing (has progress, current_stage) |
| completed | Successfully finished (has download URL) |
| failed | Error occurred (has error message) |

##### Pipeline Stages
| Stage | Description |
|-------|-------------|
| digger | BtToxin_Digger gene identification |
| shoter | BtToxin_Shoter toxicity assessment |
| plots | Heatmap generation |
| bundle | Result packaging (.tar.gz) |

#### 3.4 Redis Data Structures

| Key Pattern | Type | Description |
|-------------|------|-------------|
| `task:{task_id}:status` | Hash | Task status and metadata |
| `task:{task_id}:result` | String | Result bundle path |
| `queue:waiting` | List | Waiting task IDs |
| `queue:running` | Set | Currently running task IDs |
| `queue:position:{task_id}` | String | Individual queue position |

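
Centralizing the key patterns in small helper functions avoids typos scattered across the codebase. The sketch below uses plain Python with no Redis client, so the actual Redis calls are left out. The helper names are hypothetical, and the 1-based numbering of queue positions is an assumption, since the table does not say whether positions start at 0 or 1.

```python
def status_key(task_id: str) -> str:
    """Key of the Hash holding task status and metadata."""
    return f"task:{task_id}:status"


def result_key(task_id: str) -> str:
    """Key of the String holding the result bundle path."""
    return f"task:{task_id}:result"


def position_key(task_id: str) -> str:
    """Key of the String holding one task's queue position."""
    return f"queue:position:{task_id}"


def queue_positions(waiting: list[str]) -> dict[str, int]:
    """Map each position key to its place in the waiting list.

    `waiting` mirrors the contents of the `queue:waiting` List, head first.
    Assumption: positions are 1-based.
    """
    return {position_key(task_id): i for i, task_id in enumerate(waiting, start=1)}
```

Recomputing the positions from `queue:waiting` whenever a task leaves the queue keeps the per-task position keys consistent with the list order.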
### 4. File Format Support

| Extension | File Type | MIME Type | sequence_type |
|-----------|-----------|-----------|---------------|
| .fna | Genome (nucleotide) | application/fasta | nucl |
| .fa | Genome (nucleotide) | application/fasta | nucl |
| .fasta | Auto-detect | application/fasta | auto |
| .faa | Protein | application/fasta | prot |

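
The extension table maps directly onto a lookup, with `.fasta` as the one case that requires inspecting the content. The specification does not define how auto-detection works, so the heuristic below is purely an assumption: a sequence counts as nucleotide when at least 90% of its letters are A, C, G, T, U, or N. The function names are hypothetical.

```python
from pathlib import Path

SEQUENCE_TYPE_BY_EXTENSION = {
    ".fna": "nucl",
    ".fa": "nucl",
    ".fasta": "auto",
    ".faa": "prot",
}
NUCLEOTIDE_CODES = set("ACGTUN")
NUCLEOTIDE_FRACTION_THRESHOLD = 0.9  # assumption, not taken from the spec


def guess_sequence_type(residues: str) -> str:
    """Heuristically classify raw sequence letters as 'nucl' or 'prot'."""
    letters = [c for c in residues.upper() if c.isalpha()]
    if not letters:
        raise ValueError("no sequence letters found")
    fraction = sum(c in NUCLEOTIDE_CODES for c in letters) / len(letters)
    return "nucl" if fraction >= NUCLEOTIDE_FRACTION_THRESHOLD else "prot"


def resolve_sequence_type(filename: str, fasta_text: str) -> str:
    """Resolve sequence_type from the extension, inspecting content only for 'auto'."""
    declared = SEQUENCE_TYPE_BY_EXTENSION[Path(filename).suffix.lower()]
    if declared != "auto":
        return declared
    # Skip FASTA header lines, which start with '>'.
    residues = "".join(
        line for line in fasta_text.splitlines() if not line.startswith(">")
    )
    return guess_sequence_type(residues)
```

In practice only the first few kilobytes of the file need to be read for the heuristic, which keeps detection cheap for uploads near the size limit.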
### 5. Database Specifications

#### 5.1 BPPRC Specificity Database
- File: `toxicity-data.csv`
- Contains: Historical toxin-insect activity records
- Used by: BtToxin_Shoter for activity prediction

#### 5.2 BtToxin Database
- Directory: `external_dbs/bt_toxin`
- Contains: Known Bt toxin sequences
- Used by: BtToxin_Digger for gene identification

### 6. Analysis Pipeline Specifications

#### 6.1 BtToxin_Digger
- Environment: digger (pixi)
- Dependencies: BtToxin_Digger, BLAST, Perl
- Input: Genome (.fna/.fa/.fasta) or protein (.faa) file
- Output: Identified toxin genes with coordinates

#### 6.2 BtToxin_Shoter
- Environment: pipeline (pixi)
- Dependencies: Python 3.9+, pandas, matplotlib, seaborn
- Input: Digger output, scoring parameters
- Output: Toxin-target activity predictions

#### 6.3 Scoring Parameters
| Parameter | Type | Range | Default | Description |
|-----------|------|-------|---------|-------------|
| min_identity | float | 0-1 | 0.8 | Minimum sequence identity |
| min_coverage | float | 0-1 | 0.6 | Minimum coverage |
| allow_unknown_families | boolean | - | false | Allow unknown toxin families |
| require_index_hit | boolean | - | true | Require database index hit |

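
One plausible reading of these parameters is that they act as filters on individual toxin hits before scoring. The sketch below illustrates that reading only; it is not taken from BtToxin_Shoter, and the `Hit` record, its field names, and `passes_filters` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one toxin hit; field names are illustrative."""
    name: str
    identity: float         # fraction in [0, 1]
    coverage: float         # fraction in [0, 1]
    family: Optional[str]   # None when the toxin family is unknown
    index_hit: bool         # True when the hit matched a database index entry


def passes_filters(
    hit: Hit,
    *,
    min_identity: float = 0.8,
    min_coverage: float = 0.6,
    allow_unknown_families: bool = False,
    require_index_hit: bool = True,
) -> bool:
    """Apply the four scoring parameters as independent pass/fail checks."""
    if hit.identity < min_identity or hit.coverage < min_coverage:
        return False
    if hit.family is None and not allow_unknown_families:
        return False
    if require_index_hit and not hit.index_hit:
        return False
    return True
```

Under this interpretation the defaults are conservative: a hit must meet both thresholds, belong to a known family, and match the index before it contributes to a prediction.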
### 7. Reserved Features

#### 7.1 CRISPR-Cas Analysis Module
- Directory: `crispr_cas/`
- Environment: Additional pixi environment
- Integration: Weighted scoring with Shoter
- Modes: Additive or subtractive weight adjustment

#### 7.2 Direct Protein Analysis
- Digger mode: sequence_type=prot
- Shoter: Process protein sequence hits normally

### 8. Performance Requirements

| Metric | Requirement |
|--------|-------------|
| Task timeout | 6 hours |
| API response time | < 1 second (excluding task execution) |
| Max concurrent tasks | 16 |
| Max file size | 100 MB |
| Result retention | 30 days |

### 9. Security Requirements

- **Task isolation**: Each task has independent working directory
- **Input validation**: File format and size validation
- **Result protection**: 30-day automatic cleanup
- **File permissions**: Restricted access to task directories

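
Task isolation, restricted permissions, and the 30-day cleanup rule can be sketched with the standard library. The following is an illustration under assumptions, not the deployed code: `create_task_dir` and `is_expired` are hypothetical helpers, the owner-only mode `0o700` is one possible meaning of "restricted access", and results are assumed to expire at exactly 30 days after creation.

```python
import os
import uuid
from datetime import datetime, timedelta
from pathlib import Path

RESULT_RETENTION_DAYS = 30


def create_task_dir(base_dir: Path, task_id: str) -> Path:
    """Create an owner-only working directory for one task."""
    # Rejecting anything that is not a UUID blocks path traversal via task_id.
    uuid.UUID(task_id)  # raises ValueError for malformed IDs
    task_dir = base_dir / task_id
    task_dir.mkdir(mode=0o700)  # fails if the directory already exists
    os.chmod(task_dir, 0o700)   # mkdir's mode is subject to umask; enforce it
    return task_dir


def is_expired(
    created_at: datetime, now: datetime, retention_days: int = RESULT_RETENTION_DAYS
) -> bool:
    """Return True once the retention period has fully elapsed."""
    return now - created_at >= timedelta(days=retention_days)
```

A periodic cleanup job could iterate over task records, call `is_expired` with the stored creation time, and remove the matching directory and Redis keys.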
### 10. Deployment Specifications

#### 10.1 Docker Configuration
- Docker Compose for orchestration
- Services: frontend, backend, redis, traefik
- Volume mounts for data persistence

#### 10.2 Traefik Configuration
- Domain: bttiaw.hzau.edu.cn
- HTTP/HTTPS support
- Automatic certificate management (Let's Encrypt)
- Router rules for each service

### 11. Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| REDIS_HOST | Redis server hostname | Yes |
| REDIS_PORT | Redis server port | Yes |
| PIXI_ENV_PATH | Path to pixi environments | Yes |
| API_BASE_URL | Backend API base URL | Yes |
| MAX_CONCURRENT_TASKS | Maximum concurrent tasks | No (default: 16) |
| TASK_TIMEOUT_HOURS | Task timeout in hours | No (default: 6) |
| RESULT_RETENTION_DAYS | Result retention days | No (default: 30) |

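
Loading these variables once at startup, and failing fast when a required one is missing, keeps configuration errors out of request handling. The sketch below uses only the standard library; `Settings` and `load_settings` are hypothetical names, and treating the three optional values as integers is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARIABLES = ("REDIS_HOST", "REDIS_PORT", "PIXI_ENV_PATH", "API_BASE_URL")


@dataclass(frozen=True)
class Settings:
    redis_host: str
    redis_port: int
    pixi_env_path: str
    api_base_url: str
    max_concurrent_tasks: int = 16
    task_timeout_hours: int = 6
    result_retention_days: int = 30


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, applying documented defaults."""
    missing = [name for name in REQUIRED_VARIABLES if name not in env]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    return Settings(
        redis_host=env["REDIS_HOST"],
        redis_port=int(env["REDIS_PORT"]),
        pixi_env_path=env["PIXI_ENV_PATH"],
        api_base_url=env["API_BASE_URL"],
        max_concurrent_tasks=int(env.get("MAX_CONCURRENT_TASKS", "16")),
        task_timeout_hours=int(env.get("TASK_TIMEOUT_HOURS", "6")),
        result_retention_days=int(env.get("RESULT_RETENTION_DAYS", "30")),
    )
```

Accepting any mapping instead of reading `os.environ` directly makes the loader easy to exercise in tests with a plain dictionary.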
### 12. Success Criteria Validation

| Criterion | Validation Method |
|-----------|-------------------|
| Genome/protein upload | Test with .fna and .faa files |
| 16 concurrent tasks | Load test with 20 simultaneous requests |
| Language switching | Verify zh/en toggle works on all pages |
| Heatmap visualization | Compare output with expected results |
| Docker deployment | Access via bttiaw.hzau.edu.cn |