docs: add Ralph project structure

- PROMPT.md: Ralph development instructions with BtToxin Pipeline specifics
- specs/requirements.md: Technical specifications (API, file formats, concurrency)
- @AGENT.md: Build, test, and deployment commands

Co-Authored-By: Claude <noreply@anthropic.com>
287
specs/requirements.md
Normal file
@@ -0,0 +1,287 @@
# Technical Specifications

## BtToxin Pipeline - Technical Requirements

### 1. System Architecture

#### 1.1 Overview
BtToxin Pipeline is a web-based genomic analysis platform consisting of:
- **Frontend**: Vue 3 SPA with Element Plus components
- **Backend**: FastAPI REST API with async task processing
- **Task Queue**: Redis-backed queue with semaphore-based concurrency control
- **Analysis Engine**: BtToxin_Digger and BtToxin_Shoter modules

#### 1.2 Component Architecture
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Vue 3 SPA     │────▶│  FastAPI API    │────▶│  Task Queue     │
│   (Frontend)    │     │   (Backend)     │     │    (Redis)      │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                 │                       │
                                 │                       ▼
                         ┌───────┴────────┐     ┌─────────────────┐
                         │  Pixi/Conda    │     │  Task Workers   │
                         │  Environments  │     │  (16 concurrent)│
                         └────────────────┘     └─────────────────┘
                                                         │
                                                         ▼
                                                ┌─────────────────┐
                                                │  BtToxin Tools  │
                                                │  (Digger/Shoter)│
                                                └─────────────────┘
```

### 2. Frontend Specifications

#### 2.1 Technology Stack
| Component | Version/Requirement |
|-----------|---------------------|
| Vue 3 | Composition API + script setup |
| Vite | Latest stable |
| Element Plus | Latest compatible |
| Pinia | Latest stable |
| Vue Router | v4 |
| vue-i18n | v9+ |
| HTTP Client | fetch API (no axios) |

#### 2.2 Page Structure
| Page | Route | Description |
|------|-------|-------------|
| Home | `/` | System introduction, quick start |
| About | `/about` | Features, usage, limitations |
| Submit | `/submit` | File upload, parameters, submit |
| Status | `/status` | Task progress, results |
| Tools | `/tools` | BtToxin_Shoter methodology |

#### 2.3 File Upload Component Requirements
- Drag and drop zone
- File type auto-detection
- Size limit: 100MB
- Pre-upload format validation
- Progress indicator during upload

#### 2.4 Internationalization (i18n)
- Languages: Chinese (zh), English (en)
- Language switcher in header/nav
- Persist selection via localStorage
- Refresh page on language change

### 3. Backend Specifications

#### 3.1 Technology Stack
| Component | Version/Requirement |
|-----------|---------------------|
| FastAPI | Latest stable |
| Uvicorn | Latest stable |
| Python | 3.9+ |
| Redis | Latest stable |
| pixi | Latest (conda alternative) |

#### 3.2 API Specifications

##### 3.2.1 Create Task
```
POST /api/tasks
Content-Type: multipart/form-data

Request Parameters:
| Name                    | Type    | Required | Default | Description |
|-------------------------|---------|----------|---------|-------------|
| file                    | File    | Yes      | -       | Uploaded file |
| file_type               | string  | Yes      | -       | genome/protein |
| min_identity            | float   | No       | 0.8     | Min similarity (0-1) |
| min_coverage            | float   | No       | 0.6     | Min coverage (0-1) |
| allow_unknown_families  | boolean | No       | false   | Allow unknown families |
| require_index_hit       | boolean | No       | true    | Require index hit |
| lang                    | string  | No       | zh      | Report language (zh/en) |

Response:
{
  "task_id": "uuid-string",
  "status": "pending",
  "created_at": "ISO-timestamp",
  "expires_at": "ISO-timestamp"
}
```
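
The response body above could be assembled as in the following minimal Python sketch. The helper name `new_task_response` is hypothetical, and deriving `expires_at` from the 30-day result retention period is an assumption; the spec does not say how that field is computed.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Assumption: a task expires when its results are cleaned up (30-day retention).
RESULT_RETENTION_DAYS = 30


def new_task_response(now=None):
    """Build the JSON-serializable body for POST /api/tasks (hypothetical helper)."""
    created = now or datetime.now(timezone.utc)
    expires = created + timedelta(days=RESULT_RETENTION_DAYS)
    return {
        "task_id": str(uuid.uuid4()),
        "status": "pending",  # initial status per the spec
        "created_at": created.isoformat(),
        "expires_at": expires.isoformat(),
    }
```

Passing `now` explicitly keeps the helper deterministic in tests; in a request handler it would normally be left unset.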

##### 3.2.2 Get Task Status
```
GET /api/tasks/{task_id}

Response:
{
  "task_id": "uuid-string",
  "status": "pending|queued|running|completed|failed",
  "progress": 0-100,
  "current_stage": "digger|shoter|plots|bundle",
  "submission_time": "ISO-timestamp",
  "start_time": "ISO-timestamp|null",
  "completion_time": "ISO-timestamp|null",
  "filename": "original-filename",
  "error": "error-message|null",
  "estimated_remaining_seconds": number|null,
  "queue_position": number|null
}
```

##### 3.2.3 Download Results
```
GET /api/tasks/{task_id}/download

Response: .tar.gz file (Content-Disposition: attachment)
```

##### 3.2.4 Delete Task
```
DELETE /api/tasks/{task_id}

Response: 204 No Content
```

#### 3.3 Task Queue Specifications

##### Concurrency Control
- Maximum concurrent tasks: 16
- Implementation: asyncio.Semaphore(16)
- Queue overflow: Tasks wait in Redis queue
- Queue position: Track and display position for queued tasks
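
The semaphore-based limit can be illustrated with a small, self-contained sketch. It uses only the standard library; `run_all` and the dummy `job` coroutine are hypothetical stand-ins for the real workers, and the Redis queue is left out.

```python
import asyncio

MAX_CONCURRENT_TASKS = 16  # limit from the spec


async def run_all(num_tasks, limit=MAX_CONCURRENT_TASKS):
    """Run num_tasks dummy jobs and return the peak number running at once."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def job():
        nonlocal running, peak
        async with semaphore:  # waits while `limit` jobs already hold a slot
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for the real pipeline run
            running -= 1

    await asyncio.gather(*(job() for _ in range(num_tasks)))
    return peak
```

Calling `asyncio.run(run_all(20))` submits 20 jobs at once; the returned peak should not exceed the configured limit, with the remaining jobs waiting for a free slot.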

##### Task Lifecycle
```
pending → queued → running → completed
                           → failed
```
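
One way to enforce this lifecycle is a transition table, sketched below. The names `ALLOWED_TRANSITIONS` and `transition` are hypothetical, and the sketch assumes `failed` is reachable only from `running`, as the diagram suggests; a real implementation might also allow failures from earlier states.

```python
# Assumption: "failed" can only follow "running"; terminal states have no exits.
ALLOWED_TRANSITIONS = {
    "pending": {"queued"},
    "queued": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def transition(current, new):
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```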

##### Task Status Values
| Status | Description |
|--------|-------------|
| pending | Created, waiting to enter queue |
| queued | Waiting for available slot (has queue_position) |
| running | Currently processing (has progress, current_stage) |
| completed | Successfully finished (has download URL) |
| failed | Error occurred (has error message) |

##### Pipeline Stages
| Stage | Description |
|-------|-------------|
| digger | BtToxin_Digger gene identification |
| shoter | BtToxin_Shoter toxicity assessment |
| plots | Heatmap generation |
| bundle | Result packaging (.tar.gz) |
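
The stage order can be mapped onto the 0-100 `progress` field of the status response. The sketch below is only an illustration: it assumes all four stages carry equal weight, which the spec does not state, and `overall_progress` is a hypothetical helper.

```python
STAGES = ["digger", "shoter", "plots", "bundle"]  # order taken from the stage table


def overall_progress(stage, stage_fraction=0.0):
    """Map (current stage, fraction done within it) to a 0-100 progress value.

    Assumes every stage has equal weight, which the spec does not state.
    """
    if not 0.0 <= stage_fraction <= 1.0:
        raise ValueError("stage_fraction must be between 0 and 1")
    index = STAGES.index(stage)  # raises ValueError for an unknown stage
    return round(100 * (index + stage_fraction) / len(STAGES))
```

If real stage durations differ widely (the digger stage is likely the longest), per-stage weights would give a more useful estimate than equal shares.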

#### 3.4 Redis Data Structures

| Key Pattern | Type | Description |
|-------------|------|-------------|
| `task:{task_id}:status` | Hash | Task status and metadata |
| `task:{task_id}:result` | String | Result bundle path |
| `queue:waiting` | List | Waiting task IDs |
| `queue:running` | Set | Currently running task IDs |
| `queue:position:{task_id}` | String | Individual queue position |
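
The key patterns can be kept in one place with small helper functions, as in this sketch. It does not talk to Redis: a plain Python list stands in for `queue:waiting`, the helper names are hypothetical, and treating the head of the list as position 1 is an assumption about how the queue is consumed.

```python
def status_key(task_id):
    """Key of the hash holding task status and metadata."""
    return f"task:{task_id}:status"


def result_key(task_id):
    """Key of the string holding the result bundle path."""
    return f"task:{task_id}:result"


def position_key(task_id):
    """Key of the string holding a task's queue position."""
    return f"queue:position:{task_id}"


def queue_positions(waiting):
    """Map each waiting task's position key to its 1-based position.

    `waiting` is the ordered content of queue:waiting; index 0 is assumed
    to be the next task to run.
    """
    return {position_key(task_id): i for i, task_id in enumerate(waiting, start=1)}
```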

### 4. File Format Support

| Extension | File Type | MIME Type | sequence_type |
|-----------|-----------|-----------|---------------|
| .fna | Genome (nucleotide) | application/fasta | nucl |
| .fa | Genome (nucleotide) | application/fasta | nucl |
| .fasta | Auto-detect | application/fasta | auto |
| .faa | Protein | application/fasta | prot |
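
The extension mapping, including auto-detection for `.fasta`, might look like the sketch below. The spec does not say how auto-detection works; the residue-composition heuristic and its 0.9 threshold are assumptions, and both function names are hypothetical.

```python
import os

EXTENSION_TO_SEQUENCE_TYPE = {".fna": "nucl", ".fa": "nucl", ".fasta": "auto", ".faa": "prot"}
NUCLEOTIDE_ALPHABET = set("ACGTUN")


def guess_sequence_type(sequence, threshold=0.9):
    """Heuristic (assumption): nucleotide if >= threshold of letters are ACGTUN."""
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("no sequence residues found")
    nucleotide_fraction = sum(c in NUCLEOTIDE_ALPHABET for c in residues) / len(residues)
    return "nucl" if nucleotide_fraction >= threshold else "prot"


def resolve_sequence_type(filename, fasta_text=""):
    """Return 'nucl' or 'prot' for a supported file, inspecting content for .fasta."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in EXTENSION_TO_SEQUENCE_TYPE:
        raise ValueError(f"unsupported extension: {ext!r}")
    seq_type = EXTENSION_TO_SEQUENCE_TYPE[ext]
    if seq_type != "auto":
        return seq_type
    # Drop FASTA header lines (starting with '>') before inspecting residues.
    body = "".join(line for line in fasta_text.splitlines() if not line.startswith(">"))
    return guess_sequence_type(body)
```

For large uploads it would be enough to pass only the first few kilobytes of the file as `fasta_text`.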

### 5. Database Specifications

#### 5.1 BPPRC Specificity Database
- File: `toxicity-data.csv`
- Contains: Historical toxin-insect activity records
- Used by: BtToxin_Shoter for activity prediction

#### 5.2 BtToxin Database
- Directory: `external_dbs/bt_toxin`
- Contains: Known Bt toxin sequences
- Used by: BtToxin_Digger for gene identification

### 6. Analysis Pipeline Specifications

#### 6.1 BtToxin_Digger
- Environment: digger (pixi)
- Dependencies: BtToxin_Digger, BLAST, Perl
- Input: Genome (.fna/.fa/.fasta) or protein (.faa) file
- Output: Identified toxin genes with coordinates

#### 6.2 BtToxin_Shoter
- Environment: pipeline (pixi)
- Dependencies: Python 3.9+, pandas, matplotlib, seaborn
- Input: Digger output, scoring parameters
- Output: Toxin-target activity predictions

#### 6.3 Scoring Parameters
| Parameter | Type | Range | Default | Description |
|-----------|------|-------|---------|-------------|
| min_identity | float | 0-1 | 0.8 | Minimum sequence identity |
| min_coverage | float | 0-1 | 0.6 | Minimum coverage |
| allow_unknown_families | boolean | - | false | Allow unknown toxin families |
| require_index_hit | boolean | - | true | Require database index hit |
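
The parameter table translates directly into a validated container. The sketch below uses a standard-library dataclass; the class name `ScoringParams` is hypothetical, and the backend may well use a different validation layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringParams:
    """Scoring parameters with the defaults and ranges from the table above."""

    min_identity: float = 0.8
    min_coverage: float = 0.6
    allow_unknown_families: bool = False
    require_index_hit: bool = True

    def __post_init__(self):
        # Both thresholds are fractions and must lie in the closed range [0, 1].
        for name in ("min_identity", "min_coverage"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be within [0, 1], got {value}")
```

`ScoringParams()` yields the documented defaults, while an out-of-range value such as `ScoringParams(min_identity=1.2)` is rejected before any analysis starts.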

### 7. Reserved Features

#### 7.1 CRISPR-Cas Analysis Module
- Directory: `crispr_cas/`
- Environment: Additional pixi environment
- Integration: Weighted scoring with BtToxin_Shoter
- Modes: Additive or subtractive weight adjustment

#### 7.2 Direct Protein Analysis
- Digger mode: sequence_type=prot
- Shoter: Process protein sequence hits normally

### 8. Performance Requirements

| Metric | Requirement |
|--------|-------------|
| Task timeout | 6 hours |
| API response time | < 1 second (excluding task execution) |
| Max concurrent tasks | 16 |
| Max file size | 100MB |
| Result retention | 30 days |

### 9. Security Requirements

- **Task isolation**: Each task has independent working directory
- **Input validation**: File format and size validation
- **Result protection**: 30-day automatic cleanup
- **File permissions**: Restricted access to task directories
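
Task isolation, restricted permissions, and the 30-day cleanup could be combined as in the following sketch. The function names and the one-directory-per-task layout are assumptions, and expiry is judged here by directory modification time, which a real deployment might replace with the task's stored timestamps.

```python
import os
import shutil
import time
import uuid

RETENTION_SECONDS = 30 * 24 * 3600  # 30-day retention from the spec


def create_task_dir(base_dir, task_id=None):
    """Create a private working directory for one task and return its path."""
    task_id = task_id or str(uuid.uuid4())
    path = os.path.join(base_dir, task_id)
    os.makedirs(path, mode=0o700)  # owner-only access; raises if it already exists
    return path


def cleanup_expired(base_dir, now=None, retention=RETENTION_SECONDS):
    """Delete task directories older than the retention window; return their names."""
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so deletions do not disturb the iteration.
    for entry in list(os.scandir(base_dir)):
        if entry.is_dir() and now - entry.stat().st_mtime > retention:
            shutil.rmtree(entry.path)
            removed.append(entry.name)
    return removed
```

`cleanup_expired` is meant to be called periodically, for example from a scheduled job; the `now` parameter exists mainly to make the expiry logic testable.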

### 10. Deployment Specifications

#### 10.1 Docker Configuration
- Docker Compose for orchestration
- Services: frontend, backend, redis, traefik
- Volume mounts for data persistence

#### 10.2 Traefik Configuration
- Domain: bttiaw.hzau.edu.cn
- HTTP/HTTPS support
- Automatic certificate management (Let's Encrypt)
- Router rules for each service

### 11. Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| REDIS_HOST | Redis server hostname | Yes |
| REDIS_PORT | Redis server port | Yes |
| PIXI_ENV_PATH | Path to pixi environments | Yes |
| API_BASE_URL | Backend API base URL | Yes |
| MAX_CONCURRENT_TASKS | Maximum concurrent tasks | No (default: 16) |
| TASK_TIMEOUT_HOURS | Task timeout in hours | No (default: 6) |
| RESULT_RETENTION_DAYS | Result retention days | No (default: 30) |
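
A settings loader that applies these defaults and fails fast on missing required variables might look like this. The `Settings` class and `load_settings` function are hypothetical names; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass

REQUIRED_VARIABLES = ("REDIS_HOST", "REDIS_PORT", "PIXI_ENV_PATH", "API_BASE_URL")


@dataclass(frozen=True)
class Settings:
    redis_host: str
    redis_port: int
    pixi_env_path: str
    api_base_url: str
    max_concurrent_tasks: int = 16
    task_timeout_hours: int = 6
    result_retention_days: int = 30


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARIABLES if name not in env]
    if missing:
        raise KeyError(f"missing required variables: {', '.join(missing)}")
    return Settings(
        redis_host=env["REDIS_HOST"],
        redis_port=int(env["REDIS_PORT"]),
        pixi_env_path=env["PIXI_ENV_PATH"],
        api_base_url=env["API_BASE_URL"],
        max_concurrent_tasks=int(env.get("MAX_CONCURRENT_TASKS", 16)),
        task_timeout_hours=int(env.get("TASK_TIMEOUT_HOURS", 6)),
        result_retention_days=int(env.get("RESULT_RETENTION_DAYS", 30)),
    )
```

Accepting an explicit mapping makes the loader easy to exercise in tests without touching the process environment.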

### 12. Success Criteria Validation

| Criterion | Validation Method |
|-----------|-------------------|
| Genome/protein upload | Test with .fna and .faa files |
| 16 concurrent tasks | Load test with 20 simultaneous requests; at most 16 should run at once, with the rest queued |
| Language switching | Verify zh/en toggle works on all pages |
| Heatmap visualization | Compare output with expected results |
| Docker deployment | Access via bttiaw.hzau.edu.cn |