bttoxin-pipeline/specs/requirements.md
zly 75c7db8684 docs: add Ralph project structure
- PROMPT.md: Ralph development instructions with BtToxin Pipeline specifics
- specs/requirements.md: Technical specifications (API, file formats, concurrency)
- @AGENT.md: Build, test, and deployment commands

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-13 17:26:23 +08:00

Technical Specifications

BtToxin Pipeline - Technical Requirements

1. System Architecture

1.1 Overview

BtToxin Pipeline is a web-based genomic analysis platform consisting of:

  • Frontend: Vue 3 SPA with Element Plus components
  • Backend: FastAPI REST API with async task processing
  • Task Queue: Redis-backed queue with semaphore-based concurrency control
  • Analysis Engine: BtToxin_Digger and BtToxin_Shoter modules

1.2 Component Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Vue 3 SPA     │────▶│  FastAPI API    │────▶│  Task Queue     │
│  (Frontend)     │     │  (Backend)      │     │  (Redis)        │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                │                        │
                                │                        ▼
                        ┌───────┴────────┐     ┌─────────────────┐
                        │  Pixi/Conda    │     │  Task Workers   │
                        │  Environments  │     │  (16 concurrent)│
                        └────────────────┘     └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │  BtToxin Tools  │
                       │  (Digger/Shoter)│
                       └─────────────────┘

2. Frontend Specifications

2.1 Technology Stack

| Component    | Version/Requirement            |
|--------------|--------------------------------|
| Vue 3        | Composition API + script setup |
| Vite         | Latest stable                  |
| Element Plus | Latest compatible              |
| Pinia        | Latest stable                  |
| Vue Router   | v4                             |
| vue-i18n     | v9+                            |
| HTTP Client  | fetch API (no axios)           |

2.2 Page Structure

| Page   | Route   | Description                      |
|--------|---------|----------------------------------|
| Home   | /       | System introduction, quick start |
| About  | /about  | Features, usage, limitations     |
| Submit | /submit | File upload, parameters, submit  |
| Status | /status | Task progress, results           |
| Tools  | /tools  | BtToxin_Shoter methodology       |

2.3 File Upload Component Requirements

  • Drag and drop zone
  • File type auto-detection
  • Size limit: 100MB
  • Pre-upload format validation
  • Progress indicator during upload

2.4 Internationalization (i18n)

  • Languages: Chinese (zh), English (en)
  • Language switcher in header/nav
  • Persist selection via localStorage
  • Refresh page on language change

3. Backend Specifications

3.1 Technology Stack

| Component | Version/Requirement        |
|-----------|----------------------------|
| FastAPI   | Latest stable              |
| Uvicorn   | Latest stable              |
| Python    | 3.9+                       |
| Redis     | Latest stable              |
| pixi      | Latest (conda alternative) |

3.2 API Specifications

3.2.1 Create Task
POST /api/tasks
Content-Type: multipart/form-data

Request Parameters:
| Name                    | Type    | Required | Default | Description |
|-------------------------|---------|----------|---------|-------------|
| file                    | File    | Yes      | -       | Uploaded file |
| file_type               | string  | Yes      | -       | genome/protein |
| min_identity            | float   | No       | 0.8     | Min similarity (0-1) |
| min_coverage            | float   | No       | 0.6     | Min coverage (0-1) |
| allow_unknown_families  | boolean | No       | false   | Allow unknown families |
| require_index_hit       | boolean | No       | true    | Require index hit |
| lang                    | string  | No       | zh      | Report language (zh/en) |
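
The parameter table above can be mirrored by a small validation step before a task is created. The following is a minimal sketch, not the project's actual code: the `TaskParams` dataclass and `validate_params` function are hypothetical names introduced for illustration, and only the field names, defaults, and ranges come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskParams:
    """Form fields of POST /api/tasks with the defaults from the table (hypothetical type)."""
    file_type: str                      # "genome" or "protein"
    min_identity: float = 0.8
    min_coverage: float = 0.6
    allow_unknown_families: bool = False
    require_index_hit: bool = True
    lang: str = "zh"


def validate_params(params: TaskParams) -> list[str]:
    """Return a list of human-readable problems; an empty list means the input is valid."""
    errors = []
    if params.file_type not in ("genome", "protein"):
        errors.append("file_type must be 'genome' or 'protein'")
    for name in ("min_identity", "min_coverage"):
        value = getattr(params, name)
        if not 0.0 <= value <= 1.0:
            errors.append(f"{name} must be between 0 and 1")
    if params.lang not in ("zh", "en"):
        errors.append("lang must be 'zh' or 'en'")
    return errors
```

Returning a list of problems rather than raising on the first one lets the API report every invalid field in a single response.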

Response:
{
  "task_id": "uuid-string",
  "status": "pending",
  "created_at": "ISO-timestamp",
  "expires_at": "ISO-timestamp"
}
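
As an illustration of how the response fields might be populated, the sketch below builds the payload from a creation time. It assumes that `expires_at` equals `created_at` plus the 30-day result retention period from section 8; the specification does not state this relationship explicitly, and `new_task_response` is a hypothetical helper name.

```python
import uuid
from datetime import datetime, timedelta, timezone

RESULT_RETENTION_DAYS = 30  # assumption: expiry follows the retention period in section 8


def new_task_response(now: datetime) -> dict:
    """Build the Create Task response for a task created at `now` (timezone-aware)."""
    created = now.astimezone(timezone.utc)
    expires = created + timedelta(days=RESULT_RETENTION_DAYS)
    return {
        "task_id": str(uuid.uuid4()),
        "status": "pending",
        "created_at": created.isoformat(),
        "expires_at": expires.isoformat(),
    }
```
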

3.2.2 Get Task Status
GET /api/tasks/{task_id}

Response:
{
  "task_id": "uuid-string",
  "status": "pending|queued|running|completed|failed",
  "progress": 0-100,
  "current_stage": "digger|shoter|plots|bundle",
  "submission_time": "ISO-timestamp",
  "start_time": "ISO-timestamp|null",
  "completion_time": "ISO-timestamp|null",
  "filename": "original-filename",
  "error": "error-message|null",
  "estimated_remaining_seconds": number|null,
  "queue_position": number|null
}
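
A client would typically poll this endpoint until the task reaches a terminal status. The sketch below shows one way to do that under stated assumptions: `fetch_status` is a hypothetical callable that performs `GET /api/tasks/{task_id}` and returns the decoded JSON body, and the polling interval and budget are illustrative values, not requirements.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 2.0,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task is completed or failed and return the last payload."""
    for _ in range(max_polls):
        payload = fetch_status()
        if payload["status"] in TERMINAL_STATUSES:
            return payload
        sleep(interval_seconds)  # back off before asking again
    raise TimeoutError("task did not finish within the polling budget")
```

Passing `sleep` as a parameter keeps the loop testable without real delays.
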

3.2.3 Download Results
GET /api/tasks/{task_id}/download

Response: .tar.gz file (Content-Disposition: attachment)

3.2.4 Delete Task
DELETE /api/tasks/{task_id}

Response: 204 No Content

3.3 Task Queue Specifications

Concurrency Control
  • Maximum concurrent tasks: 16
  • Implementation: asyncio.Semaphore(16)
  • Queue overflow: Tasks wait in Redis queue
  • Queue position: Track and display position for queued tasks
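
The semaphore-based limit can be sketched as follows. This is a minimal illustration of `asyncio.Semaphore` capping concurrency, not the pipeline's worker code: `run_all` is a hypothetical helper, and it returns the peak number of jobs observed running at once so the limit can be checked.

```python
import asyncio

MAX_CONCURRENT_TASKS = 16


async def run_all(jobs, limit: int = MAX_CONCURRENT_TASKS) -> int:
    """Run every job under a semaphore and return the peak number running at once."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def guarded(job):
        nonlocal running, peak
        async with semaphore:  # waits here while all slots are taken
            running += 1
            peak = max(peak, running)
            try:
                await job()
            finally:
                running -= 1

    await asyncio.gather(*(guarded(job) for job in jobs))
    return peak
```

With 20 submitted jobs and a limit of 16, the first 16 start immediately and the remaining 4 wait for a free slot.
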

Task Lifecycle
pending → queued → running → completed
                      → failed
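
The lifecycle above can be enforced with a small transition table. In this sketch, `ALLOWED_TRANSITIONS` and `advance` are hypothetical names; the transitions are taken from the diagram, with `failed` assumed to be reachable only from `running`, as drawn.

```python
ALLOWED_TRANSITIONS = {
    "pending": {"queued"},
    "queued": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
}


def advance(current: str, target: str) -> str:
    """Return the new status, or raise if the lifecycle does not allow the move."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```
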

Task Status Values

| Status    | Description                                        |
|-----------|----------------------------------------------------|
| pending   | Created, waiting to enter queue                    |
| queued    | Waiting for available slot (has queue_position)    |
| running   | Currently processing (has progress, current_stage) |
| completed | Successfully finished (has download URL)           |
| failed    | Error occurred (has error message)                 |

Pipeline Stages

| Stage  | Description                        |
|--------|------------------------------------|
| digger | BtToxin_Digger gene identification |
| shoter | BtToxin_Shoter toxicity assessment |
| plots  | Heatmap generation                 |
| bundle | Result packaging (.tar.gz)         |
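
One way to drive the four stages in order while updating `current_stage` and `progress` is sketched below. The even 25% split per stage is an assumption made for illustration; the specification does not define how progress maps to stages. `run_pipeline`, `handlers`, and `report` are hypothetical names.

```python
from typing import Callable, Dict

STAGES = ("digger", "shoter", "plots", "bundle")


def run_pipeline(
    handlers: Dict[str, Callable[[], None]],
    report: Callable[[str, int], None],
) -> None:
    """Run the stages in order, reporting (current_stage, progress) as each one starts."""
    for index, stage in enumerate(STAGES):
        report(stage, index * 100 // len(STAGES))  # progress when the stage begins
        handlers[stage]()
    report(STAGES[-1], 100)  # all stages finished
```
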

3.4 Redis Data Structures

| Key Pattern              | Type   | Description                |
|--------------------------|--------|----------------------------|
| task:{task_id}:status    | Hash   | Task status and metadata   |
| task:{task_id}:result    | String | Result bundle path         |
| queue:waiting            | List   | Waiting task IDs           |
| queue:running            | Set    | Currently running task IDs |
| queue:position:{task_id} | String | Individual queue position  |
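
The key patterns above can be centralized in small helper functions so that no call site builds key strings by hand. The sketch below is pure Python with no Redis client; the helper names are hypothetical, and the 1-based numbering of queue positions is an assumption, since the specification does not say whether positions start at 0 or 1.

```python
WAITING_KEY = "queue:waiting"
RUNNING_KEY = "queue:running"


def status_key(task_id: str) -> str:
    return f"task:{task_id}:status"


def result_key(task_id: str) -> str:
    return f"task:{task_id}:result"


def position_key(task_id: str) -> str:
    return f"queue:position:{task_id}"


def queue_positions(waiting_ids: list[str]) -> dict[str, int]:
    """Map each position key to the task's place in the waiting list (assumed 1-based)."""
    return {
        position_key(task_id): index
        for index, task_id in enumerate(waiting_ids, start=1)
    }
```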

4. File Format Support

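| Extension | File Type           | MIME Type         | sequence_type |
|-----------|---------------------|-------------------|---------------|
| .fna      | Genome (nucleotide) | application/fasta | nucl          |
| .fa       | Genome (nucleotide) | application/fasta | nucl          |
| .fasta    | Auto-detect         | application/fasta | auto          |
| .faa      | Protein             | application/fasta | prot          |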
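
The extension mapping above, including the auto-detect case for `.fasta`, could be resolved along these lines. The detection heuristic is an assumption: the specification does not describe how auto-detection works, so this sketch treats a file as nucleotide when at least 90% of its residues are in `ACGTUN`. The function name and threshold are hypothetical.

```python
from pathlib import Path

EXTENSION_TO_SEQUENCE_TYPE = {".fna": "nucl", ".fa": "nucl", ".fasta": "auto", ".faa": "prot"}
NUCLEOTIDE_ALPHABET = set("ACGTUN")


def guess_sequence_type(filename: str, fasta_text: str, threshold: float = 0.9) -> str:
    """Resolve sequence_type from the extension, inspecting content only for 'auto'."""
    suffix = Path(filename).suffix.lower()
    if suffix not in EXTENSION_TO_SEQUENCE_TYPE:
        raise ValueError(f"unsupported extension: {suffix!r}")
    declared = EXTENSION_TO_SEQUENCE_TYPE[suffix]
    if declared != "auto":
        return declared
    # Concatenate sequence lines, skipping FASTA header lines that start with '>'.
    residues = "".join(
        line.strip() for line in fasta_text.splitlines() if not line.startswith(">")
    ).upper()
    if not residues:
        raise ValueError("no sequence data found")
    nucleotide_fraction = sum(ch in NUCLEOTIDE_ALPHABET for ch in residues) / len(residues)
    return "nucl" if nucleotide_fraction >= threshold else "prot"
```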

5. Database Specifications

5.1 BPPRC Specificity Database

  • File: toxicity-data.csv
  • Contains: Historical toxin-insect activity records
  • Used by: BtToxin_Shoter for activity prediction

5.2 BtToxin Database

  • Directory: external_dbs/bt_toxin
  • Contains: Known Bt toxin sequences
  • Used by: BtToxin_Digger for gene identification

6. Analysis Pipeline Specifications

6.1 BtToxin_Digger

  • Environment: digger (pixi)
  • Dependencies: BtToxin_Digger, BLAST, Perl
  • Input: Genome (.fna/.fa/.fasta) or protein (.faa) file
  • Output: Identified toxin genes with coordinates

6.2 BtToxin_Shoter

  • Environment: pipeline (pixi)
  • Dependencies: Python 3.9+, pandas, matplotlib, seaborn
  • Input: Digger output, scoring parameters
  • Output: Toxin-target activity predictions

6.3 Scoring Parameters

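| Parameter              | Type    | Range | Default | Description                 |
|------------------------|---------|-------|---------|-----------------------------|
| min_identity           | float   | 0-1   | 0.8     | Minimum sequence identity   |
| min_coverage           | float   | 0-1   | 0.6     | Minimum coverage            |
| allow_unknown_families | boolean | -     | false   | Allow unknown toxin families |
| require_index_hit      | boolean | -     | true    | Require database index hit  |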
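
To make the effect of the four parameters concrete, the sketch below applies them as filters to a single hit. It is an interpretation, not BtToxin_Shoter's actual logic: the `Hit` dataclass and its field names are hypothetical, and the exact semantics of `allow_unknown_families` and `require_index_hit` are assumed from their descriptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Hypothetical shape of one Digger hit; field names are illustrative only."""
    name: str
    identity: float        # fraction in 0-1
    coverage: float        # fraction in 0-1
    family: Optional[str]  # None when the toxin family is unknown
    in_index: bool         # whether the hit matched the database index


def passes_filters(
    hit: Hit,
    min_identity: float = 0.8,
    min_coverage: float = 0.6,
    allow_unknown_families: bool = False,
    require_index_hit: bool = True,
) -> bool:
    """Return True when the hit satisfies every scoring parameter."""
    if hit.identity < min_identity or hit.coverage < min_coverage:
        return False
    if hit.family is None and not allow_unknown_families:
        return False
    if require_index_hit and not hit.in_index:
        return False
    return True
```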

7. Reserved Features

7.1 CRISPR-Cas Analysis Module

  • Directory: crispr_cas/
  • Environment: Additional pixi environment
  • Integration: Weighted scoring with Shoter
  • Modes: Additive or subtractive weight adjustment

7.2 Direct Protein Analysis

  • Digger mode: sequence_type=prot
  • Shoter: Process protein sequence hits normally

8. Performance Requirements

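| Metric               | Requirement                           |
|----------------------|---------------------------------------|
| Task timeout         | 6 hours                               |
| API response time    | < 1 second (excluding task execution) |
| Max concurrent tasks | 16                                    |
| Max file size        | 100MB                                 |
| Result retention     | 30 days                               |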

9. Security Requirements

  • Task isolation: Each task has independent working directory
  • Input validation: File format and size validation
  • Result protection: 30-day automatic cleanup
  • File permissions: Restricted access to task directories
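
Task isolation, restricted permissions, and the 30-day cleanup could be combined roughly as follows. This is a sketch under assumptions: one directory per task under a common root, owner-only permissions (`0o700`), and expiry judged by the directory's modification time. The function names and the directory layout are hypothetical.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

RETENTION_SECONDS = 30 * 24 * 3600  # 30-day result retention


def create_task_dir(root: Path, task_id: str) -> Path:
    """Create an owner-only working directory for one task; fail if it already exists."""
    path = root / task_id
    path.mkdir(mode=0o700, parents=True, exist_ok=False)
    return path


def cleanup_expired(root: Path, now: Optional[float] = None) -> List[str]:
    """Delete task directories older than the retention window; return their names."""
    now = time.time() if now is None else now
    removed = []
    for task_dir in sorted(root.iterdir()):
        if task_dir.is_dir() and now - task_dir.stat().st_mtime > RETENTION_SECONDS:
            shutil.rmtree(task_dir)
            removed.append(task_dir.name)
    return removed
```

A production cleanup would more likely compare against the task's recorded `expires_at` than against filesystem timestamps, which change whenever the directory is modified.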

10. Deployment Specifications

10.1 Docker Configuration

  • Docker Compose for orchestration
  • Services: frontend, backend, redis, traefik
  • Volume mounts for data persistence

10.2 Traefik Configuration

  • Domain: bttiaw.hzau.edu.cn
  • HTTP/HTTPS support
  • Automatic certificate management (Let's Encrypt)
  • Router rules for each service

11. Environment Variables

| Variable              | Description               | Required         |
|-----------------------|---------------------------|------------------|
| REDIS_HOST            | Redis server hostname     | Yes              |
| REDIS_PORT            | Redis server port         | Yes              |
| PIXI_ENV_PATH         | Path to pixi environments | Yes              |
| API_BASE_URL          | Backend API base URL      | Yes              |
| MAX_CONCURRENT_TASKS  | Maximum concurrent tasks  | No (default: 16) |
| TASK_TIMEOUT_HOURS    | Task timeout in hours     | No (default: 6)  |
| RESULT_RETENTION_DAYS | Result retention days     | No (default: 30) |
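
The variables above could be loaded and validated at startup along these lines. The `Settings` dataclass and `load_settings` function are hypothetical names; only the variable names, the required flags, and the defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("REDIS_HOST", "REDIS_PORT", "PIXI_ENV_PATH", "API_BASE_URL")


@dataclass(frozen=True)
class Settings:
    redis_host: str
    redis_port: int
    pixi_env_path: str
    api_base_url: str
    max_concurrent_tasks: int = 16
    task_timeout_hours: int = 6
    result_retention_days: int = 30


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, failing fast on missing required variables."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return Settings(
        redis_host=env["REDIS_HOST"],
        redis_port=int(env["REDIS_PORT"]),
        pixi_env_path=env["PIXI_ENV_PATH"],
        api_base_url=env["API_BASE_URL"],
        max_concurrent_tasks=int(env.get("MAX_CONCURRENT_TASKS", "16")),
        task_timeout_hours=int(env.get("TASK_TIMEOUT_HOURS", "6")),
        result_retention_days=int(env.get("RESULT_RETENTION_DAYS", "30")),
    )
```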

12. Success Criteria Validation

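| Criterion             | Validation Method                                                                  |
|-----------------------|------------------------------------------------------------------------------------|
| Genome/protein upload | Test with .fna and .faa files                                                      |
| 16 concurrent tasks   | Load test with 20 simultaneous requests (expect at most 16 running, the rest queued) |
| Language switching    | Verify zh/en toggle works on all pages                                             |
| Heatmap visualization | Compare output with expected results                                               |
| Docker deployment     | Access via bttiaw.hzau.edu.cn                                                      |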