docs: add Ralph project structure

- PROMPT.md: Ralph development instructions with BtToxin Pipeline specifics
- specs/requirements.md: Technical specifications (API, file formats, concurrency)
- @AGENT.md: Build, test, and deployment commands

Co-Authored-By: Claude <noreply@anthropic.com>
zly
2026-01-13 17:26:23 +08:00
parent 547328ad44
commit 75c7db8684
3 changed files with 588 additions and 0 deletions

@AGENT.md Normal file

@@ -0,0 +1,199 @@
# Agent Build Instructions
## Project Setup
### Frontend (Vue 3 + Vite)
```bash
cd frontend
pnpm install
```
### Backend (FastAPI)
```bash
# Using pixi (recommended)
cd web/zly
pixi run -e webbackend api-dev
# Or using python directly
cd web/backend
pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```
### Pixi Environment Setup
```bash
cd web/zly
pixi install
pixi run -e digger --help
pixi run -e pipeline --help
```
## Running Tests
### Frontend Tests
```bash
cd frontend
pnpm test:unit
pnpm test:unit --run # Run once without watch
```
### Backend Tests
```bash
cd web/backend
pytest -v
pytest --cov=src tests/ # With coverage
```
## Build Commands
### Frontend Production Build
```bash
cd frontend
pnpm build
# Output: dist/
```
### Backend Production
```bash
# Build with pixi
cd web/zly
pixi run -e webbackend api-prod
```
## Development Server
### Frontend Dev Server
```bash
cd frontend
pnpm dev --host # Access at http://localhost:5173
```
### Backend Dev Server
```bash
cd web/zly
pixi run -e webbackend api-dev
# Access at http://localhost:8000
```
## Docker Deployment
```bash
cd web/zly/docker
docker-compose up -d --build
```
## Key Learnings
- Update this section when you learn new build optimizations
- Document any gotchas or special setup requirements
- Keep track of the fastest test/build cycle
## Feature Development Quality Standards
**CRITICAL**: All new features MUST meet the following mandatory requirements before being considered complete.
### Testing Requirements
- **Minimum Coverage**: 85% code coverage required for all new code
- **Test Pass Rate**: 100% - all tests must pass, no exceptions
- **Test Types Required**:
- Unit tests for all business logic and services
- Integration tests for API endpoints or main functionality
- End-to-end tests for critical user workflows
- **Coverage Validation**: Run coverage reports before marking features complete:
```bash
# Examples by language/framework
npm run test:coverage
pytest --cov=src tests/ --cov-report=term-missing
cargo tarpaulin --out Html
```
- **Test Quality**: Tests must validate behavior, not just achieve coverage metrics
- **Test Documentation**: Complex test scenarios must include comments explaining the test strategy
### Git Workflow Requirements
Before moving to the next feature, ALL changes must be:
1. **Committed with Clear Messages**:
```bash
git add .
git commit -m "feat(module): descriptive message following conventional commits"
```
- Use conventional commit format: `feat:`, `fix:`, `docs:`, `test:`, `refactor:`, etc.
- Include scope when applicable: `feat(api):`, `fix(ui):`, `test(auth):`
- Write descriptive messages that explain WHAT changed and WHY
2. **Pushed to Remote Repository**:
```bash
git push origin <branch-name>
```
- Never leave completed features uncommitted
- Push regularly to maintain backup and enable collaboration
- Ensure CI/CD pipelines pass before considering feature complete
3. **Branch Hygiene**:
- Work on feature branches, never directly on `main`
- Branch naming convention: `feature/<feature-name>`, `fix/<issue-name>`, `docs/<doc-update>`
- Create pull requests for all significant changes
4. **Ralph Integration**:
- Update @fix_plan.md with new tasks before starting work
- Mark items complete in @fix_plan.md upon completion
- Update PROMPT.md if development patterns change
- Test features work within Ralph's autonomous loop
### Documentation Requirements
**ALL implementation documentation MUST remain synchronized with the codebase**:
1. **Code Documentation**:
- Language-appropriate documentation (JSDoc, docstrings, etc.)
- Update inline comments when implementation changes
- Remove outdated comments immediately
2. **Implementation Documentation**:
- Update relevant sections in this AGENT.md file
- Keep build and test commands current
- Update configuration examples when defaults change
- Document breaking changes prominently
3. **README Updates**:
- Keep feature lists current
- Update setup instructions when dependencies change
- Maintain accurate command examples
- Update version compatibility information
4. **AGENT.md Maintenance**:
- Add new build patterns to relevant sections
- Update "Key Learnings" with new insights
- Keep command examples accurate and tested
- Document new testing patterns or quality gates
### Feature Completion Checklist
Before marking ANY feature as complete, verify:
- [ ] All tests pass with appropriate framework command
- [ ] Code coverage meets 85% minimum threshold
- [ ] Coverage report reviewed for meaningful test quality
- [ ] Code formatted according to project standards
- [ ] Type checking passes (if applicable)
- [ ] All changes committed with conventional commit messages
- [ ] All commits pushed to remote repository
- [ ] @fix_plan.md task marked as complete
- [ ] Implementation documentation updated
- [ ] Inline code comments updated or added
- [ ] AGENT.md updated (if new patterns introduced)
- [ ] Breaking changes documented
- [ ] Features tested within Ralph loop (if applicable)
- [ ] CI/CD pipeline passes
### Rationale
These standards ensure:
- **Quality**: High test coverage and pass rates prevent regressions
- **Traceability**: Git commits and @fix_plan.md provide clear history of changes
- **Maintainability**: Current documentation reduces onboarding time and prevents knowledge loss
- **Collaboration**: Pushed changes enable team visibility and code review
- **Reliability**: Consistent quality gates maintain production stability
- **Automation**: Ralph integration ensures continuous development practices
**Enforcement**: AI agents should automatically apply these standards to all feature development tasks without requiring explicit instruction for each task.

PROMPT.md Normal file

@@ -0,0 +1,102 @@
# Ralph Development Instructions
## Context
You are Ralph, an autonomous AI development agent working on a **BtToxin Pipeline** project - an automated analysis platform for identifying and evaluating insecticidal toxin genes from Bacillus thuringiensis genomes.
## Current Objectives
1. **Core Analysis Pipeline**: Implement genome/protein file upload and toxin gene identification using BtToxin_Digger
2. **Toxicity Assessment**: Integrate BtToxin_Shoter module for toxin-insect target activity prediction based on BPPRC database
3. **Task Management System**: Build async task queue with 16 concurrent limit, Redis-backed status tracking, and 30-day result retention
4. **Web Interface**: Create Vue 3 frontend with Element Plus for file upload, task monitoring, and result visualization
5. **Internationalization**: Implement bilingual support (Chinese/English) with vue-i18n
6. **Docker Deployment**: Configure Docker Compose with Traefik reverse proxy for production deployment
## Key Principles
- ONE task per loop - focus on the most important thing
- Search the codebase before assuming something isn't implemented
- Use subagents for expensive operations (file searching, analysis)
- Write comprehensive tests with clear documentation
- Update @fix_plan.md with your learnings
- Commit working changes with descriptive messages
## Testing Guidelines (CRITICAL)
- LIMIT testing to ~20% of your total effort per loop
- PRIORITIZE: Implementation > Documentation > Tests
- Only write tests for NEW functionality you implement
- Do NOT refactor existing tests unless broken
- Focus on CORE functionality first, comprehensive testing later
## Project Requirements
### File Upload Requirements
- Accept genome files (.fna, .fa, .fasta) and protein files (.faa)
- Single file per task - genome and protein cannot be mixed
- Maximum file size: 100MB
- Drag-and-drop upload support with format validation
### Analysis Pipeline Stages
1. **Digger**: Identify Bt toxin genes using BtToxin_Digger + BLAST + Perl
2. **Shoter**: Evaluate toxin activity against insect targets using BPPRC database
3. **Plots**: Generate heatmaps for toxin-target relationships
4. **Bundle**: Package results into .tar.gz download
### Task States
- `pending`: Waiting to enter queue
- `queued`: Waiting for available slot (shows queue position)
- `running`: Currently executing (shows progress % and stage)
- `completed`: Finished successfully
- `failed`: Error occurred (shows error message)
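The states above form a small state machine. A minimal Python sketch of the legal transitions (module and function names are illustrative, not the project's actual code):

```python
from enum import Enum


class TaskStatus(str, Enum):
    PENDING = "pending"
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Legal transitions; anything outside this table indicates a logic error.
TRANSITIONS = {
    TaskStatus.PENDING: {TaskStatus.QUEUED},
    TaskStatus.QUEUED: {TaskStatus.RUNNING},
    TaskStatus.RUNNING: {TaskStatus.COMPLETED, TaskStatus.FAILED},
    TaskStatus.COMPLETED: set(),
    TaskStatus.FAILED: set(),
}


def advance(current: TaskStatus, target: TaskStatus) -> TaskStatus:
    """Move a task to `target`, rejecting illegal transitions."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Encoding the transitions explicitly keeps `completed` and `failed` terminal, which matters for the 30-day cleanup job.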
### API Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/tasks` | Create new analysis task |
| GET | `/api/tasks/{task_id}` | Get task status and progress |
| GET | `/api/tasks/{task_id}/download` | Download result bundle |
| DELETE | `/api/tasks/{task_id}` | Delete task and results |
## Technical Constraints
### Frontend Stack
- Vue 3 (Composition API + script setup)
- Vite build tool
- Element Plus UI components
- Pinia state management
- Vue Router 4
- vue-i18n for i18n
- fetch API for HTTP requests
### Backend Stack
- FastAPI + Uvicorn
- asyncio + Semaphore for 16 concurrent task limit
- Redis for task status and queue management
- pixi for environment management (conda alternative)
- digger env: BtToxin_Digger + BLAST + Perl
- pipeline env: Python 3.9+ (pandas, matplotlib, seaborn)
### Database Files
- BPPRC Specificity Database: `toxicity-data.csv`
- BtToxin database: `external_dbs/bt_toxin`
### Scoring Parameters (configurable)
- `min_identity`: Minimum similarity (0-1, default: 0.8)
- `min_coverage`: Minimum coverage (0-1, default: 0.6)
- `allow_unknown_families`: Allow unknown families (default: false)
- `require_index_hit`: Require index hit (default: true)
### Reserved / Future Features
- CRISPR-Cas analysis module (prepare `crispr_cas/` directory)
- Direct protein sequence analysis (sequence_type=prot)
## Success Criteria
1. [ ] Users can upload genome (.fna/.fa/.fasta) or protein (.faa) files for analysis
2. [ ] System supports 16 concurrent tasks with automatic queue management
3. [ ] Chinese/English language switching works correctly
4. [ ] Toxin-target activity assessment displays in heatmap format
5. [ ] Results available for download as .tar.gz within 30 days
6. [ ] Docker deployment successful with Traefik reverse proxy at bttiaw.hzau.edu.cn
## Current Task
Follow @fix_plan.md and choose the most important item to implement next.

specs/requirements.md Normal file

@@ -0,0 +1,287 @@
# Technical Specifications
## BtToxin Pipeline - Technical Requirements
### 1. System Architecture
#### 1.1 Overview
BtToxin Pipeline is a web-based genomic analysis platform consisting of:
- **Frontend**: Vue 3 SPA with Element Plus components
- **Backend**: FastAPI REST API with async task processing
- **Task Queue**: Redis-backed queue with semaphore-based concurrency control
- **Analysis Engine**: BtToxin_Digger and BtToxin_Shoter modules
#### 1.2 Component Architecture
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Vue 3 SPA     │────▶│   FastAPI API   │────▶│   Task Queue    │
│   (Frontend)    │     │    (Backend)    │     │     (Redis)     │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                │                       │
                                ▼                       ▼
                        ┌────────────────┐      ┌─────────────────┐
                        │   Pixi/Conda   │      │  Task Workers   │
                        │  Environments  │      │ (16 concurrent) │
                        └────────────────┘      └─────────────────┘
                                                        │
                                                        ▼
                                                ┌─────────────────┐
                                                │  BtToxin Tools  │
                                                │ (Digger/Shoter) │
                                                └─────────────────┘
```
### 2. Frontend Specifications
#### 2.1 Technology Stack
| Component | Version/Requirement |
|-----------|---------------------|
| Vue 3 | Composition API + script setup |
| Vite | Latest stable |
| Element Plus | Latest compatible |
| Pinia | Latest stable |
| Vue Router | v4 |
| vue-i18n | v9+ |
| HTTP Client | fetch API (no axios) |
#### 2.2 Page Structure
| Page | Route | Description |
|------|-------|-------------|
| Home | `/` | System introduction, quick start |
| About | `/about` | Features, usage, limitations |
| Submit | `/submit` | File upload, parameters, submit |
| Status | `/status` | Task progress, results |
| Tools | `/tools` | BtToxin_Shoter methodology |
#### 2.3 File Upload Component Requirements
- Drag and drop zone
- File type auto-detection
- Size limit: 100MB
- Pre-upload format validation
- Progress indicator during upload
#### 2.4 Internationalization (i18n)
- Languages: Chinese (zh), English (en)
- Language switcher in header/nav
- Persist selection via localStorage
- Refresh page on language change
### 3. Backend Specifications
#### 3.1 Technology Stack
| Component | Version/Requirement |
|-----------|---------------------|
| FastAPI | Latest stable |
| Uvicorn | Latest stable |
| Python | 3.9+ |
| Redis | Latest stable |
| pixi | Latest (conda alternative) |
#### 3.2 API Specifications
##### 3.2.1 Create Task
```
POST /api/tasks
Content-Type: multipart/form-data
Request Parameters:
| Name | Type | Required | Default | Description |
|-------------------------|---------|----------|---------|-------------|
| file | File | Yes | - | Uploaded file |
| file_type | string | Yes | - | genome/protein |
| min_identity | float | No | 0.8 | Min similarity (0-1) |
| min_coverage | float | No | 0.6 | Min coverage (0-1) |
| allow_unknown_families | boolean | No | false | Allow unknown families |
| require_index_hit | boolean | No | true | Require index hit |
| lang | string | No | zh | Report language (zh/en) |
Response:
{
"task_id": "uuid-string",
"status": "pending",
"created_at": "ISO-timestamp",
"expires_at": "ISO-timestamp"
}
```
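A sketch of how the backend might shape this response, deriving `expires_at` from the 30-day retention rule (the helper name and the UTC assumption are illustrative):

```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION_DAYS = 30  # per the 30-day result-retention requirement


def build_create_response(now: Optional[datetime] = None) -> dict:
    """Shape the POST /api/tasks response body described above."""
    now = now or datetime.now(timezone.utc)
    return {
        "task_id": str(uuid.uuid4()),
        "status": "pending",
        "created_at": now.isoformat(),
        "expires_at": (now + timedelta(days=RETENTION_DAYS)).isoformat(),
    }
```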
##### 3.2.2 Get Task Status
```
GET /api/tasks/{task_id}
Response:
{
"task_id": "uuid-string",
"status": "queued|running|completed|failed",
"progress": 0-100,
"current_stage": "digger|shoter|plots|bundle",
"submission_time": "ISO-timestamp",
"start_time": "ISO-timestamp|null",
"completion_time": "ISO-timestamp|null",
"filename": "original-filename",
"error": "error-message|null",
"estimated_remaining_seconds": number|null,
"queue_position": number|null
}
```
##### 3.2.3 Download Results
```
GET /api/tasks/{task_id}/download
Response: .tar.gz file (Content-Disposition: attachment)
```
##### 3.2.4 Delete Task
```
DELETE /api/tasks/{task_id}
Response: 204 No Content
```
#### 3.3 Task Queue Specifications
##### Concurrency Control
- Maximum concurrent tasks: 16
- Implementation: asyncio.Semaphore(16)
- Queue overflow: Tasks wait in Redis queue
- Queue position: Track and display position for queued tasks
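The semaphore pattern can be sketched as follows; the real workers would run the pipeline stages where the placeholder sleep is (all names here are illustrative):

```python
import asyncio

MAX_CONCURRENT = 16  # the spec's concurrency limit


async def drain(task_ids, limit=MAX_CONCURRENT):
    """Run every task with at most `limit` in flight; returns peak concurrency."""
    sem = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def worker(task_id):
        nonlocal running, peak
        async with sem:                 # blocks while `limit` tasks hold slots
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.001)  # placeholder for digger/shoter/plots/bundle
            running -= 1

    await asyncio.gather(*(worker(t) for t in task_ids))
    return peak
```

Because the semaphore is held for the whole `async with` body, excess submissions simply wait; queue position would be read from the Redis list rather than from the semaphore itself.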
##### Task Lifecycle
```
pending → queued → running ──→ completed
                           └─→ failed
```
##### Task Status Values
| Status | Description |
|--------|-------------|
| pending | Created, waiting to enter queue |
| queued | Waiting for available slot (has queue_position) |
| running | Currently processing (has progress, current_stage) |
| completed | Successfully finished (has download URL) |
| failed | Error occurred (has error message) |
##### Pipeline Stages
| Stage | Description |
|-------|-------------|
| digger | BtToxin_Digger gene identification |
| shoter | BtToxin_Shoter toxicity assessment |
| plots | Heatmap generation |
| bundle | Result packaging (.tar.gz) |
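The final `bundle` stage can be implemented with the standard library's `tarfile`; a minimal sketch (the function name and layout are assumptions, not the project's actual code):

```python
import tarfile
from pathlib import Path


def bundle_results(result_dir: Path, out_path: Path) -> Path:
    """Package a task's result directory as a .tar.gz (the `bundle` stage)."""
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the task directory
        tar.add(result_dir, arcname=result_dir.name)
    return out_path
```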
#### 3.4 Redis Data Structures
| Key Pattern | Type | Description |
|-------------|------|-------------|
| `task:{task_id}:status` | Hash | Task status and metadata |
| `task:{task_id}:result` | String | Result bundle path |
| `queue:waiting` | List | Waiting task IDs |
| `queue:running` | Set | Currently running task IDs |
| `queue:position:{task_id}` | String | Individual queue position |
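Centralizing the key patterns above in small helpers avoids typo-prone string formatting at call sites; a sketch (helper names are illustrative):

```python
# Key builders matching the Redis schema table above.
WAITING_QUEUE = "queue:waiting"   # Redis list of waiting task IDs
RUNNING_SET = "queue:running"     # Redis set of running task IDs


def status_key(task_id: str) -> str:
    return f"task:{task_id}:status"


def result_key(task_id: str) -> str:
    return f"task:{task_id}:result"


def position_key(task_id: str) -> str:
    return f"queue:position:{task_id}"
```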
### 4. File Format Support
| Extension | File Type | MIME Type | sequence_type |
|-----------|-----------|-----------|---------------|
| .fna | Genome (nucleotide) | application/fasta | nucl |
| .fa | Genome (nucleotide) | application/fasta | nucl |
| .fasta | Auto-detect | application/fasta | auto |
| .faa | Protein | application/fasta | prot |
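The extension-to-`sequence_type` mapping above translates directly into a small validator that the upload handler could use (names are illustrative):

```python
from pathlib import Path

# Extension -> sequence_type, per the file-format table
SEQUENCE_TYPES = {
    ".fna": "nucl",
    ".fa": "nucl",
    ".fasta": "auto",
    ".faa": "prot",
}


def classify_upload(filename: str) -> str:
    """Return the sequence_type for a filename, or raise on unsupported formats."""
    ext = Path(filename).suffix.lower()
    try:
        return SEQUENCE_TYPES[ext]
    except KeyError:
        raise ValueError(f"unsupported file format: {ext or filename!r}") from None
```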
### 5. Database Specifications
#### 5.1 BPPRC Specificity Database
- File: `toxicity-data.csv`
- Contains: Historical toxin-insect activity records
- Used by: BtToxin_Shoter for activity prediction
#### 5.2 BtToxin Database
- Directory: `external_dbs/bt_toxin`
- Contains: Known Bt toxin sequences
- Used by: BtToxin_Digger for gene identification
### 6. Analysis Pipeline Specifications
#### 6.1 BtToxin_Digger
- Environment: digger (pixi)
- Dependencies: BtToxin_Digger, BLAST, Perl
- Input: Genome (.fna/.fa/.fasta) or protein (.faa) file
- Output: Identified toxin genes with coordinates
#### 6.2 BtToxin_Shoter
- Environment: pipeline (pixi)
- Dependencies: Python 3.9+, pandas, matplotlib, seaborn
- Input: Digger output, scoring parameters
- Output: Toxin-target activity predictions
#### 6.3 Scoring Parameters
| Parameter | Type | Range | Default | Description |
|-----------|------|-------|---------|-------------|
| min_identity | float | 0-1 | 0.8 | Minimum sequence identity |
| min_coverage | float | 0-1 | 0.6 | Minimum coverage |
| allow_unknown_families | boolean | - | false | Allow unknown toxin families |
| require_index_hit | boolean | - | true | Require database index hit |
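The parameter table maps naturally onto a dataclass that enforces the 0–1 ranges at construction time; a sketch assuming these defaults (class name is illustrative):

```python
from dataclasses import dataclass


@dataclass
class ScoringParams:
    min_identity: float = 0.8
    min_coverage: float = 0.6
    allow_unknown_families: bool = False
    require_index_hit: bool = True

    def __post_init__(self):
        # Enforce the documented 0-1 range for the float parameters.
        for name in ("min_identity", "min_coverage"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
```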
### 7. Reserved Features
#### 7.1 CRISPR-Cas Analysis Module
- Directory: `crispr_cas/`
- Environment: Additional pixi environment
- Integration: Weighted scoring with Shoter
- Modes: Additive or subtractive weight adjustment
#### 7.2 Direct Protein Analysis
- Digger mode: sequence_type=prot
- Shoter: Process protein sequence hits normally
### 8. Performance Requirements
| Metric | Requirement |
|--------|-------------|
| Task timeout | 6 hours |
| API response time | < 1 second (excluding task execution) |
| Max concurrent tasks | 16 |
| Max file size | 100MB |
| Result retention | 30 days |
### 9. Security Requirements
- **Task isolation**: Each task has independent working directory
- **Input validation**: File format and size validation
- **Result protection**: 30-day automatic cleanup
- **File permissions**: Restricted access to task directories
### 10. Deployment Specifications
#### 10.1 Docker Configuration
- Docker Compose for orchestration
- Services: frontend, backend, redis, traefik
- Volume mounts for data persistence
#### 10.2 Traefik Configuration
- Domain: bttiaw.hzau.edu.cn
- HTTP/HTTPS support
- Automatic certificate management (Let's Encrypt)
- Router rules for each service
### 11. Environment Variables
| Variable | Description | Required |
|----------|-------------|----------|
| REDIS_HOST | Redis server hostname | Yes |
| REDIS_PORT | Redis server port | Yes |
| PIXI_ENV_PATH | Path to pixi environments | Yes |
| API_BASE_URL | Backend API base URL | Yes |
| MAX_CONCURRENT_TASKS | Maximum concurrent tasks | No (default: 16) |
| TASK_TIMEOUT_HOURS | Task timeout in hours | No (default: 6) |
| RESULT_RETENTION_DAYS | Result retention days | No (default: 30) |
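A config loader matching the table above, failing fast on missing required variables and applying the documented defaults (the function name and key casing are assumptions):

```python
import os


def load_config(env=os.environ) -> dict:
    """Read the variables from the table above, applying documented defaults."""
    required = ("REDIS_HOST", "REDIS_PORT", "PIXI_ENV_PATH", "API_BASE_URL")
    missing = [name for name in required if name not in env]
    if missing:
        raise RuntimeError(f"missing required environment variables: {missing}")
    return {
        "redis_host": env["REDIS_HOST"],
        "redis_port": int(env["REDIS_PORT"]),
        "pixi_env_path": env["PIXI_ENV_PATH"],
        "api_base_url": env["API_BASE_URL"],
        "max_concurrent_tasks": int(env.get("MAX_CONCURRENT_TASKS", "16")),
        "task_timeout_hours": int(env.get("TASK_TIMEOUT_HOURS", "6")),
        "result_retention_days": int(env.get("RESULT_RETENTION_DAYS", "30")),
    }
```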
### 12. Success Criteria Validation
| Criterion | Validation Method |
|-----------|-------------------|
| Genome/protein upload | Test with .fna and .faa files |
| 16 concurrent tasks | Load test with 20 simultaneous requests |
| Language switching | Verify zh/en toggle works on all pages |
| Heatmap visualization | Compare output with expected results |
| Docker deployment | Access via bttiaw.hzau.edu.cn |