From 75c7db8684c5ce04847e1a89996011271f3d4b51 Mon Sep 17 00:00:00 2001
From: zly <644706215@qq.com>
Date: Tue, 13 Jan 2026 17:26:23 +0800
Subject: [PATCH] docs: add Ralph project structure

- PROMPT.md: Ralph development instructions with BtToxin Pipeline specifics
- specs/requirements.md: Technical specifications (API, file formats, concurrency)
- @AGENT.md: Build, test, and deployment commands

Co-Authored-By: Claude
---
 @AGENT.md             | 199 +++++++++++++++++++++++++++++
 PROMPT.md             | 102 +++++++++++++++
 specs/requirements.md | 287 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 588 insertions(+)
 create mode 100644 @AGENT.md
 create mode 100644 PROMPT.md
 create mode 100644 specs/requirements.md

diff --git a/@AGENT.md b/@AGENT.md
new file mode 100644
index 0000000..51edcbf
--- /dev/null
+++ b/@AGENT.md
@@ -0,0 +1,199 @@
+# Agent Build Instructions
+
+## Project Setup
+
+### Frontend (Vue 3 + Vite)
+```bash
+cd frontend
+pnpm install
+```
+
+### Backend (FastAPI)
+```bash
+# Using pixi (recommended)
+cd web/zly
+pixi run -e webbackend api-dev
+
+# Or using python directly
+cd web/backend
+pip install -r requirements.txt
+uvicorn main:app --reload --host 0.0.0.0 --port 8000
+```
+
+### Pixi Environment Setup
+```bash
+cd web/zly
+pixi install
+pixi run -e digger --help
+pixi run -e pipeline --help
+```
+
+## Running Tests
+
+### Frontend Tests
+```bash
+cd frontend
+pnpm test:unit
+pnpm test:unit --run # Run once without watch
+```
+
+### Backend Tests
+```bash
+cd web/backend
+pytest -v
+pytest --cov=src tests/ # With coverage
+```
+
+## Build Commands
+
+### Frontend Production Build
+```bash
+cd frontend
+pnpm build
+# Output: dist/
+```
+
+### Backend Production
+```bash
+# Build with pixi
+cd web/zly
+pixi run -e webbackend api-prod
+```
+
+## Development Server
+
+### Frontend Dev Server
+```bash
+cd frontend
+pnpm dev --host # Access at http://localhost:5173
+```
+
+### Backend Dev Server
+```bash
+cd web/zly
+pixi run -e webbackend api-dev
+# Access at http://localhost:8000
+```
+
+## Docker Deployment
+```bash
+cd web/zly/docker
+docker-compose up -d --build
+```
+
+## Key Learnings
+- Update this section when you learn new build optimizations
+- Document any gotchas or special setup requirements
+- Keep track of the fastest test/build cycle
+
+## Feature Development Quality Standards
+
+**CRITICAL**: All new features MUST meet the following mandatory requirements before being considered complete.
+
+### Testing Requirements
+
+- **Minimum Coverage**: 85% code coverage ratio required for all new code
+- **Test Pass Rate**: 100% - all tests must pass, no exceptions
+- **Test Types Required**:
+  - Unit tests for all business logic and services
+  - Integration tests for API endpoints or main functionality
+  - End-to-end tests for critical user workflows
+- **Coverage Validation**: Run coverage reports before marking features complete:
+  ```bash
+  # Examples by language/framework
+  npm run test:coverage
+  pytest --cov=src tests/ --cov-report=term-missing
+  cargo tarpaulin --out Html
+  ```
+- **Test Quality**: Tests must validate behavior, not just achieve coverage metrics
+- **Test Documentation**: Complex test scenarios must include comments explaining the test strategy
+
+### Git Workflow Requirements
+
+Before moving to the next feature, ALL changes must be:
+
+1. **Committed with Clear Messages**:
+   ```bash
+   git add .
+   git commit -m "feat(module): descriptive message following conventional commits"
+   ```
+   - Use conventional commit format: `feat:`, `fix:`, `docs:`, `test:`, `refactor:`, etc.
+   - Include scope when applicable: `feat(api):`, `fix(ui):`, `test(auth):`
+   - Write descriptive messages that explain WHAT changed and WHY
+
+2. **Pushed to Remote Repository**:
+   ```bash
+   git push origin
+   ```
+   - Never leave completed features uncommitted
+   - Push regularly to maintain backup and enable collaboration
+   - Ensure CI/CD pipelines pass before considering feature complete
+
+3. **Branch Hygiene**:
+   - Work on feature branches, never directly on `main`
+   - Branch naming convention: `feature/`, `fix/`, `docs/`
+   - Create pull requests for all significant changes
+
+4. **Ralph Integration**:
+   - Update @fix_plan.md with new tasks before starting work
+   - Mark items complete in @fix_plan.md upon completion
+   - Update PROMPT.md if development patterns change
+   - Test features work within Ralph's autonomous loop
+
+### Documentation Requirements
+
+**ALL implementation documentation MUST remain synchronized with the codebase**:
+
+1. **Code Documentation**:
+   - Language-appropriate documentation (JSDoc, docstrings, etc.)
+   - Update inline comments when implementation changes
+   - Remove outdated comments immediately
+
+2. **Implementation Documentation**:
+   - Update relevant sections in this AGENT.md file
+   - Keep build and test commands current
+   - Update configuration examples when defaults change
+   - Document breaking changes prominently
+
+3. **README Updates**:
+   - Keep feature lists current
+   - Update setup instructions when dependencies change
+   - Maintain accurate command examples
+   - Update version compatibility information
+
+4. **AGENT.md Maintenance**:
+   - Add new build patterns to relevant sections
+   - Update "Key Learnings" with new insights
+   - Keep command examples accurate and tested
+   - Document new testing patterns or quality gates
+
+### Feature Completion Checklist
+
+Before marking ANY feature as complete, verify:
+
+- [ ] All tests pass with appropriate framework command
+- [ ] Code coverage meets 85% minimum threshold
+- [ ] Coverage report reviewed for meaningful test quality
+- [ ] Code formatted according to project standards
+- [ ] Type checking passes (if applicable)
+- [ ] All changes committed with conventional commit messages
+- [ ] All commits pushed to remote repository
+- [ ] @fix_plan.md task marked as complete
+- [ ] Implementation documentation updated
+- [ ] Inline code comments updated or added
+- [ ] AGENT.md updated (if new patterns introduced)
+- [ ] Breaking changes documented
+- [ ] Features tested within Ralph loop (if applicable)
+- [ ] CI/CD pipeline passes
+
+### Rationale
+
+These standards ensure:
+- **Quality**: High test coverage and pass rates prevent regressions
+- **Traceability**: Git commits and @fix_plan.md provide clear history of changes
+- **Maintainability**: Current documentation reduces onboarding time and prevents knowledge loss
+- **Collaboration**: Pushed changes enable team visibility and code review
+- **Reliability**: Consistent quality gates maintain production stability
+- **Automation**: Ralph integration ensures continuous development practices
+
+**Enforcement**: AI agents should automatically apply these standards to all feature development tasks without requiring explicit instruction for each task.
diff --git a/PROMPT.md b/PROMPT.md
new file mode 100644
index 0000000..4558baa
--- /dev/null
+++ b/PROMPT.md
@@ -0,0 +1,102 @@
+# Ralph Development Instructions
+
+## Context
+You are Ralph, an autonomous AI development agent working on a **BtToxin Pipeline** project - an automated analysis platform for identifying and evaluating insecticidal toxin genes from Bacillus thuringiensis genomes.
+
+## Current Objectives
+
+1. **Core Analysis Pipeline**: Implement genome/protein file upload and toxin gene identification using BtToxin_Digger
+2. **Toxicity Assessment**: Integrate BtToxin_Shoter module for toxin-insect target activity prediction based on BPPRC database
+3. **Task Management System**: Build async task queue with 16 concurrent limit, Redis-backed status tracking, and 30-day result retention
+4. **Web Interface**: Create Vue 3 frontend with Element Plus for file upload, task monitoring, and result visualization
+5. **Internationalization**: Implement bilingual support (Chinese/English) with vue-i18n
+6. **Docker Deployment**: Configure Docker Compose with Traefik reverse proxy for production deployment
+
+## Key Principles
+- ONE task per loop - focus on the most important thing
+- Search the codebase before assuming something isn't implemented
+- Use subagents for expensive operations (file searching, analysis)
+- Write comprehensive tests with clear documentation
+- Update @fix_plan.md with your learnings
+- Commit working changes with descriptive messages
+
+## Testing Guidelines (CRITICAL)
+- LIMIT testing to ~20% of your total effort per loop
+- PRIORITIZE: Implementation > Documentation > Tests
+- Only write tests for NEW functionality you implement
+- Do NOT refactor existing tests unless broken
+- Focus on CORE functionality first, comprehensive testing later
+
+## Project Requirements
+
+### File Upload Requirements
+- Accept genome files (.fna, .fa, .fasta) and protein files (.faa)
+- Single file per task - genome and protein cannot be mixed
+- Maximum file size: 100MB
+- Drag-and-drop upload support with format validation
+
+### Analysis Pipeline Stages
+1. **Digger**: Identify Bt toxin genes using BtToxin_Digger + BLAST + Perl
+2. **Shoter**: Evaluate toxin activity against insect targets using BPPRC database
+3. **Plots**: Generate heatmaps for toxin-target relationships
+4. **Bundle**: Package results into .tar.gz download
+
+### Task States
+- `pending`: Waiting to enter queue
+- `queued`: Waiting for available slot (shows queue position)
+- `running`: Currently executing (shows progress % and stage)
+- `completed`: Finished successfully
+- `failed`: Error occurred (shows error message)
+
+### API Endpoints
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| POST | `/api/tasks` | Create new analysis task |
+| GET | `/api/tasks/{task_id}` | Get task status and progress |
+| GET | `/api/tasks/{task_id}/download` | Download result bundle |
+| DELETE | `/api/tasks/{task_id}` | Delete task and results |
+
+## Technical Constraints
+
+### Frontend Stack
+- Vue 3 (Composition API + script setup)
+- Vite build tool
+- Element Plus UI components
+- Pinia state management
+- Vue Router 4
+- vue-i18n for i18n
+- fetch API for HTTP requests
+
+### Backend Stack
+- FastAPI + Uvicorn
+- asyncio + Semaphore for 16 concurrent task limit
+- Redis for task status and queue management
+- pixi for environment management (conda alternative)
+  - digger env: BtToxin_Digger + BLAST + Perl
+  - pipeline env: Python 3.9+ (pandas, matplotlib, seaborn)
+
+### Database Files
+- BPPRC Specificity Database: `toxicity-data.csv`
+- BtToxin database: `external_dbs/bt_toxin`
+
+### Scoring Parameters (configurable)
+- `min_identity`: Minimum similarity (0-1, default: 0.8)
+- `min_coverage`: Minimum coverage (0-1, default: 0.6)
+- `allow_unknown_families`: Allow unknown families (default: false)
+- `require_index_hit`: Require index hit (default: true)
+
+### Reserved / Future Features
+- CRISPR-Cas analysis module (prepare `crispr_cas/` directory)
+- Direct protein sequence analysis (sequence_type=prot)
+
+## Success Criteria
+
+1. [ ] Users can upload genome (.fna/.fa/.fasta) or protein (.faa) files for analysis
+2. [ ] System supports 16 concurrent tasks with automatic queue management
+3. [ ] Chinese/English language switching works correctly
+4. [ ] Toxin-target activity assessment displays in heatmap format
+5. [ ] Results remain downloadable as .tar.gz for 30 days
+6. [ ] Docker deployment successful with Traefik reverse proxy at bttiaw.hzau.edu.cn
+
+## Current Task
+Follow @fix_plan.md and choose the most important item to implement next.
diff --git a/specs/requirements.md b/specs/requirements.md
new file mode 100644
index 0000000..4fa75b0
--- /dev/null
+++ b/specs/requirements.md
@@ -0,0 +1,287 @@
+# Technical Specifications
+
+## BtToxin Pipeline - Technical Requirements
+
+### 1. System Architecture
+
+#### 1.1 Overview
+BtToxin Pipeline is a web-based genomic analysis platform consisting of:
+- **Frontend**: Vue 3 SPA with Element Plus components
+- **Backend**: FastAPI REST API with async task processing
+- **Task Queue**: Redis-backed queue with semaphore-based concurrency control
+- **Analysis Engine**: BtToxin_Digger and BtToxin_Shoter modules
+
+#### 1.2 Component Architecture
+```
+┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
+│   Vue 3 SPA     │────▶│   FastAPI API   │────▶│   Task Queue    │
+│   (Frontend)    │     │   (Backend)     │     │   (Redis)       │
+└─────────────────┘     └─────────────────┘     └─────────────────┘
+                                │                        │
+                                │                        ▼
+                        ┌───────┴────────┐      ┌─────────────────┐
+                        │  Pixi/Conda    │      │  Task Workers   │
+                        │  Environments  │      │  (16 concurrent)│
+                        └────────────────┘      └─────────────────┘
+                                                         │
+                                                         ▼
+                                                ┌─────────────────┐
+                                                │  BtToxin Tools  │
+                                                │  (Digger/Shoter)│
+                                                └─────────────────┘
+```
+
+### 2. Frontend Specifications
+
+#### 2.1 Technology Stack
+| Component | Version/Requirement |
+|-----------|---------------------|
+| Vue 3 | Composition API + script setup |
+| Vite | Latest stable |
+| Element Plus | Latest compatible |
+| Pinia | Latest stable |
+| Vue Router | v4 |
+| vue-i18n | v9+ |
+| HTTP Client | fetch API (no axios) |
+
+#### 2.2 Page Structure
+| Page | Route | Description |
+|------|-------|-------------|
+| Home | `/` | System introduction, quick start |
+| About | `/about` | Features, usage, limitations |
+| Submit | `/submit` | File upload, parameters, submit |
+| Status | `/status` | Task progress, results |
+| Tools | `/tools` | BtToxin_Shoter methodology |
+
+#### 2.3 File Upload Component Requirements
+- Drag and drop zone
+- File type auto-detection
+- Size limit: 100MB
+- Pre-upload format validation
+- Progress indicator during upload
+
+#### 2.4 Internationalization (i18n)
+- Languages: Chinese (zh), English (en)
+- Language switcher in header/nav
+- Persist selection via localStorage
+- Refresh page on language change
+
+### 3. Backend Specifications
+
+#### 3.1 Technology Stack
+| Component | Version/Requirement |
+|-----------|---------------------|
+| FastAPI | Latest stable |
+| Uvicorn | Latest stable |
+| Python | 3.9+ |
+| Redis | Latest stable |
+| pixi | Latest (conda alternative) |
+
+#### 3.2 API Specifications
+
+##### 3.2.1 Create Task
+```
+POST /api/tasks
+Content-Type: multipart/form-data
+
+Request Parameters:
+| Name                    | Type    | Required | Default | Description |
+|-------------------------|---------|----------|---------|-------------|
+| file                    | File    | Yes      | -       | Uploaded file |
+| file_type               | string  | Yes      | -       | genome/protein |
+| min_identity            | float   | No       | 0.8     | Min similarity (0-1) |
+| min_coverage            | float   | No       | 0.6     | Min coverage (0-1) |
+| allow_unknown_families  | boolean | No       | false   | Allow unknown families |
+| require_index_hit       | boolean | No       | true    | Require index hit |
+| lang                    | string  | No       | zh      | Report language (zh/en) |
+
+Response:
+{
+  "task_id": "uuid-string",
+  "status": "pending",
+  "created_at": "ISO-timestamp",
+  "expires_at": "ISO-timestamp"
+}
+```
+
+##### 3.2.2 Get Task Status
+```
+GET /api/tasks/{task_id}
+
+Response:
+{
+  "task_id": "uuid-string",
+  "status": "pending|queued|running|completed|failed",
+  "progress": 0-100,
+  "current_stage": "digger|shoter|plots|bundle",
+  "submission_time": "ISO-timestamp",
+  "start_time": "ISO-timestamp|null",
+  "completion_time": "ISO-timestamp|null",
+  "filename": "original-filename",
+  "error": "error-message|null",
+  "estimated_remaining_seconds": number|null,
+  "queue_position": number|null
+}
+```
+
+##### 3.2.3 Download Results
+```
+GET /api/tasks/{task_id}/download
+
+Response: .tar.gz file (Content-Disposition: attachment)
+```
+
+##### 3.2.4 Delete Task
+```
+DELETE /api/tasks/{task_id}
+
+Response: 204 No Content
+```
+
+#### 3.3 Task Queue Specifications
+
+##### Concurrency Control
+- Maximum concurrent tasks: 16
+- Implementation: asyncio.Semaphore(16)
+- Queue overflow: Tasks wait in Redis queue
+- Queue position: Track and display position for queued tasks
+
+##### Task Lifecycle
+```
+pending → queued → running → completed
+                           → failed
+```
+
+##### Task Status Values
+| Status | Description |
+|--------|-------------|
+| pending | Created, waiting to enter queue |
+| queued | Waiting for available slot (has queue_position) |
+| running | Currently processing (has progress, current_stage) |
+| completed | Successfully finished (has download URL) |
+| failed | Error occurred (has error message) |
+
+##### Pipeline Stages
+| Stage | Description |
+|-------|-------------|
+| digger | BtToxin_Digger gene identification |
+| shoter | BtToxin_Shoter toxicity assessment |
+| plots | Heatmap generation |
+| bundle | Result packaging (.tar.gz) |
+
+#### 3.4 Redis Data Structures
+
+| Key Pattern | Type | Description |
+|-------------|------|-------------|
+| `task:{task_id}:status` | Hash | Task status and metadata |
+| `task:{task_id}:result` | String | Result bundle path |
+| `queue:waiting` | List | Waiting task IDs |
+| `queue:running` | Set | Currently running task IDs |
+| `queue:position:{task_id}` | String | Individual queue position |
+
+### 4. File Format Support
+
+| Extension | File Type | MIME Type | sequence_type |
+|-----------|-----------|-----------|---------------|
+| .fna | Genome (nucleotide) | application/fasta | nucl |
+| .fa | Genome (nucleotide) | application/fasta | nucl |
+| .fasta | Auto-detect | application/fasta | auto |
+| .faa | Protein | application/fasta | prot |
+
+### 5. Database Specifications
+
+#### 5.1 BPPRC Specificity Database
+- File: `toxicity-data.csv`
+- Contains: Historical toxin-insect activity records
+- Used by: BtToxin_Shoter for activity prediction
+
+#### 5.2 BtToxin Database
+- Directory: `external_dbs/bt_toxin`
+- Contains: Known Bt toxin sequences
+- Used by: BtToxin_Digger for gene identification
+
+### 6. Analysis Pipeline Specifications
+
+#### 6.1 BtToxin_Digger
+- Environment: digger (pixi)
+- Dependencies: BtToxin_Digger, BLAST, Perl
+- Input: Genome (.fna/.fa/.fasta) or protein (.faa) file
+- Output: Identified toxin genes with coordinates
+
+#### 6.2 BtToxin_Shoter
+- Environment: pipeline (pixi)
+- Dependencies: Python 3.9+, pandas, matplotlib, seaborn
+- Input: Digger output, scoring parameters
+- Output: Toxin-target activity predictions
+
+#### 6.3 Scoring Parameters
+| Parameter | Type | Range | Default | Description |
+|-----------|------|-------|---------|-------------|
+| min_identity | float | 0-1 | 0.8 | Minimum sequence identity |
+| min_coverage | float | 0-1 | 0.6 | Minimum coverage |
+| allow_unknown_families | boolean | - | false | Allow unknown toxin families |
+| require_index_hit | boolean | - | true | Require database index hit |
+
+### 7. Reserved Features
+
+#### 7.1 CRISPR-Cas Analysis Module
+- Directory: `crispr_cas/`
+- Environment: Additional pixi environment
+- Integration: Weighted scoring with Shoter
+- Modes: Additive or subtractive weight adjustment
+
+#### 7.2 Direct Protein Analysis
+- Digger mode: sequence_type=prot
+- Shoter: Process protein sequence hits normally
+
+### 8. Performance Requirements
+
+| Metric | Requirement |
+|--------|-------------|
+| Task timeout | 6 hours |
+| API response time | < 1 second (excluding task execution) |
+| Max concurrent tasks | 16 |
+| Max file size | 100MB |
+| Result retention | 30 days |
+
+### 9. Security Requirements
+
+- **Task isolation**: Each task has independent working directory
+- **Input validation**: File format and size validation
+- **Result protection**: 30-day automatic cleanup
+- **File permissions**: Restricted access to task directories
+
+### 10. Deployment Specifications
+
+#### 10.1 Docker Configuration
+- Docker Compose for orchestration
+- Services: frontend, backend, redis, traefik
+- Volume mounts for data persistence
+
+#### 10.2 Traefik Configuration
+- Domain: bttiaw.hzau.edu.cn
+- HTTP/HTTPS support
+- Automatic certificate management (Let's Encrypt)
+- Router rules for each service
+
+### 11. Environment Variables
+
+| Variable | Description | Required |
+|----------|-------------|----------|
+| REDIS_HOST | Redis server hostname | Yes |
+| REDIS_PORT | Redis server port | Yes |
+| PIXI_ENV_PATH | Path to pixi environments | Yes |
+| API_BASE_URL | Backend API base URL | Yes |
+| MAX_CONCURRENT_TASKS | Maximum concurrent tasks | No (default: 16) |
+| TASK_TIMEOUT_HOURS | Task timeout in hours | No (default: 6) |
+| RESULT_RETENTION_DAYS | Result retention days | No (default: 30) |
+
+### 12. Success Criteria Validation
+
+| Criterion | Validation Method |
+|-----------|-------------------|
+| Genome/protein upload | Test with .fna and .faa files |
+| 16 concurrent tasks | Load test with 20 simultaneous requests |
+| Language switching | Verify zh/en toggle works on all pages |
+| Heatmap visualization | Compare output with expected results |
+| Docker deployment | Access via bttiaw.hzau.edu.cn |
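
Reviewer note on the "16 concurrent tasks" criterion: the spec names `asyncio.Semaphore(16)` and a load test with 20 simultaneous requests, but does not show how the limit would be checked. Below is a minimal sketch of one way to do it, not the project's implementation. `load_test` and `run_task` are hypothetical names, the `asyncio.sleep` call stands in for a real pipeline run, and only `MAX_CONCURRENT_TASKS` (with its default of 16) comes from the environment-variable table.

```python
import asyncio
import os

# MAX_CONCURRENT_TASKS is taken from the environment-variable table (default 16).
MAX_CONCURRENT_TASKS = int(os.environ.get("MAX_CONCURRENT_TASKS", "16"))


async def load_test(n_requests: int = 20, limit: int = MAX_CONCURRENT_TASKS) -> dict:
    """Run n_requests dummy tasks and record the peak number running at once."""
    # Created inside the coroutine so the semaphore belongs to the running loop.
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0
    finished = 0

    async def run_task(task_id: int) -> None:
        # Hypothetical stand-in for one pipeline run (digger, shoter, plots, bundle).
        nonlocal running, peak, finished
        async with semaphore:          # waits here while all slots are taken
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # placeholder for the real work
            running -= 1
            finished += 1

    await asyncio.gather(*(run_task(i) for i in range(n_requests)))
    return {"peak": peak, "finished": finished}


if __name__ == "__main__":
    print(asyncio.run(load_test()))
```

With 20 requests and a limit of 16, all 20 tasks should finish while the recorded peak never exceeds the limit. In the actual service the semaphore would presumably be created once at startup and shared by all workers, with Redis holding the waiting queue as section 3.3 describes.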