310 lines
8.5 KiB
Markdown
310 lines
8.5 KiB
Markdown
# CRISPR-Cas Analysis Module - Implementation Plan
|
|
|
|
## Overview
|
|
|
|
This document outlines the planned implementation of CRISPR-Cas system analysis for the BtToxin Pipeline. This feature is **reserved for future development** and provides a roadmap for integrating CRISPR-Cas detection with toxin activity assessment.
|
|
|
|
## Status: RESERVED
|
|
|
|
All infrastructure is prepared but implementation is **not yet started**. This module will be activated when resources and requirements are finalized.
|
|
|
|
---
|
|
|
|
## Architecture
|
|
|
|
### Directory Structure (to be created)
|
|
|
|
```
|
|
crispr_cas/
|
|
├── scripts/
|
|
│ ├── detect_crispr.py # CRISPR array detection
|
|
│ ├── fusion_analysis.py # Spacer-toxin gene analysis
|
|
│ └── crispr_scoring.py # Integration with shoter scoring
|
|
├── docs/
|
|
│ ├── IMPLEMENTATION.md # This file
|
|
│ └── API_REFERENCE.md # Module API documentation (to be created)
|
|
└── tests/
|
|
├── test_detect_crispr.py
|
|
└── test_fusion_analysis.py
|
|
```
|
|
|
|
### Data Flow
|
|
|
|
```
|
|
Genome (.fna) → CRISPRCasFinder → CRISPR Results (JSON)
|
|
↓
|
|
Fusion Analysis Module
|
|
↓
|
|
Toxin Genes (All_Toxins.txt)
|
|
↓
|
|
Enhanced Shoter Scoring
|
|
↓
|
|
CRISPR-Augmented Activity Scores
|
|
```
|
|
|
|
---
|
|
|
|
## Implementation Plan
|
|
|
|
### Phase 1: CRISPR Detection
|
|
|
|
**File:** `crispr_cas/scripts/detect_crispr.py`
|
|
|
|
**Tool:** CRISPRCasFinder (https://crisprcas.i2bc.paris-saclay.fr/
|
|
|
|
**Tasks:**
|
|
1. Integrate CRISPRCasFinder CLI or implement Python wrapper
|
|
2. Parse CRISPRCasFinder output (General Case or Cas spacer)
|
|
3. Extract:
|
|
- Cas type/subtype (I-E, I-F, II-A, V-A, etc.)
|
|
- CRISPR array positions
|
|
- Spacer sequences
|
|
- Repeat sequences
|
|
- Protospacer Adjacent Motif (PAM) sequences
|
|
|
|
**Output Format (JSON):**
|
|
```json
|
|
{
|
|
"strain_id": {
|
|
"cas_type": "I-E",
|
|
"arrays": [
|
|
{
|
|
"position": "contig_1:12345-12678",
|
|
"repeat": "5'-GTTTTAGAGCTATGCTGTTTTGAATGGTCCCAAAAC-3'",
|
|
"spacers": [
|
|
{"sequence": "ATGCGTCGAC", "position": 0},
|
|
{"sequence": "CGTAGCTAGC", "position": 37}
|
|
]
|
|
}
|
|
],
|
|
"summary": {"num_arrays": 3, "num_spacers": 24}
|
|
}
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
### Phase 2: Spacer-Toxin Gene Association
|
|
|
|
**File:** `crispr_cas/scripts/fusion_analysis.py`
|
|
|
|
**Tasks:**
|
|
1. Map CRISPR arrays to genomic positions
|
|
2. Identify toxin genes near CRISPR arrays (within 10kb window)
|
|
3. Analyze potential spacer-target matches:
|
|
- Extract toxin gene sequences
|
|
- Perform BLAST of spacers against toxin genes
|
|
- Identify potential immunity or targeting relationships
|
|
|
|
**Output Format (JSON):**
|
|
```json
|
|
{
|
|
"strain_id": {
|
|
"crispr_toxin_associations": [
|
|
{
|
|
"crispr_array": "contig_1:12345-12678",
|
|
"nearby_toxins": ["Cry1Ac1", "Cry2Aa3"],
|
|
"spacer_targets": [
|
|
{"spacer": "ATGCGTCGAC", "target": "Cry1Ac1", "identity": 0.95}
|
|
],
|
|
"distance_to_toxin": 2500
|
|
}
|
|
]
|
|
}
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
### Phase 3: Integration with Shoter Scoring
|
|
|
|
**File:** Modify `scripts/bttoxin_shoter.py`
|
|
|
|
**Reserved Parameters (add to argument parser):**
|
|
|
|
```python
|
|
# CRISPR-Cas Integration (Reserved for Future Implementation)
|
|
ap.add_argument("--crispr_weight", type=float, default=0.0,
|
|
help="[FUTURE] Weight for CRISPR-Cas contribution to activity scores (0-1)")
|
|
ap.add_argument("--crispr_results", type=Path, default=None,
|
|
help="[FUTURE] Path to CRISPR-Cas detection results JSON")
|
|
ap.add_argument("--crispr_fusion", action="store_true", default=False,
|
|
help="[FUTURE] Enable spacer-toxin fusion analysis")
|
|
```
|
|
|
|
**Scoring Integration (in `score_strain()` function):**
|
|
|
|
```python
|
|
# Reserved: CRISPR-Cas scoring integration
|
|
# When CRISPR is enabled, modify strain scores:
|
|
#
|
|
# if args.crispr_weight > 0 and crispr_data:
|
|
# crispr_boost = calculate_crispr_activity_boost(
|
|
# strain=strain,
|
|
# crispr_data=crispr_data.get(strain, {}),
|
|
# toxin_hits=toxin_hits
|
|
# )
|
|
# # Apply CRISPR boost to target order scores
|
|
# for order, score in sscore.scores.items():
|
|
# sscore.scores[order] = score * (1 - args.crispr_weight) + \
|
|
# crispr_boost.get(order, 0) * args.crispr_weight
|
|
```
|
|
|
|
---
|
|
|
|
### Phase 4: Enhanced Visualization
|
|
|
|
**File:** `scripts/plot_shotter.py`
|
|
|
|
**Tasks:**
|
|
1. Add CRISPR-Cas panel to existing heatmaps
|
|
2. Visualize:
|
|
- CRISPR array positions on genome
|
|
- Spacer-toxin targeting relationships
|
|
- CRISPR-enhanced activity scores
|
|
|
|
**Output Format:**
|
|
- Extended PDF report with CRISPR section
|
|
- Additional JSON with CRISPR metadata
|
|
- Optional: Genomic track visualization (SVG/PNG)
|
|
|
|
---
|
|
|
|
## Pixi Integration
|
|
|
|
The pixi environment is already configured (commented out) in `pixi.toml`:
|
|
|
|
```toml
|
|
# =========================
|
|
# CRISPR-Cas 环境:预留用于未来的 CRISPR-Cas 分析
|
|
# =========================
|
|
# [feature.crispr.dependencies]
|
|
# python = ">=3.9"
|
|
# biopython = "*"
|
|
# pandas = ">=2.0.0"
|
|
# =========================
|
|
# [feature.crispr.tasks]
|
|
# crispr-detect = "python crispr_cas/scripts/detect_crispr.py"
|
|
# crispr-fusion = "python crispr_cas/scripts/fusion_analysis.py"
|
|
```
|
|
|
|
**To activate CRISPR module:**
|
|
1. Uncomment the `[feature.crispr.dependencies]` section
|
|
2. Uncomment the `[feature.crispr.tasks]` section
|
|
3. Add `crispr` to environments list
|
|
4. Run `pixi install`
|
|
|
|
---
|
|
|
|
## Usage Examples (When Implemented)
|
|
|
|
### Basic CRISPR Detection
|
|
```bash
|
|
pixi run -e crispr crispr-detect \
|
|
--input genome.fna \
|
|
--output crispr_results.json
|
|
```
|
|
|
|
### Full Pipeline with CRISPR Integration
|
|
```bash
|
|
# Run CRISPR detection first
|
|
pixi run -e crispr crispr-detect --input genome.fna --output crispr.json
|
|
|
|
# Run pipeline with CRISPR-enhanced scoring
|
|
pixi run pipeline \
|
|
--input genome.fna \
|
|
--toxicity_csv Data/toxicity-data.csv \
|
|
--crispr_results crispr.json \
|
|
--crispr_weight 0.2 \
|
|
--crispr_fusion
|
|
```
|
|
|
|
### API Integration (Future)
|
|
```python
|
|
# Backend API endpoint (to be implemented)
|
|
POST /api/v1/tasks
|
|
{
|
|
"files": ["genome.fna"],
|
|
"crispr_enabled": true,
|
|
"crispr_weight": 0.2,
|
|
"crispr_fusion": true
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Scientific Background
|
|
|
|
### Why CRISPR-Cas in Bt Analysis?
|
|
|
|
1. **Self-Immunity**: CRISPR-Cas systems in Bt may provide immunity against phages, affecting strain fitness
|
|
2. **Plasmid Tracking**: CRISPR spacers can indicate plasmid content and horizontal gene transfer history
|
|
3. **Strain Differentiation**: CRISPR array patterns can distinguish closely related strains
|
|
4. **Toxin Gene Proximity**: CRISPR arrays near toxin genes may indicate genomic defense mechanisms
|
|
|
|
### Expected Benefits
|
|
|
|
- Enhanced strain characterization beyond toxin profiling
|
|
- Better understanding of strain evolution and adaptation
|
|
- Potential correlation with biocontrol efficacy
|
|
- Additional markers for strain selection
|
|
|
|
---
|
|
|
|
## Testing Strategy
|
|
|
|
### Unit Tests
|
|
- CRISPR detection mock data parsing
|
|
- Spacer-toxin distance calculation
|
|
- CRISPR score calculation logic
|
|
|
|
### Integration Tests
|
|
- End-to-end pipeline with small genome
|
|
- Comparison with manual CRISPRCasFinder results
|
|
- Scoring consistency with/without CRISPR
|
|
|
|
### Validation
|
|
- Compare CRISPR-enhanced scores with experimental bioassay data
|
|
- Validate CRISPR-toxin associations using known literature
|
|
|
|
---
|
|
|
|
## Dependencies
|
|
|
|
### External Tools
|
|
- **CRISPRCasFinder** (v4.2+): https://crisprcas.i2bc.paris-saclay.fr/
|
|
- **BLAST+** (for spacer-toxin alignment)
|
|
|
|
### Python Packages
|
|
- biopython >= 1.79
|
|
- pandas >= 2.0.0
|
|
- numpy >= 1.21.0
|
|
|
|
---
|
|
|
|
## Timeline Estimate
|
|
|
|
- **Phase 1**: 2-3 weeks (CRISPR detection wrapper)
|
|
- **Phase 2**: 2-3 weeks (Fusion analysis)
|
|
- **Phase 3**: 1-2 weeks (Shoter integration)
|
|
- **Phase 4**: 2-3 weeks (Visualization)
|
|
|
|
**Total**: ~2-3 months for full implementation
|
|
|
|
---
|
|
|
|
## References
|
|
|
|
1. Couvin, D. et al. (2018) CRISPRCasFinder, an update of CRISPRFinder, includes a portable version, a web server and many tools to study CRISPRs. *Bioinformatics*, 34(20), 3579-3581.
|
|
2. Chakraborty, S. et al. (2020) CRISPR-Cas systems in Bacillus thuringiensis: diversity, evolution and potential applications. *Frontiers in Microbiology*, 11, 591.
|
|
3. BtToxin Pipeline Documentation: `docs/shotter_workflow.md`
|
|
|
|
---
|
|
|
|
## Contact
|
|
|
|
For questions or implementation guidance, refer to the main project documentation or create an issue in the project repository.
|
|
|
|
**Last Updated:** 2025-01-13
|
|
**Status:** Reserved - Implementation Pending
|