CRISPR-Cas Analysis Module - Implementation Plan
Overview
This document outlines the planned implementation of CRISPR-Cas system analysis for the BtToxin Pipeline. The feature is reserved for future development; this plan serves as a roadmap for integrating CRISPR-Cas detection with toxin activity assessment.
Status: RESERVED
The supporting infrastructure is in place, but implementation has not yet started. This module will be activated once resources and requirements are finalized.
Architecture
Directory Structure (to be created)
crispr_cas/
├── scripts/
│   ├── detect_crispr.py      # CRISPR array detection
│   ├── fusion_analysis.py    # Spacer-toxin gene analysis
│   └── crispr_scoring.py     # Integration with shoter scoring
├── docs/
│   ├── IMPLEMENTATION.md     # This file
│   └── API_REFERENCE.md      # Module API documentation (to be created)
└── tests/
    ├── test_detect_crispr.py
    └── test_fusion_analysis.py
Data Flow
Genome (.fna) → CRISPRCasFinder → CRISPR Results (JSON)
↓
Fusion Analysis Module
↓
Toxin Genes (All_Toxins.txt)
↓
Enhanced Shoter Scoring
↓
CRISPR-Augmented Activity Scores
Implementation Plan
Phase 1: CRISPR Detection
File: crispr_cas/scripts/detect_crispr.py
Tool: CRISPRCasFinder (https://crisprcas.i2bc.paris-saclay.fr/)
Tasks:
- Integrate CRISPRCasFinder CLI or implement Python wrapper
- Parse CRISPRCasFinder output (General Case or Cas spacer)
- Extract:
- Cas type/subtype (I-E, I-F, II-A, V-A, etc.)
- CRISPR array positions
- Spacer sequences
- Repeat sequences
- Protospacer Adjacent Motif (PAM) sequences
Output Format (JSON):
{
  "strain_id": {
    "cas_type": "I-E",
    "arrays": [
      {
        "position": "contig_1:12345-12678",
        "repeat": "5'-GTTTTAGAGCTATGCTGTTTTGAATGGTCCCAAAAC-3'",
        "spacers": [
          {"sequence": "ATGCGTCGAC", "position": 0},
          {"sequence": "CGTAGCTAGC", "position": 37}
        ]
      }
    ],
    "summary": {"num_arrays": 3, "num_spacers": 24}
  }
}
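As a minimal sketch of how detect_crispr.py might assemble a record in this format, the snippet below builds the per-strain dictionary from already-parsed arrays. The names CrisprArray and build_strain_record are hypothetical, and parsing of the actual CRISPRCasFinder output is deliberately left out because its layout is not specified in this plan.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CrisprArray:
    """One CRISPR array; field names mirror the planned JSON format."""
    position: str                                  # e.g. "contig_1:12345-12678"
    repeat: str                                    # consensus repeat sequence
    spacers: list = field(default_factory=list)    # [{"sequence": ..., "position": ...}]


def build_strain_record(strain_id, cas_type, arrays):
    """Return {strain_id: {...}} with the summary derived from the arrays."""
    num_spacers = sum(len(a.spacers) for a in arrays)
    return {
        strain_id: {
            "cas_type": cas_type,
            "arrays": [
                {"position": a.position, "repeat": a.repeat, "spacers": a.spacers}
                for a in arrays
            ],
            "summary": {"num_arrays": len(arrays), "num_spacers": num_spacers},
        }
    }


# Illustrative values only, reusing the sequences from the format example.
example = build_strain_record(
    "strain_id",
    "I-E",
    [
        CrisprArray(
            position="contig_1:12345-12678",
            repeat="GTTTTAGAGCTATGCTGTTTTGAATGGTCCCAAAAC",
            spacers=[
                {"sequence": "ATGCGTCGAC", "position": 0},
                {"sequence": "CGTAGCTAGC", "position": 37},
            ],
        )
    ],
)
print(json.dumps(example, indent=2))
```

Deriving the summary from the arrays, rather than storing the counts separately, keeps num_arrays and num_spacers consistent with the listed data.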
Phase 2: Spacer-Toxin Gene Association
File: crispr_cas/scripts/fusion_analysis.py
Tasks:
- Map CRISPR arrays to genomic positions
- Identify toxin genes near CRISPR arrays (within a 10 kb window)
- Analyze potential spacer-target matches:
- Extract toxin gene sequences
- Perform BLAST of spacers against toxin genes
- Identify potential immunity or targeting relationships
Output Format (JSON):
{
  "strain_id": {
    "crispr_toxin_associations": [
      {
        "crispr_array": "contig_1:12345-12678",
        "nearby_toxins": ["Cry1Ac1", "Cry2Aa3"],
        "spacer_targets": [
          {"spacer": "ATGCGTCGAC", "target": "Cry1Ac1", "identity": 0.95}
        ],
        "distance_to_toxin": 2500
      }
    ]
  }
}
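The proximity step (toxin genes within a 10 kb window of an array) could be sketched as follows. This assumes positions use the contig:start-end form shown above and that distance means the gap between the nearest edges of the two intervals; both are assumptions, as are the function names and the toxin coordinates, which are made up for illustration.

```python
import re

_REGION = re.compile(r"^(?P<contig>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region):
    """Split 'contig:start-end' into (contig, start, end)."""
    m = _REGION.match(region)
    if m is None:
        raise ValueError(f"not a contig:start-end string: {region!r}")
    return m["contig"], int(m["start"]), int(m["end"])


def interval_distance(a_start, a_end, b_start, b_end):
    """Gap in bp between two intervals on one contig; 0 if they overlap."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def nearby_toxins(array_region, toxins, window=10_000):
    """Return (name, distance) pairs for toxins within `window` bp of the array.

    `toxins` is a hypothetical list of (name, region) pairs on any contig.
    """
    contig, a_start, a_end = parse_region(array_region)
    hits = []
    for name, region in toxins:
        t_contig, t_start, t_end = parse_region(region)
        if t_contig != contig:
            continue  # distances are only defined within a contig
        dist = interval_distance(a_start, a_end, t_start, t_end)
        if dist <= window:
            hits.append((name, dist))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up coordinates for illustration.
toxins = [
    ("Cry1Ac1", "contig_1:15178-18700"),  # starts 2500 bp after the array ends
    ("Cry2Aa3", "contig_1:40000-41900"),  # outside the 10 kb window
    ("Vip3Aa1", "contig_2:100-2500"),     # different contig, skipped
]
print(nearby_toxins("contig_1:12345-12678", toxins))
```

Spacer-target identity values would come from the BLAST step and are not modeled here.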
Phase 3: Integration with Shoter Scoring
File: Modify scripts/bttoxin_shoter.py
Reserved Parameters (add to argument parser):
# CRISPR-Cas Integration (Reserved for Future Implementation)
ap.add_argument("--crispr_weight", type=float, default=0.0,
                help="[FUTURE] Weight for CRISPR-Cas contribution to activity scores (0-1)")
ap.add_argument("--crispr_results", type=Path, default=None,
                help="[FUTURE] Path to CRISPR-Cas detection results JSON")
ap.add_argument("--crispr_fusion", action="store_true", default=False,
                help="[FUTURE] Enable spacer-toxin fusion analysis")
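The help text gives a 0-1 range for --crispr_weight, but type=float alone accepts any number. One possible way to enforce the range is a custom argparse type, sketched below; unit_interval and build_parser are hypothetical names and are not part of the existing bttoxin_shoter.py.

```python
import argparse
from pathlib import Path


def unit_interval(value):
    """argparse type: parse a float and require 0 <= value <= 1."""
    weight = float(value)  # a ValueError here is reported by argparse as a usage error
    if not 0.0 <= weight <= 1.0:
        raise argparse.ArgumentTypeError(f"must be between 0 and 1, got {value}")
    return weight


def build_parser():
    """Stand-alone parser holding only the reserved CRISPR flags."""
    ap = argparse.ArgumentParser(description="Reserved CRISPR-Cas flags (sketch)")
    ap.add_argument("--crispr_weight", type=unit_interval, default=0.0,
                    help="[FUTURE] Weight for CRISPR-Cas contribution to activity scores (0-1)")
    ap.add_argument("--crispr_results", type=Path, default=None,
                    help="[FUTURE] Path to CRISPR-Cas detection results JSON")
    ap.add_argument("--crispr_fusion", action="store_true", default=False,
                    help="[FUTURE] Enable spacer-toxin fusion analysis")
    return ap


args = build_parser().parse_args(
    ["--crispr_results", "crispr.json", "--crispr_weight", "0.2"]
)
print(args.crispr_weight, args.crispr_results, args.crispr_fusion)
```

With the default weight of 0.0 the flags are inert, which matches their reserved status.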
Scoring Integration (in score_strain() function):
# Reserved: CRISPR-Cas scoring integration
# When CRISPR is enabled, modify strain scores:
#
# if args.crispr_weight > 0 and crispr_data:
#     crispr_boost = calculate_crispr_activity_boost(
#         strain=strain,
#         crispr_data=crispr_data.get(strain, {}),
#         toxin_hits=toxin_hits
#     )
#     # Apply CRISPR boost to target order scores
#     for order, score in sscore.scores.items():
#         sscore.scores[order] = score * (1 - args.crispr_weight) + \
#             crispr_boost.get(order, 0) * args.crispr_weight
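The blending formula in the reserved block can be tried in isolation. The sketch below takes the per-order boost as an input, since calculate_crispr_activity_boost is not yet defined; blend_scores is a hypothetical helper, and the order names and values are illustrative.

```python
def blend_scores(scores, crispr_boost, crispr_weight):
    """Blend per-order activity scores with CRISPR boosts.

    new = score * (1 - w) + boost * w, with a missing boost treated as 0,
    mirroring the reserved formula.
    """
    if not 0.0 <= crispr_weight <= 1.0:
        raise ValueError("crispr_weight must be in [0, 1]")
    return {
        order: score * (1 - crispr_weight)
        + crispr_boost.get(order, 0.0) * crispr_weight
        for order, score in scores.items()
    }


scores = {"Lepidoptera": 0.8, "Diptera": 0.5}   # illustrative target-order scores
boost = {"Lepidoptera": 1.0}                    # no boost entry for Diptera
blended = blend_scores(scores, boost, 0.2)
print({order: round(value, 3) for order, value in blended.items()})
```

One consequence worth noting: because a missing boost counts as 0, any order without a boost entry is scaled down by a factor of (1 - crispr_weight) rather than left unchanged. Whether that is the intended behavior is a design decision for Phase 3.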
Phase 4: Enhanced Visualization
File: scripts/plot_shotter.py
Tasks:
- Add CRISPR-Cas panel to existing heatmaps
- Visualize:
- CRISPR array positions on genome
- Spacer-toxin targeting relationships
- CRISPR-enhanced activity scores
Output Format:
- Extended PDF report with CRISPR section
- Additional JSON with CRISPR metadata
- Optional: Genomic track visualization (SVG/PNG)
Pixi Integration
The pixi environment is already configured (commented out) in pixi.toml:
# =========================
# CRISPR-Cas environment: reserved for future CRISPR-Cas analysis
# =========================
# [feature.crispr.dependencies]
# python = ">=3.9"
# biopython = "*"
# pandas = ">=2.0.0"
# =========================
# [feature.crispr.tasks]
# crispr-detect = "python crispr_cas/scripts/detect_crispr.py"
# crispr-fusion = "python crispr_cas/scripts/fusion_analysis.py"
To activate the CRISPR module:
- Uncomment the [feature.crispr.dependencies] section
- Uncomment the [feature.crispr.tasks] section
- Add crispr to the environments list
- Run pixi install
Usage Examples (When Implemented)
Basic CRISPR Detection
pixi run -e crispr crispr-detect \
    --input genome.fna \
    --output crispr_results.json
Full Pipeline with CRISPR Integration
# Run CRISPR detection first
pixi run -e crispr crispr-detect --input genome.fna --output crispr.json
# Run pipeline with CRISPR-enhanced scoring
pixi run pipeline \
    --input genome.fna \
    --toxicity_csv Data/toxicity-data.csv \
    --crispr_results crispr.json \
    --crispr_weight 0.2 \
    --crispr_fusion
API Integration (Future)
# Backend API endpoint (to be implemented)
POST /api/v1/tasks
{
  "files": ["genome.fna"],
  "crispr_enabled": true,
  "crispr_weight": 0.2,
  "crispr_fusion": true
}
Scientific Background
Why CRISPR-Cas in Bt Analysis?
- Self-Immunity: CRISPR-Cas systems in Bt may provide immunity against phages, affecting strain fitness
- Plasmid Tracking: CRISPR spacers can indicate plasmid content and horizontal gene transfer history
- Strain Differentiation: CRISPR array patterns can distinguish closely related strains
- Toxin Gene Proximity: CRISPR arrays near toxin genes may indicate genomic defense mechanisms
Expected Benefits
- Enhanced strain characterization beyond toxin profiling
- Better understanding of strain evolution and adaptation
- Potential correlation with biocontrol efficacy
- Additional markers for strain selection
Testing Strategy
Unit Tests
- CRISPR detection mock data parsing
- Spacer-toxin distance calculation
- CRISPR score calculation logic
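A first unit test for mock data parsing might look like the sketch below, written as plain pytest-style functions. The mock record follows the Phase 1 output format; MOCK_RESULT, load_record, and count_spacers are hypothetical names chosen for this example.

```python
import json

# Mock detection result in the planned Phase 1 JSON format.
MOCK_RESULT = """
{
  "strain_id": {
    "cas_type": "I-E",
    "arrays": [
      {
        "position": "contig_1:12345-12678",
        "repeat": "GTTTTAGAGCTATGCTGTTTTGAATGGTCCCAAAAC",
        "spacers": [
          {"sequence": "ATGCGTCGAC", "position": 0},
          {"sequence": "CGTAGCTAGC", "position": 37}
        ]
      }
    ],
    "summary": {"num_arrays": 1, "num_spacers": 2}
  }
}
"""


def load_record(text, strain_id):
    """Parse the JSON text and return the record for one strain."""
    return json.loads(text)[strain_id]


def count_spacers(record):
    """Count spacers across all arrays of one strain record."""
    return sum(len(array["spacers"]) for array in record["arrays"])


def test_summary_matches_arrays():
    record = load_record(MOCK_RESULT, "strain_id")
    assert record["summary"]["num_arrays"] == len(record["arrays"])
    assert record["summary"]["num_spacers"] == count_spacers(record)


def test_spacers_are_dna():
    record = load_record(MOCK_RESULT, "strain_id")
    for array in record["arrays"]:
        for spacer in array["spacers"]:
            # Only unambiguous bases are expected in this mock.
            assert set(spacer["sequence"]) <= set("ACGT")
```

Saved under tests/, functions named test_* would be collected automatically if the project uses pytest, which is itself an assumption here.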
Integration Tests
- End-to-end pipeline with small genome
- Comparison with manual CRISPRCasFinder results
- Scoring consistency with/without CRISPR
Validation
- Compare CRISPR-enhanced scores with experimental bioassay data
- Validate CRISPR-toxin associations using known literature
Dependencies
External Tools
- CRISPRCasFinder (v4.2+): https://crisprcas.i2bc.paris-saclay.fr/
- BLAST+ (for spacer-toxin alignment)
Python Packages
- biopython >= 1.79
- pandas >= 2.0.0
- numpy >= 1.21.0
Timeline Estimate
- Phase 1: 2-3 weeks (CRISPR detection wrapper)
- Phase 2: 2-3 weeks (Fusion analysis)
- Phase 3: 1-2 weeks (Shoter integration)
- Phase 4: 2-3 weeks (Visualization)
Total: ~2-3 months for full implementation
References
- Couvin, D. et al. (2018) CRISPRCasFinder, an update of CRISPRFinder, includes a portable version, a web server and many tools to study CRISPRs. Bioinformatics, 34(20), 3579-3581.
- Chakraborty, S. et al. (2020) CRISPR-Cas systems in Bacillus thuringiensis: diversity, evolution and potential applications. Frontiers in Microbiology, 11, 591.
- BtToxin Pipeline Documentation: docs/shotter_workflow.md
Contact
For questions or implementation guidance, refer to the main project documentation or create an issue in the project repository.
Last Updated: 2025-01-13
Status: Reserved - Implementation Pending