Refactor: Unified pipeline execution, simplified UI, and fixed Docker config
- Backend: Refactored tasks.py to directly invoke run_single_fna_pipeline.py for consistency.
- Backend: Changed output format to ZIP and added auto-cleanup of intermediate files.
- Backend: Fixed language parameter passing in API and tasks.
- Frontend: Removed CRISPR Fusion UI elements from Submit and Monitor views.
- Frontend: Implemented simulated progress bar for better UX.
- Frontend: Restored One-click load button and added result file structure documentation.
- Docker: Fixed critical Restarting loop by removing incorrect image directive in docker-compose.yml.
- Docker: Optimized Dockerfile to correct .pixi environment path issues and prevent accidental deletion of frontend assets.
81
tools/README.md
Normal file
@@ -0,0 +1,81 @@
# BtToxin Analysis Modules

This directory contains specialized analysis modules integrated into the BtToxin Pipeline. Each module focuses on identifying and characterizing specific genomic features that contribute to the insecticidal potential of *Bacillus thuringiensis* strains.

## 1. BtToxin_Digger
**Core Toxin Identification Module**

This is the foundational module of the pipeline, responsible for identifying Cry, Cyt, and Vip toxin genes in bacterial genomes.

* **Function**:
    * Predicts Open Reading Frames (ORFs) from genomic sequences (.fna).
    * Translates coding sequences (CDS) to proteins.
    * Uses BLAST and HMM (Hidden Markov Models) to search against a curated database of known Bt toxins.
    * Identifies toxin candidates and classifies them into families/subfamilies based on sequence identity.
* **Key Metrics**: Sequence Identity (`Identity`), Coverage (`Coverage`), and HMM domain hits.
* **Role**: Provides the primary "evidence" ($w_i$) for the Shotter scoring system.

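As a rough illustration of how the `Identity` and `Coverage` metrics might be used to screen candidates, the sketch below filters a list of hits. The cut-off values, the `filter_candidates` helper, and the record layout are assumptions made for this example; they are not taken from BtToxin_Digger itself.

```python
from typing import Any, Dict, List

# Hypothetical cut-offs, chosen only for illustration.
MIN_IDENTITY = 45.0  # percent
MIN_COVERAGE = 70.0  # percent


def filter_candidates(
    hits: List[Dict[str, Any]],
    min_identity: float = MIN_IDENTITY,
    min_coverage: float = MIN_COVERAGE,
) -> List[Dict[str, Any]]:
    """Keep hits whose Identity and Coverage both meet the cut-offs."""
    return [
        hit
        for hit in hits
        if hit["Identity"] >= min_identity and hit["Coverage"] >= min_coverage
    ]


# Made-up records; the field names mirror the metrics listed above.
example_hits = [
    {"name": "hit_a", "Identity": 98.5, "Coverage": 100.0},
    {"name": "hit_b", "Identity": 30.0, "Coverage": 95.0},  # fails identity
    {"name": "hit_c", "Identity": 88.0, "Coverage": 40.0},  # fails coverage
]

kept = filter_candidates(example_hits)
print([hit["name"] for hit in kept])
```

Requiring both metrics to pass avoids keeping short, high-identity fragments as well as long, weakly similar alignments.
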
## 2. BGC Analysis (bgc_analysis)
**Biosynthetic Gene Cluster Detection**

This module detects three specific gene clusters that serve as independent markers of insecticidal activity.

* **Targets**:
    * **ZWA**: Zwittermicin A biosynthetic gene cluster.
    * **Thu**: Thuringiensin (beta-exotoxin) biosynthetic gene cluster.
    * **TAA**: Toxin A (insecticidal protein) gene cluster.
* **Methodology**:
    * Uses BLAST/HMM to detect signature enzymes and backbone genes specific to these clusters.
    * Returns a binary status (Present/Absent) for each cluster type ($b_Z, b_T, b_A \in \{0, 1\}$).
* **Contribution to Scoring**:
    * The presence of these clusters acts as a **positive prior**, boosting the final toxicity score ($S_{\text{final}}$) because they represent functional insecticidal modules independent of Cry/Vip proteins.

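The binary flags can be read back from the JSON that `detect_bgc.py` writes, which uses the keys `ZWA`, `Thu`, and `TAA`. The following is a minimal sketch; the `load_bgc_flags` helper is a hypothetical name, and treating a missing key as absent (0) is an assumption.

```python
import json
from typing import Tuple


def load_bgc_flags(text: str) -> Tuple[int, int, int]:
    """Parse BGC results JSON and return (b_Z, b_T, b_A) as 0/1 flags."""
    data = json.loads(text)
    flags = []
    for key in ("ZWA", "Thu", "TAA"):
        value = int(data.get(key, 0))  # assumption: missing key means absent
        if value not in (0, 1):
            raise ValueError(f"{key} must be 0 or 1, got {value!r}")
        flags.append(value)
    return flags[0], flags[1], flags[2]


b_z, b_t, b_a = load_bgc_flags('{"ZWA": 1, "Thu": 0, "TAA": 1}')
print(b_z, b_t, b_a)  # prints: 1 0 1
```

Validating the values up front keeps a malformed result file from silently shifting the downstream score.
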
## 3. Mobilome Analysis (mobilome_analysis)
**Mobile Genetic Element Quantification**

This module quantifies the "mobilome" (the collection of mobile genetic elements), which correlates with a strain's ability to acquire, rearrange, and maintain toxin genes.

* **Targets**:
    * **Transposases**: Enzymes that facilitate gene movement.
    * **Plasmids**: Extrachromosomal DNA often carrying toxin genes in Bt.
    * **Phages**: Viral elements that can mediate horizontal gene transfer.
* **Methodology**:
    * Annotates and counts these elements in the genome.
    * Returns the element count ($m$), either as a single total or broken down by element type.
* **Contribution to Scoring**:
    * A higher mobilome count indicates a more "open" genome capable of HGT (Horizontal Gene Transfer).
    * Contributes a **positive prior** (via a saturation function $g(m)$) to the toxicity score, reflecting a higher potential for evolving or acquiring diverse toxin cocktails.

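The exact form of $g(m)$ is not given here. One common choice for a saturating curve is $g(m) = m / (m + K)$, which starts at 0, rises with $m$, and levels off below 1. The sketch below uses that form purely as an assumption; the `saturation` name and the half-saturation constant `half_point` are hypothetical.

```python
def saturation(m: int, half_point: float = 20.0) -> float:
    """Assumed saturating curve g(m) = m / (m + K).

    `half_point` (K) is the count at which the curve reaches 0.5.
    """
    if m < 0:
        raise ValueError("mobile element count must be non-negative")
    return m / (m + half_point)


for count in (0, 20, 100):
    print(count, round(saturation(count), 3))
```

With a curve of this shape, the first few mobile elements move the prior the most, while very high counts add little extra boost.
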
## 4. CRISPR-Cas Analysis (crispr_cas_analysis)
**Genome Defense System Characterization**

This module characterizes the CRISPR-Cas immune systems, which act as barriers to foreign DNA (including plasmids and phages).

* **Targets**:
    * **Cas Proteins**: Identification of Cas gene clusters.
    * **CRISPR Arrays**: Detection of direct repeats and spacers.
* **Methodology**:
    * Classifies the system status into three levels: **Complete** (functional), **Incomplete** (degraded), or **Absent**.
    * Returns a status code $c \in \{0, 1, 2\}$ (0=Absent, 1=Incomplete, 2=Complete).
* **Contribution to Scoring**:
    * **Negative Prior**: A complete, functional CRISPR system ($c=2$) limits the intake of foreign plasmids (which often carry toxins).
    * Therefore, an **Absent** system allows for the highest potential of plasmid-borne toxin acquisition (Highest score boost), while a **Complete** system penalizes the prior probability (Lowest/No boost). This follows the logic: *Absent > Incomplete > Complete* for toxicity potential.

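One way the status code $c$ could be derived from the `summary` block that `detect_crispr.py` emits (`has_cas`, `has_crispr`) is sketched below. The mapping itself is an assumption for illustration: both components present counts as Complete, exactly one as Incomplete, neither as Absent. The `crispr_status` helper is a hypothetical name.

```python
from typing import Any, Dict


def crispr_status(summary: Dict[str, Any]) -> int:
    """Return c: 0 = Absent, 1 = Incomplete, 2 = Complete (assumed mapping)."""
    has_cas = bool(summary.get("has_cas"))
    has_crispr = bool(summary.get("has_crispr"))
    if has_cas and has_crispr:
        return 2  # Cas genes and at least one array
    if has_cas or has_crispr:
        return 1  # only one of the two components was found
    return 0


print(crispr_status({"has_cas": True, "has_crispr": True}))  # prints: 2
```

A real classifier would probably also check whether the Cas gene set is intact, which this sketch does not attempt.
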
---

## Integration in Shotter Scoring

These modules work together to refine the final insecticidal activity prediction:

1. **Evidence**: **BtToxin_Digger** provides direct evidence of toxin genes ($S_{\text{tox}}$).
2. **Priors**: **BGC**, **Mobilome**, and **CRISPR** modules provide a "genomic context" prior ($\Delta(\text{strain})$).

The final score combines these using a logit-based adjustment:

$$
S_{\text{final}} = \sigma\left( \operatorname{logit}(S_{\text{tox}}) + \Delta(\text{strain}) \right)
$$

Where $\Delta(\text{strain})$ aggregates the positive boosts from BGCs/Mobilome and the adjustment from CRISPR status.

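The logit-based adjustment can be sketched in a few lines. Here $\Delta(\text{strain})$ is passed in as a plain number because its weights are not specified in this README; the function names and the clipping constant `EPS` are assumptions of this example.

```python
import math

EPS = 1e-9  # keeps logit finite when the score is exactly 0 or 1


def logit(p: float) -> float:
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, EPS), 1.0 - EPS)
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Logistic function, the inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def final_score(s_tox: float, delta: float) -> float:
    """S_final = sigmoid(logit(S_tox) + delta)."""
    return sigmoid(logit(s_tox) + delta)


# With S_tox = 0.5 the log-odds are 0, so a prior of ln(3) gives about 0.75.
print(round(final_score(0.5, math.log(3)), 6))
```

Working in log-odds keeps the result inside (0, 1) and means a zero prior leaves the evidence score unchanged, while positive and negative priors shift it up or down.
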
For full mathematical details, see [docs/shotter_math_full_zh_typora.md](../docs/shotter_math_full_zh_typora.md).
31
tools/bgc_analysis/detect_bgc.py
Normal file
@@ -0,0 +1,31 @@
#!/usr/bin/env python3
"""
Mock BGC Detector (ZWA/Thu/TAA)
Returns random presence/absence for testing.
"""
import argparse
import json
import random
from pathlib import Path

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True, help="Input genome file")
    parser.add_argument("--output", required=True, help="Output JSON file")
    args = parser.parse_args()

    # Mock logic: Randomly assign 0 or 1
    # In real impl, this would run HMM/BLAST against specific BGC databases
    results = {
        "ZWA": random.choice([0, 1]),
        "Thu": random.choice([0, 1]),
        "TAA": random.choice([0, 1])
    }

    with open(args.output, "w") as f:
        json.dump(results, f, indent=2)

    print(f"Mock BGC results written to {args.output}")

if __name__ == "__main__":
    main()
1
tools/crispr_cas_analysis/__init__.py
Normal file
@@ -0,0 +1 @@
"""CRISPR-Cas Analysis Module"""
1
tools/crispr_cas_analysis/scripts/__init__.py
Normal file
@@ -0,0 +1 @@
"""Scripts for CRISPR-Cas detection and analysis"""
139
tools/crispr_cas_analysis/scripts/detect_crispr.py
Normal file
@@ -0,0 +1,139 @@
#!/usr/bin/env python3
"""
CRISPR-Cas Detection Wrapper
Wrapper for CRISPRCasFinder or similar tools to detect CRISPR arrays and Cas genes.
"""

import argparse
import json
import logging
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Any

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

def parse_args():
    parser = argparse.ArgumentParser(description="Detect CRISPR arrays and Cas genes in genome")
    parser.add_argument("--input", "-i", type=Path, required=True, help="Input genome file (.fna)")
    parser.add_argument("--output", "-o", type=Path, required=True, help="Output JSON results file")
    parser.add_argument("--tool-path", type=Path, default=None, help="Path to CRISPRCasFinder.pl")
    parser.add_argument("--mock", action="store_true", help="Use mock data (for testing without external tools)")
    return parser.parse_args()

def check_dependencies(tool_path: Path = None) -> bool:
    """Check if CRISPRCasFinder is available"""
    if tool_path and tool_path.exists():
        return True

    # Check in PATH
    if shutil.which("CRISPRCasFinder.pl"):
        return True

    return False

def generate_mock_results(genome_file: Path) -> Dict[str, Any]:
    """Generate mock CRISPR results for testing"""
    logger.info(f"Generating mock CRISPR results for {genome_file.name}")

    strain_id = genome_file.stem

    return {
        "strain_id": strain_id,
        "cas_systems": [
            {
                "type": "I-E",
                "subtype": "I-E",
                "position": "contig_1:15000-25000",
                "genes": ["cas1", "cas2", "cas3", "casA", "casB", "casC", "casD", "casE"]
            }
        ],
        "arrays": [
            {
                "id": "CRISPR_1",
                "contig": "contig_1",
                "start": 12345,
                "end": 12678,
                "consensus_repeat": "GTTTTAGAGCTATGCTGTTTTGAATGGTCCCAAAAC",
                "num_spacers": 5,
                "spacers": [
                    {"sequence": "ATGCGTCGACATGCGTCGACATGCGTCGAC", "position": 1},
                    {"sequence": "CGTAGCTAGCCGTAGCTAGCCGTAGCTAGC", "position": 2},
                    {"sequence": "TGCATGCATGTGCATGCATGTGCATGCATG", "position": 3},
                    {"sequence": "GCTAGCTAGCGCTAGCTAGCGCTAGCTAGC", "position": 4},
                    {"sequence": "AAAAATTTTTAAAAATTTTTAAAAATTTTT", "position": 5}
                ]
            },
            {
                "id": "CRISPR_2",
                "contig": "contig_2",
                "start": 50000,
                "end": 50500,
                "consensus_repeat": "GTTTTAGAGCTATGCTGTTTTGAATGGTCCCAAAAC",
                "num_spacers": 8,
                "spacers": [
                    {"sequence": "CCCGGGAAACCCGGGAAACCCGGGAAA", "position": 1}
                ]
            }
        ],
        "summary": {
            "has_cas": True,
            "has_crispr": True,
            "num_arrays": 2,
            "num_spacers": 13,
            "cas_types": ["I-E"]
        },
        "metadata": {
            "tool": "CRISPRCasFinder",
            "version": "Mock-v1.0",
            "date": "2025-01-14"
        }
    }

def run_crisprcasfinder(input_file: Path, output_file: Path, tool_path: Path = None):
    """Run actual CRISPRCasFinder tool (Placeholder)"""
    # This would implement the actual subprocess call to CRISPRCasFinder.pl
    # For now, we raise NotImplementedError unless mock is used
    raise NotImplementedError("Real tool integration not yet implemented. Use --mock flag.")

def main():
    args = parse_args()

    if not args.input.exists():
        logger.error(f"Input file not found: {args.input}")
        sys.exit(1)

    # Create parent directory for output if needed
    args.output.parent.mkdir(parents=True, exist_ok=True)

    try:
        if args.mock:
            results = generate_mock_results(args.input)
        else:
            if not check_dependencies(args.tool_path):
                logger.warning("CRISPRCasFinder not found. Falling back to mock data.")
                results = generate_mock_results(args.input)
            else:
                # Real implementation would go here
                run_crisprcasfinder(args.input, args.output, args.tool_path)
                return

        # Write results
        with open(args.output, 'w') as f:
            json.dump(results, f, indent=2)

        logger.info(f"Results written to {args.output}")

    except Exception as e:
        logger.error(f"Error executing CRISPR detection: {e}")
        sys.exit(1)

if __name__ == "__main__":
    main()
166
tools/crispr_cas_analysis/scripts/fusion_analysis.py
Normal file
@@ -0,0 +1,166 @@
#!/usr/bin/env python3
"""
CRISPR-Toxin Fusion Analysis
Analyzes associations between CRISPR spacers and toxin genes.
"""

import argparse
import json
import logging
import sys
from pathlib import Path
from typing import Dict, List, Any

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

def parse_args():
    parser = argparse.ArgumentParser(description="Analyze CRISPR-Toxin associations")
    parser.add_argument("--crispr-results", type=Path, required=True, help="CRISPR detection results (JSON)")
    parser.add_argument("--toxin-results", type=Path, required=True, help="Toxin detection results (JSON or TXT)")
    parser.add_argument("--genome", type=Path, required=True, help="Original genome file (.fna)")
    parser.add_argument("--output", "-o", type=Path, required=True, help="Output analysis JSON")
    parser.add_argument("--mock", action="store_true", help="Use mock analysis logic")
    return parser.parse_args()

def load_json(path: Path) -> Dict:
    with open(path) as f:
        return json.load(f)

def calculate_distance(range1: str, range2: str) -> int:
    """
    Calculate distance between two genomic ranges.
    Format: 'contig:start-end'
    """
    try:
        contig1, coords1 = range1.split(':')
        start1, end1 = map(int, coords1.split('-'))

        contig2, coords2 = range2.split(':')
        start2, end2 = map(int, coords2.split('-'))

        if contig1 != contig2:
            return -1  # Different contigs

        # Check for overlap
        if max(start1, start2) <= min(end1, end2):
            return 0

        # Calculate distance
        if start1 > end2:
            return start1 - end2
        else:
            return start2 - end1
    except Exception as e:
        logger.warning(f"Error calculating distance: {e}")
        return -1

def mock_blast_spacers(spacers: List[str], toxins: List[Dict]) -> List[Dict]:
    """Mock BLAST spacers against toxins"""
    matches = []
    # Placeholder: no sequence comparison is performed here.
    # A real implementation would BLAST each spacer against the toxin sequences.

    # Report a single fabricated hit pairing the first spacer with the first toxin
    if spacers and toxins:
        matches.append({
            "spacer_seq": spacers[0],
            "target_toxin": toxins[0].get("name", "Unknown"),
            "identity": 98.5,
            "alignment_length": 32,
            "mismatches": 1
        })
    return matches

def perform_fusion_analysis(crispr_data: Dict, toxin_file: Path, mock: bool = False) -> Dict:
    """
    Main analysis logic.
    1. Map CRISPR arrays
    2. Map Toxin genes
    3. Calculate distances
    4. Check for spacer matches
    """

    analysis_results = {
        "strain_id": crispr_data.get("strain_id"),
        "associations": [],
        "summary": {"proximal_pairs": 0, "spacer_matches": 0}
    }

    # Extract arrays
    arrays = crispr_data.get("arrays", [])

    # Mock Toxin Parsing (assuming simple list for now if not JSON)
    toxins = []
    if mock:
        toxins = [
            {"name": "Cry1Ac1", "position": "contig_1:10000-12000"},
            {"name": "Vip3Aa1", "position": "contig_2:60000-62000"}
        ]
    else:
        # TODO: Implement real toxin file parsing (e.g. from All_Toxins.txt)
        logger.warning("Real toxin parsing not implemented yet, using empty list")

    # Analyze Proximity
    for array in arrays:
        array_pos = f"{array.get('contig')}:{array.get('start')}-{array.get('end')}"

        for toxin in toxins:
            dist = calculate_distance(array_pos, toxin["position"])

            if dist != -1 and dist < 10000:  # 10kb window
                association = {
                    "type": "proximity",
                    "array_id": array.get("id"),
                    "toxin": toxin["name"],
                    "distance": dist,
                    "array_position": array_pos,
                    "toxin_position": toxin["position"]
                }
                analysis_results["associations"].append(association)
                analysis_results["summary"]["proximal_pairs"] += 1

    # Analyze Spacer Matches (Mock)
    all_spacers = []
    for array in arrays:
        for spacer in array.get("spacers", []):
            all_spacers.append(spacer.get("sequence"))

    matches = mock_blast_spacers(all_spacers, toxins)
    for match in matches:
        analysis_results["associations"].append({
            "type": "spacer_match",
            **match
        })
        analysis_results["summary"]["spacer_matches"] += 1

    return analysis_results

def main():
    args = parse_args()

    if not args.crispr_results.exists():
        logger.error(f"CRISPR results file not found: {args.crispr_results}")
        sys.exit(1)

    try:
        crispr_data = load_json(args.crispr_results)

        results = perform_fusion_analysis(crispr_data, args.toxin_results, args.mock)

        # Write results
        with open(args.output, 'w') as f:
            json.dump(results, f, indent=2)

        logger.info(f"Fusion analysis complete. Results: {args.output}")

    except Exception as e:
        logger.error(f"Error during fusion analysis: {e}")
        sys.exit(1)

if __name__ == "__main__":
    main()
1
tools/crispr_cas_analysis/tests/__init__.py
Normal file
@@ -0,0 +1 @@
"""Tests for CRISPR-Cas module"""
42
tools/crispr_cas_analysis/tests/test_detect_crispr.py
Normal file
@@ -0,0 +1,42 @@
import pytest
import json
import shutil
from pathlib import Path
from crispr_cas_analysis.scripts.detect_crispr import generate_mock_results

def test_generate_mock_results(tmp_path):
    """Test mock result generation"""
    input_file = tmp_path / "test_genome.fna"
    input_file.touch()

    results = generate_mock_results(input_file)

    assert results["strain_id"] == "test_genome"
    assert "cas_systems" in results
    assert "arrays" in results
    assert results["summary"]["has_cas"] is True
    assert len(results["arrays"]) > 0

def test_script_execution(tmp_path):
    """Test full script execution via subprocess"""
    # Create dummy input
    input_file = tmp_path / "genome.fna"
    input_file.touch()
    output_file = tmp_path / "results.json"
    script_path = Path(__file__).resolve().parents[1] / "scripts" / "detect_crispr.py"

    import subprocess
    cmd = [
        "python3", str(script_path),
        "--input", str(input_file),
        "--output", str(output_file),
        "--mock"
    ]

    result = subprocess.run(cmd, capture_output=True, text=True)
    assert result.returncode == 0
    assert output_file.exists()

    with open(output_file) as f:
        data = json.load(f)
        assert data["strain_id"] == "genome"
93
tools/crispr_cas_analysis/tests/test_fusion_analysis.py
Normal file
@@ -0,0 +1,93 @@
import pytest
import json
from pathlib import Path
import sys

# Add the tools directory to the path so the crispr_cas_analysis package can be imported
sys.path.insert(0, str(Path(__file__).parents[2]))

from crispr_cas_analysis.scripts.fusion_analysis import calculate_distance, perform_fusion_analysis

def test_calculate_distance():
    """Test genomic distance calculation"""
    # Same contig, no overlap
    # Range1: 100-200, Range2: 300-400 -> Dist 100
    assert calculate_distance("c1:100-200", "c1:300-400") == 100

    # Same contig, overlap
    # Range1: 100-300, Range2: 200-400 -> Dist 0
    assert calculate_distance("c1:100-300", "c1:200-400") == 0

    # Different contig
    assert calculate_distance("c1:100-200", "c2:300-400") == -1

    # Invalid format
    assert calculate_distance("invalid", "c1:100-200") == -1

def test_fusion_analysis_logic(tmp_path):
    """Test main analysis logic with mock data"""

    # Mock CRISPR data
    crispr_data = {
        "strain_id": "test_strain",
        "arrays": [
            {
                "id": "A1",
                "contig": "contig_1",
                "start": 1000,
                "end": 2000,
                "spacers": [{"sequence": "ATGC"}]
            }
        ]
    }

    # Mock toxin file (just a placeholder for path)
    toxin_file = tmp_path / "toxins.txt"
    toxin_file.touch()

    # Run analysis in mock mode
    # In mock mode, the script generates its own toxin list:
    # {"name": "Cry1Ac1", "position": "contig_1:10000-12000"}
    # Distance: 10000 - 2000 = 8000 (< 10000 threshold) -> Should match

    results = perform_fusion_analysis(crispr_data, toxin_file, mock=True)

    assert results["strain_id"] == "test_strain"
    assert len(results["associations"]) > 0

    # Check for proximity match
    proximity_matches = [a for a in results["associations"] if a["type"] == "proximity"]
    assert len(proximity_matches) > 0
    assert proximity_matches[0]["distance"] == 8000

def test_script_execution(tmp_path):
    """Test full script execution via subprocess"""

    # Create input files
    crispr_file = tmp_path / "crispr.json"
    with open(crispr_file, 'w') as f:
        json.dump({"strain_id": "test", "arrays": []}, f)

    toxin_file = tmp_path / "toxins.txt"
    toxin_file.touch()

    genome_file = tmp_path / "genome.fna"
    genome_file.touch()

    output_file = tmp_path / "output.json"

    script_path = Path(__file__).resolve().parents[1] / "scripts" / "fusion_analysis.py"

    import subprocess
    cmd = [
        "python3", str(script_path),
        "--crispr-results", str(crispr_file),
        "--toxin-results", str(toxin_file),
        "--genome", str(genome_file),
        "--output", str(output_file),
        "--mock"
    ]

    result = subprocess.run(cmd, capture_output=True, text=True)
    assert result.returncode == 0
    assert output_file.exists()
31
tools/mobilome_analysis/detect_mobilome.py
Normal file
@@ -0,0 +1,31 @@
#!/usr/bin/env python3
"""
Mock Mobilome Analyzer
Returns random count of mobile elements (transposases, plasmids, phages).
"""
import argparse
import json
import random
from pathlib import Path

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True, help="Input genome file")
    parser.add_argument("--output", required=True, help="Output JSON file")
    args = parser.parse_args()

    # Mock logic: Random count between 0 and 100
    # In real impl, this would sum hits of IS elements, plasmid replicons, phage proteins
    count = random.randint(0, 100)

    results = {
        "mobile_elements_count": count
    }

    with open(args.output, "w") as f:
        json.dump(results, f, indent=2)

    print(f"Mock Mobilome results written to {args.output}")

if __name__ == "__main__":
    main()