Refactor: Unified pipeline execution, simplified UI, and fixed Docker config
- Backend: Refactored tasks.py to directly invoke run_single_fna_pipeline.py for consistency.
- Backend: Changed output format to ZIP and added auto-cleanup of intermediate files.
- Backend: Fixed language parameter passing in API and tasks.
- Frontend: Removed CRISPR Fusion UI elements from Submit and Monitor views.
- Frontend: Implemented simulated progress bar for better UX.
- Frontend: Restored One-click load button and added result file structure documentation.
- Docker: Fixed critical Restarting loop by removing incorrect image directive in docker-compose.yml.
- Docker: Optimized Dockerfile to correct .pixi environment path issues and prevent accidental deletion of frontend assets.
tools/README.md
@@ -0,0 +1,81 @@
# BtToxin Analysis Modules

This directory contains specialized analysis modules integrated into the BtToxin Pipeline. Each module focuses on identifying and characterizing specific genomic features that contribute to the insecticidal potential of *Bacillus thuringiensis* strains.

## 1. BtToxin_Digger
**Core Toxin Identification Module**

This is the foundational module of the pipeline, responsible for identifying Cry, Cyt, and Vip toxin genes in bacterial genomes.

* **Function**:
    * Predicts Open Reading Frames (ORFs) from genomic sequences (.fna).
    * Translates coding sequences (CDS) to proteins.
    * Uses BLAST and HMM (Hidden Markov Models) to search against a curated database of known Bt toxins.
    * Identifies toxin candidates and classifies them into families/subfamilies based on sequence identity.
* **Key Metrics**: Sequence Identity (`Identity`), Coverage (`Coverage`), and HMM domain hits.
* **Role**: Provides the primary "evidence" ($w_i$) for the Shotter scoring system.
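
The identity-based classification step can be illustrated with a small sketch. The cutoffs below (45%, 78%, 95%) are the conventional Bt toxin nomenclature thresholds and are an assumption here; this README does not state which thresholds the module applies, and `shared_rank` is a hypothetical helper, not part of BtToxin_Digger.

```python
def shared_rank(identity_pct: float) -> str:
    """Return the deepest nomenclature rank a hit shares with its best reference toxin.

    Assumed cutoffs (conventional Bt nomenclature, not taken from this README):
    >= 95% tertiary, >= 78% secondary, >= 45% primary, otherwise novel.
    """
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage in [0, 100]")
    if identity_pct >= 95.0:
        return "tertiary"   # e.g. same Cry1Aa-level name
    if identity_pct >= 78.0:
        return "secondary"  # e.g. same Cry1A-level name
    if identity_pct >= 45.0:
        return "primary"    # e.g. same Cry1-level name
    return "novel"          # below the primary-rank cutoff


if __name__ == "__main__":
    for identity in (98.4, 81.0, 52.3, 30.0):
        print(identity, shared_rank(identity))
```

A hit with 81% identity would share only the secondary rank with its reference under these assumed cutoffs, which is the kind of family/subfamily call the bullet list describes.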

## 2. BGC Analysis (bgc_analysis)
**Biosynthetic Gene Cluster Detection**

This module detects three specific classes of gene clusters that serve as independent markers of insecticidal activity.

* **Targets**:
    * **ZWA**: Zwittermicin A biosynthetic gene cluster.
    * **Thu**: Thuringiensin (beta-exotoxin) biosynthetic gene cluster.
    * **TAA**: Toxin A (insecticidal protein) gene cluster.
* **Methodology**:
    * Uses BLAST/HMM to detect signature enzymes and backbone genes specific to these clusters.
    * Returns a binary status (Present/Absent) for each cluster type ($b_Z, b_T, b_A \in \{0, 1\}$).
* **Contribution to Scoring**:
    * The presence of these clusters acts as a **positive prior**, boosting the final toxicity score ($S_{\text{final}}$) because they represent functional insecticidal modules independent of Cry/Vip proteins.
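
As a rough illustration of how the binary flags $b_Z, b_T, b_A$ could feed a positive prior, the sketch below sums one weight per detected cluster. The weights, the `BgcFlags` type, and the `bgc_boost` function are hypothetical; the actual Shotter weights are defined in the math document, not here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgcFlags:
    zwa: int  # b_Z: 1 if the Zwittermicin A cluster was detected, else 0
    thu: int  # b_T: 1 if the Thuringiensin cluster was detected, else 0
    taa: int  # b_A: 1 if the TAA cluster was detected, else 0


# Hypothetical log-odds weights, chosen only for illustration.
BGC_WEIGHTS = {"zwa": 0.3, "thu": 0.5, "taa": 0.4}


def bgc_boost(flags: BgcFlags) -> float:
    """Sum the assumed weight of every cluster whose flag is 1."""
    total = 0.0
    for name, weight in BGC_WEIGHTS.items():
        value = getattr(flags, name)
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {value!r}")
        total += weight * value
    return total


if __name__ == "__main__":
    print(bgc_boost(BgcFlags(zwa=1, thu=0, taa=1)))
```

Because every weight is non-negative, detecting an additional cluster can only raise the boost, matching the "positive prior" role described above.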

## 3. Mobilome Analysis (mobilome_analysis)
**Mobile Genetic Element Quantification**

This module quantifies the "mobilome" (the collection of mobile genetic elements), which correlates with a strain's ability to acquire, rearrange, and maintain toxin genes.

* **Targets**:
    * **Transposases**: Enzymes that facilitate gene movement.
    * **Plasmids**: Extrachromosomal DNA often carrying toxin genes in Bt.
    * **Phages**: Viral elements that can mediate horizontal gene transfer.
* **Methodology**:
    * Annotates and counts these elements in the genome.
    * Returns a total count or specific counts ($m$).
* **Contribution to Scoring**:
    * A higher mobilome count indicates a more "open" genome capable of HGT (Horizontal Gene Transfer).
    * Contributes a **positive prior** (via a saturation function $g(m)$) to the toxicity score, reflecting a higher potential for evolving or acquiring diverse toxin cocktails.
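
A saturation function keeps very large element counts from dominating the prior. The sketch below uses one common choice, $g(m) = \text{cap} \cdot (1 - e^{-m/\text{scale}})$; this particular form and the `scale` and `cap` values are assumptions for illustration, since this README does not define $g(m)$.

```python
import math


def mobilome_boost(m: int, scale: float = 20.0, cap: float = 0.5) -> float:
    """Assumed saturating prior: g(m) = cap * (1 - exp(-m / scale)).

    g(0) is 0, g grows with m, and g never reaches `cap`.
    `scale` and `cap` are hypothetical tuning parameters.
    """
    if m < 0:
        raise ValueError("mobilome count must be non-negative")
    return cap * (1.0 - math.exp(-m / scale))


if __name__ == "__main__":
    for count in (0, 5, 20, 100):
        print(count, round(mobilome_boost(count), 4))
```

With this shape, going from 0 to 20 elements changes the boost far more than going from 100 to 120, which is the diminishing-returns behavior a saturation function is meant to provide.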

## 4. CRISPR-Cas Analysis (crispr_cas_analysis)
**Genome Defense System Characterization**

This module characterizes CRISPR-Cas immune systems, which act as barriers to foreign DNA (including plasmids and phages).

* **Targets**:
    * **Cas Proteins**: Identification of Cas gene clusters.
    * **CRISPR Arrays**: Detection of direct repeats and spacers.
* **Methodology**:
    * Classifies the system status into three levels: **Complete** (functional), **Incomplete** (degraded), or **Absent**.
    * Returns a status code $c \in \{0, 1, 2\}$ (0=Absent, 1=Incomplete, 2=Complete).
* **Contribution to Scoring**:
    * **Negative Prior**: A complete, functional CRISPR system ($c=2$) limits the intake of foreign plasmids (which often carry toxins).
    * An **Absent** system therefore allows the highest potential for plasmid-borne toxin acquisition (highest score boost), while a **Complete** system receives the lowest boost or none. The ordering for toxicity potential is *Absent > Incomplete > Complete*.
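
The status code and its ordering can be sketched as a small lookup. The numeric adjustments are hypothetical and only preserve the stated ordering (Absent > Incomplete > Complete); `CrisprStatus` and `crispr_adjustment` are illustrative names, not the module's API.

```python
from enum import IntEnum


class CrisprStatus(IntEnum):
    ABSENT = 0      # no CRISPR-Cas system detected
    INCOMPLETE = 1  # degraded system
    COMPLETE = 2    # functional system


# Hypothetical adjustments; only their ordering comes from the README.
CRISPR_ADJUSTMENT = {
    CrisprStatus.ABSENT: 0.30,
    CrisprStatus.INCOMPLETE: 0.15,
    CrisprStatus.COMPLETE: 0.0,
}


def crispr_adjustment(c: int) -> float:
    """Map status code c in {0, 1, 2} to its assumed prior adjustment."""
    # CrisprStatus(c) raises ValueError for any code outside {0, 1, 2}.
    return CRISPR_ADJUSTMENT[CrisprStatus(c)]


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(CrisprStatus(code).name, crispr_adjustment(code))
```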

---

## Integration in Shotter Scoring

These modules work together to refine the final insecticidal activity prediction:

1. **Evidence**: **BtToxin_Digger** provides direct evidence of toxin genes ($S_{\text{tox}}$).
2. **Priors**: **BGC**, **Mobilome**, and **CRISPR** modules provide a "genomic context" prior ($\Delta(\text{strain})$).

The final score combines these using a logit-based adjustment:

$$
S_{\text{final}} = \sigma\left( \operatorname{logit}(S_{\text{tox}}) + \Delta(\text{strain}) \right)
$$

Here $\sigma$ is the logistic sigmoid, $\operatorname{logit}$ is its inverse, and $\Delta(\text{strain})$ aggregates the positive boosts from BGCs/Mobilome and the adjustment from CRISPR status.
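
The logit-based adjustment can be written out directly. This is a minimal sketch of the formula as stated, taking $\Delta(\text{strain})$ as an already-computed number; the `eps` clamp is an added assumption to keep the logit finite when $S_{\text{tox}}$ is exactly 0 or 1, and the function names are illustrative.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Logistic function, the inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def final_score(s_tox: float, delta: float, eps: float = 1e-6) -> float:
    """S_final = sigmoid(logit(S_tox) + delta).

    `eps` clamps s_tox away from 0 and 1 (an assumption of this sketch).
    """
    p = min(max(s_tox, eps), 1.0 - eps)
    return sigmoid(logit(p) + delta)


if __name__ == "__main__":
    print(final_score(0.60, 0.0))  # no context prior: score stays at 0.60
    print(final_score(0.60, 0.5))  # positive prior: roughly 0.712
```

Working in log-odds means a given $\Delta$ shifts mid-range scores noticeably while scores already near 0 or 1 move very little, and the result always stays within $(0, 1)$.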

For full mathematical details, see [docs/shotter_math_full_zh_typora.md](../docs/shotter_math_full_zh_typora.md).