BtToxin Analysis Modules

This directory contains specialized analysis modules integrated into the BtToxin Pipeline. Each module focuses on identifying and characterizing specific genomic features that contribute to the insecticidal potential of Bacillus thuringiensis strains.

1. BtToxin_Digger

Core Toxin Identification Module

This is the foundational module of the pipeline, responsible for identifying Cry, Cyt, and Vip toxin genes in bacterial genomes.

  • Function:
    • Predicts Open Reading Frames (ORFs) from genomic sequences (.fna).
    • Translates coding sequences (CDS) to proteins.
    • Runs BLAST and hidden Markov model (HMM) searches against a curated database of known Bt toxins.
    • Identifies toxin candidates and classifies them into families/subfamilies based on sequence identity.
  • Key Metrics: sequence identity, coverage, and HMM domain hits.
  • Role: Provides the primary "evidence" (w_i) for the Shotter scoring system.
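
To illustrate identity-based classification, the sketch below maps a percent identity to the deepest nomenclature rank a candidate shares with its best database hit. It assumes the conventional 45% / 78% / 95% identity boundaries of the Bt toxin nomenclature. The cutoffs BtToxin_Digger actually applies are not stated here, and `shared_rank` is a hypothetical helper, not part of the tool.

```python
# Hypothetical helper for illustration; not BtToxin_Digger's API.
# Assumed boundaries (percent identity), highest first.
RANK_BOUNDARIES = [
    (95.0, "tertiary"),   # e.g. shares a Cry1Aa-level name
    (78.0, "secondary"),  # e.g. shares a Cry1A-level name
    (45.0, "primary"),    # e.g. shares a Cry1-level name
]


def shared_rank(identity_pct: float) -> str:
    """Return the deepest rank shared with the best hit, or "none"."""
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError(f"identity must be in [0, 100], got {identity_pct}")
    for threshold, rank in RANK_BOUNDARIES:
        if identity_pct >= threshold:
            return rank
    return "none"
```

In practice a coverage filter would normally be applied before ranking, so that a short high-identity fragment is not reported as a full-length toxin.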

2. BGC Analysis (bgc_analysis)

Biosynthetic Gene Cluster Detection

This module detects three gene clusters associated with insecticidal activity. Each serves as an independent marker, separate from the Cry/Cyt/Vip toxin genes.

  • Targets:
    • ZWA: Zwittermicin A biosynthetic gene cluster.
    • Thu: Thuringiensin (beta-exotoxin) biosynthetic gene cluster.
    • TAA: Toxin A (insecticidal protein) gene cluster.
  • Methodology:
    • Uses BLAST/HMM to detect signature enzymes and backbone genes specific to these clusters.
    • Returns a binary status (Present/Absent) for each cluster type (b_Z, b_T, b_A \in \{0, 1\}).
  • Contribution to Scoring:
    • The presence of these clusters acts as a positive prior, boosting the final toxicity score (S_{\text{final}}) because they represent functional insecticidal modules independent of Cry/Vip proteins.
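
A minimal sketch of how the three binary flags could feed a positive prior is shown below. The additive combination and the weight values are assumptions made for illustration. Only the flags b_Z, b_T, b_A and the fact that presence raises the score come from the description above. `BgcStatus`, `bgc_boost`, and `HYPOTHETICAL_WEIGHTS` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgcStatus:
    """Presence/absence of each cluster (hypothetical container)."""
    zwa: bool  # Zwittermicin A cluster
    thu: bool  # Thuringiensin cluster
    taa: bool  # Toxin A cluster

    def flags(self) -> dict:
        # Binary indicators b_Z, b_T, b_A in {0, 1}
        return {"b_Z": int(self.zwa), "b_T": int(self.thu), "b_A": int(self.taa)}


# Assumed weights; the real values are not given in this document.
HYPOTHETICAL_WEIGHTS = {"b_Z": 0.3, "b_T": 0.3, "b_A": 0.3}


def bgc_boost(status: BgcStatus, weights: dict = HYPOTHETICAL_WEIGHTS) -> float:
    """Sum the weights of the clusters that are present."""
    return sum(weights[name] * flag for name, flag in status.flags().items())
```

With these placeholder weights, a strain carrying none of the clusters gets a boost of 0, and each detected cluster adds its weight.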

3. Mobilome Analysis (mobilome_analysis)

Mobile Genetic Element Quantification

This module quantifies the "mobilome"—the collection of mobile genetic elements—which correlates with a strain's ability to acquire, rearrange, and maintain toxin genes.

  • Targets:
    • Transposases: Enzymes that facilitate gene movement.
    • Plasmids: Extrachromosomal DNA often carrying toxin genes in Bt.
    • Phages: Viral elements that can mediate horizontal gene transfer.
  • Methodology:
    • Annotates and counts these elements in the genome.
    • Returns the element count m, either as a single total or broken down by element type.
  • Contribution to Scoring:
    • A higher mobilome count indicates a more "open" genome capable of HGT (Horizontal Gene Transfer).
    • Contributes a positive prior (via a saturation function g(m)) to the toxicity score, reflecting a higher potential for evolving or acquiring diverse toxin cocktails.
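
The exact form of g(m) is not given here. As an illustration of what a saturation function does, the sketch below uses g(m) = m / (m + k), which is 0 at m = 0, rises with m, and never reaches 1. Both the functional form and the half-saturation constant `k` are assumptions, not the pipeline's definition.

```python
def saturating_prior(m: int, k: float = 10.0) -> float:
    """Assumed saturation function g(m) = m / (m + k).

    m: mobilome element count (non-negative).
    k: hypothetical half-saturation constant; g(k) = 0.5.
    """
    if m < 0:
        raise ValueError("mobilome count must be non-negative")
    if k <= 0:
        raise ValueError("k must be positive")
    return m / (m + k)
```

The point of saturation is diminishing returns: going from 0 to 10 elements moves g much more than going from 100 to 110, so a genome packed with transposases cannot dominate the score.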

4. CRISPR-Cas Analysis (crispr_cas_analysis)

Genome Defense System Characterization

This module characterizes the CRISPR-Cas immune systems, which act as barriers to foreign DNA (including plasmids and phages).

  • Targets:
    • Cas Proteins: Identification of Cas gene clusters.
    • CRISPR Arrays: Detection of direct repeats and spacers.
  • Methodology:
    • Classifies the system status into three levels: Complete (functional), Incomplete (degraded), or Absent.
    • Returns a status code c \in \{0, 1, 2\} (0=Absent, 1=Incomplete, 2=Complete).
  • Contribution to Scoring:
    • Negative Prior: A complete, functional CRISPR system (c=2) limits the uptake of foreign plasmids, which often carry toxin genes.
    • An Absent system therefore allows the highest potential for plasmid-borne toxin acquisition and receives the largest score boost, while a Complete system receives the smallest boost or none. The ordering for toxicity potential is Absent > Incomplete > Complete.
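
The status code and its effect on the prior can be sketched as a small lookup. Only the coding (0=Absent, 1=Incomplete, 2=Complete) and the ordering Absent > Incomplete > Complete come from the description above. The numeric adjustments and the names `CrisprStatus` and `crispr_adjustment` are hypothetical.

```python
from enum import IntEnum


class CrisprStatus(IntEnum):
    ABSENT = 0
    INCOMPLETE = 1
    COMPLETE = 2


# Placeholder magnitudes; only their ordering reflects the text.
HYPOTHETICAL_CRISPR_ADJUSTMENT = {
    CrisprStatus.ABSENT: 0.2,      # largest boost
    CrisprStatus.INCOMPLETE: 0.1,  # intermediate boost
    CrisprStatus.COMPLETE: 0.0,    # no boost
}


def crispr_adjustment(c: int) -> float:
    """Map a status code c in {0, 1, 2} to its prior adjustment.

    Raises ValueError for any other code.
    """
    return HYPOTHETICAL_CRISPR_ADJUSTMENT[CrisprStatus(c)]
```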

Integration in Shotter Scoring

These modules work together to refine the final insecticidal activity prediction:

  1. Evidence: BtToxin_Digger provides direct evidence of toxin genes (S_{\text{tox}}).
  2. Priors: BGC, Mobilome, and CRISPR modules provide a "genomic context" prior (\Delta(\text{strain})).

The final score combines these using a logit-based adjustment:


S_{\text{final}} = \sigma\left( \operatorname{logit}(S_{\text{tox}}) + \Delta(\text{strain}) \right)

Where \Delta(\text{strain}) aggregates the positive boosts from BGCs/Mobilome and the adjustment from CRISPR status.
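
The logit-based adjustment can be sketched directly from the formula. The function names and the clamping constant `eps` are assumptions added so that scores of exactly 0 or 1 do not produce an infinite logit. How \Delta(\text{strain}) is computed is left to the caller.

```python
import math


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, with p clamped to [eps, 1 - eps] (assumed safeguard)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def final_score(s_tox: float, delta: float) -> float:
    """S_final = sigmoid(logit(S_tox) + delta)."""
    return sigmoid(logit(s_tox) + delta)
```

Working in log-odds space keeps the result inside (0, 1) regardless of the size of the prior. A delta of 0 returns S_tox unchanged, and a fixed positive delta moves a mid-range score more than one already close to 0 or 1.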

For full mathematical details, see docs/shotter_math_full_zh_typora.md.