hotwa a768d26e47 Move project docs to docs/project-docs and update references

- Move AGENTS.md, CLEANUP_SUMMARY.md, DOCUMENTATION_GUIDE.md,
  IMPLEMENTATION_SUMMARY.md, QUICK_COMMANDS.md to docs/project-docs/
- Update AGENTS.md to include splicing module documentation
- Update mkdocs.yml navigation to include project-docs section
- Update .gitignore to track docs/ directory
- Add docs/plans/ splicing design documents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-18 17:56:03 +08:00

3.5 KiB

Tylosin High-Throughput Splicing & Screening System Design

1. System Overview

The Tylosin Splicer is a combinatorial chemistry engine designed to optimize the Tylosin scaffold. It systematically modifies positions 7, 15, and 16 of the macrolactone ring by splicing in high-potential fragments identified by the SIME platform, then immediately evaluates the predicted antibacterial activity of the resulting analogs.

2. Component Architecture

```
componentDiagram
    package "Inputs" {
        [Tylosin SMILES] as InputCore
        [Fragment CSVs] as InputFrags
        note right of InputFrags: SIME predicted\nhigh-activity fragments
    }

    package "Core Preparation" {
        [Scaffold Preparer] as CorePrep
        [Ring Numbering] as RingNum
        note right of CorePrep: Identifies 7, 15, 16\nReplaces groups with anchors
    }

    package "Fragment Processing" {
        [Fragment Loader] as FragLoad
        [Attachment Point Selector] as AttachSel
        note right of AttachSel: Heuristic rules to\nfind connection points
    }

    package "Splicing Engine" {
        [Combinatorial Splicer] as Splicer
        [Conformer Validator] as Validator
        note right of Splicer: RDKit ChemicalReaction\nor ReplaceSubstructs
    }

    package "Evaluation (SIME)" {
        [Activity Predictor] as Predictor
        [Broad Spectrum Model] as Model
    }

    package "Outputs" {
        [Ranked Results CSV] as Output
    }

    InputCore --> CorePrep
    RingNum -.-> CorePrep : "Locate positions"

    InputFrags --> FragLoad
    FragLoad --> AttachSel

    CorePrep --> Splicer : "Scaffold with Anchors (*)"
    AttachSel --> Splicer : "Activated Fragments (R-Groups)"

    Splicer --> Validator : "Raw Candidates"
    Validator --> Predictor : "Valid 3D Structures"

    Predictor --> Model : "Inference"
    Model --> Output : "Scores & Rankings"
```

3. Data Flow Strategy

Step 1: Scaffold Preparation (`CorePrep`)

Input: Tylosin SMILES.
Action:
1. Parse SMILES using macro_split utils.
2. Use RingNumbering to locate the ring atoms at positions 7, 15, and 16.
3. Perform "surgical removal": break the bonds to the existing side chains at these positions.
4. Attach "anchor atoms" (isotope labels or mapped dummy atoms such as [*:1], [*:2], [*:3]) to the exposed ring carbons.
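The removal and anchoring steps can be sketched with RDKit's `Chem.FragmentOnBonds`, which cuts a bond and caps both ends with dummy atoms. This is a minimal illustration, not the project's implementation: the function name `anchor_ring_position` is hypothetical, the scaffold is a toy ring rather than Tylosin, and the ring atom index is passed in directly, whereas the real pipeline would obtain it from RingNumbering. It assumes the ring atom carries exactly one exocyclic bond and uses isotope labels on the dummy atoms.

```python
from rdkit import Chem


def anchor_ring_position(mol, ring_idx, label):
    """Cut the single exocyclic bond at ring_idx and return the ring-side
    piece, capped with a dummy atom whose isotope equals `label`.

    Hypothetical helper: assumes exactly one non-ring bond on the ring atom.
    """
    ring_atom = mol.GetAtomWithIdx(ring_idx)
    exo_bonds = [b.GetIdx() for b in ring_atom.GetBonds() if not b.IsInRing()]
    if len(exo_bonds) != 1:
        raise ValueError(
            f"expected one exocyclic bond at atom {ring_idx}, found {len(exo_bonds)}"
        )

    # Break the bond; each side receives a dummy atom labelled via its isotope.
    cut = Chem.FragmentOnBonds(
        mol, exo_bonds, addDummies=True, dummyLabels=[(label, label)]
    )

    # Original atom indices are preserved, so the ring piece is the fragment
    # that still contains ring_idx.
    index_sets = Chem.GetMolFrags(cut)
    pieces = Chem.GetMolFrags(cut, asMols=True)
    for atom_indices, piece in zip(index_sets, pieces):
        if ring_idx in atom_indices:
            return piece
    raise RuntimeError("ring atom not found after fragmentation")


if __name__ == "__main__":
    # Toy scaffold: methylcyclohexane, atom 1 is the substituted ring carbon.
    toy = Chem.MolFromSmiles("CC1CCCCC1")
    core = anchor_ring_position(toy, ring_idx=1, label=1)
    print(Chem.MolToSmiles(core))
```

Calling the helper once per position, with a different label each time, would give a scaffold whose three anchors remain distinguishable in the splicing step.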

Step 2: Fragment Activation (`AttachSel`)

Input: Fragment SMILES from SIME CSVs.
Action: Convert a standalone molecule into a substituent (R-Group).
- Strategy A (Smart): Identify heteroatoms (-NH2, -OH) as attachment points.
- Strategy B (Random): Randomly replace a hydrogen atom with an attachment point.
- Strategy C (Linker): Add a small linker (e.g., -CH2-) if needed.
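Strategy A could look roughly like the following RDKit sketch, which attaches a mapped dummy atom to every nitrogen or oxygen that still carries a hydrogen and returns one activated R-group per site. The function name `activate_fragment` is hypothetical, and the rule "N or O with at least one H" is an assumed reading of the heuristic, not the project's actual selector. Fragments whose heteroatom is written with explicit bracket hydrogens (for example `[NH2]`) may fail sanitization in this simplified form.

```python
from rdkit import Chem


def activate_fragment(smiles, map_num=1):
    """Return the SMILES of every R-group obtained by bonding a dummy atom
    [*:map_num] to an N or O atom that carries at least one hydrogen.

    Hypothetical helper illustrating Strategy A only.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return []

    activated = set()
    for atom in mol.GetAtoms():
        if atom.GetSymbol() in ("N", "O") and atom.GetTotalNumHs() > 0:
            rw = Chem.RWMol(mol)
            dummy = Chem.Atom(0)          # atomic number 0 is the dummy atom
            dummy.SetAtomMapNum(map_num)  # written as [*:map_num] in SMILES
            dummy_idx = rw.AddAtom(dummy)
            rw.AddBond(atom.GetIdx(), dummy_idx, Chem.BondType.SINGLE)
            # Sanitization recomputes implicit hydrogens, so the new bond
            # takes the place of one H on the heteroatom.
            Chem.SanitizeMol(rw)
            activated.add(Chem.MolToSmiles(rw))
    return sorted(activated)


if __name__ == "__main__":
    # Ethanolamine has two candidate sites: the amine and the hydroxyl.
    for r_group in activate_fragment("NCCO"):
        print(r_group)
```

Returning all candidate sites, rather than picking one, leaves the choice of attachment point to a later ranking or filtering step.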

Step 3: Combinatorial Splicing (`Splicer`)

Input: 1 Scaffold + N Fragments.
Action:
- Single Point: Modify only one position (7, 15, or 16).
- Multi Point: Combinatorial modification (e.g., 7+15).
- Reaction: Use rdkit.Chem.rdChemReactions or ReplaceSubstructs.
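To get a feel for how quickly the library grows, the single-point and multi-point modes can be enumerated with the standard library alone. The sketch below only generates position-to-fragment assignments and leaves the actual bond formation to RDKit. The name `enumerate_assignments` and the `max_sites` parameter are hypothetical, and it assumes the same fragment may be placed at more than one position.

```python
from itertools import combinations, product

POSITIONS = (7, 15, 16)  # ring positions named in the design


def enumerate_assignments(fragments, max_sites=2):
    """Yield dicts mapping ring position -> fragment for every way of
    modifying between 1 and max_sites positions.

    Hypothetical helper: fragments are opaque labels (e.g. SMILES strings).
    """
    for k in range(1, max_sites + 1):
        for sites in combinations(POSITIONS, k):
            for chosen in product(fragments, repeat=k):
                yield dict(zip(sites, chosen))


if __name__ == "__main__":
    frags = ["fragA", "fragB"]
    for assignment in enumerate_assignments(frags, max_sites=2):
        print(assignment)
```

With N fragments this yields 3N single-point and 3N² double-point candidates, plus N³ more if all three positions are modified at once, so the fragment list may need pruning before multi-point splicing.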

Step 4: High-Throughput Prediction (`Predictor`)

Integration: Import SIME.utils.mole_predictor.
Batching: Collect valid spliced molecules into batches of 128 or 256.
Scoring: Run ParallelBroadSpectrumPredictor.
Filtering: Keep only molecules with broad_spectrum == True or high inhibition scores.
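The batching and filtering logic can be illustrated without touching the SIME API, whose call signatures are not shown here. In this sketch, `batched` and `keep_hit` are hypothetical helpers, each prediction is assumed to be a dict, and the `inhibition_score` key and the 0.5 cutoff are placeholders rather than values taken from the project.

```python
from typing import Dict, Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int = 128) -> Iterator[List[str]]:
    """Group an iterable of SMILES into lists of at most batch_size items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def keep_hit(record: Dict, min_score: float = 0.5) -> bool:
    """Keep a prediction if it is flagged broad-spectrum or scores high enough.

    'inhibition_score' and min_score are assumed placeholders.
    """
    if record.get("broad_spectrum") is True:
        return True
    return record.get("inhibition_score", 0.0) >= min_score


if __name__ == "__main__":
    candidates = [f"mol_{i}" for i in range(300)]
    print([len(b) for b in batched(candidates, batch_size=128)])
```

Each batch would be handed to the predictor in turn, and `keep_hit` applied to the returned records before ranking.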

4. Technology Stack

Core Logic: Python 3.9+
Chemistry Engine: RDKit
Data Handling: Pandas, NumPy
ML Inference: PyTorch (via SIME models)
Parallelization: Python multiprocessing (via SIME batch predictor)
