# Tylosin High-Throughput Splicing & Screening System Design

## 1. System Overview

The **Tylosin Splicer** is a combinatorial chemistry engine designed to optimize the Tylosin scaffold. It systematically modifies positions 7, 15, and 16 of the macrolactone ring by splicing in high-potential fragments identified by the SIME platform, then immediately evaluates the predicted antibacterial activity of each resulting candidate.

## 2. Component Architecture

```mermaid
flowchart TD
    subgraph Inputs
        InputCore["Tylosin SMILES"]
        InputFrags["Fragment CSVs<br/>(SIME-predicted high-activity fragments)"]
    end
    subgraph CorePreparation["Core Preparation"]
        CorePrep["Scaffold Preparer<br/>(identifies 7, 15, 16; replaces groups with anchors)"]
        RingNum["Ring Numbering"]
    end
    subgraph FragmentProcessing["Fragment Processing"]
        FragLoad["Fragment Loader"]
        AttachSel["Attachment Point Selector<br/>(heuristic rules to find connection points)"]
    end
    subgraph SplicingEngine["Splicing Engine"]
        Splicer["Combinatorial Splicer<br/>(RDKit ChemicalReaction or ReplaceSubstructs)"]
        Validator["Conformer Validator"]
    end
    subgraph Evaluation["Evaluation (SIME)"]
        Predictor["Activity Predictor"]
        Model["Broad Spectrum Model"]
    end
    subgraph Outputs
        Output["Ranked Results CSV"]
    end

    InputCore --> CorePrep
    RingNum -.->|"Locate positions"| CorePrep
    InputFrags --> FragLoad
    FragLoad --> AttachSel
    CorePrep -->|"Scaffold with Anchors (*)"| Splicer
    AttachSel -->|"Activated Fragments (R-Groups)"| Splicer
    Splicer -->|"Raw Candidates"| Validator
    Validator -->|"Valid 3D Structures"| Predictor
    Predictor -->|"Inference"| Model
    Model -->|"Scores & Rankings"| Output
```

## 3. Data Flow Strategy

### Step 1: Scaffold Preparation (`CorePrep`)

- **Input**: Tylosin SMILES.
- **Action**:
  1. Parse the SMILES using the `macro_split` utils.
  2. Use `RingNumbering` to locate the ring atoms at positions 7, 15, and 16.
  3. Perform "surgical removal": break the bonds to the existing side chains at these positions.
  4. Attach "anchor atoms" (isotope-labeled atoms or dummy atoms `[*:1]`, `[*:2]`, `[*:3]`) to the ring carbons.

### Step 2: Fragment Activation (`AttachSel`)

- **Input**: Fragment SMILES from SIME CSVs.
- **Action**: Convert a standalone molecule into a substituent (R-group).
  - **Strategy A (Smart)**: Use heteroatoms (e.g., in -NH2 or -OH groups) as attachment points.
  - **Strategy B (Random)**: Randomly replace a hydrogen with an attachment point.
  - **Strategy C (Linker)**: Add a small linker (e.g., -CH2-) if needed.

### Step 3: Combinatorial Splicing (`Splicer`)

- **Input**: 1 scaffold + N fragments.
- **Action**:
  - **Single Point**: Modify only one of positions 7, 15, or 16.
  - **Multi Point**: Modify several positions combinatorially (e.g., 7+15).
- **Reaction**: Use `rdkit.Chem.rdChemReactions` or `ReplaceSubstructs`.

### Step 4: High-Throughput Prediction (`Predictor`)

- **Integration**: Import `SIME.utils.mole_predictor`.
- **Batching**: Collect valid spliced molecules into batches of 128 or 256.
- **Scoring**: Run `ParallelBroadSpectrumPredictor`.
- **Filtering**: Keep only molecules with `broad_spectrum == True` or high inhibition scores.

## 4. Technology Stack

- **Core Logic**: Python 3.9+
- **Chemistry Engine**: RDKit
- **Data Handling**: Pandas, NumPy
- **ML Inference**: PyTorch (via SIME models)
- **Parallelization**: Python `multiprocessing` (via SIME batch predictor)
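
To make the enumeration in Step 3 and the batching in Step 4 more concrete, the following is a minimal sketch in plain Python. It does not perform any chemistry: it only lists which fragment would go on which ring position and groups the resulting plans into fixed-size batches. The names `enumerate_assignments` and `iter_batches` and the short fragment strings are hypothetical, chosen for illustration; the actual splicing would be done with RDKit and the scoring with the SIME predictor, neither of which is modeled here.

```python
from itertools import combinations, product
from typing import Dict, Iterable, Iterator, List, Sequence

# Ring positions targeted by this design.
POSITIONS = (7, 15, 16)

# A splicing plan: ring position -> fragment SMILES to attach there.
Assignment = Dict[int, str]


def enumerate_assignments(
    fragments: Sequence[str],
    max_points: int = 2,
    positions: Sequence[int] = POSITIONS,
) -> Iterator[Assignment]:
    """Yield every single-point and multi-point plan up to max_points sites.

    Hypothetical helper: it only enumerates plans, it does not build molecules.
    """
    for k in range(1, max_points + 1):
        # Choose which k positions to modify (e.g. (7,), (7, 15), ...).
        for sites in combinations(positions, k):
            # Choose one fragment per selected position, repeats allowed.
            for frags in product(fragments, repeat=k):
                yield dict(zip(sites, frags))


def iter_batches(items: Iterable, batch_size: int = 128) -> Iterator[List]:
    """Group an iterable into lists of at most batch_size items."""
    batch: List = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


if __name__ == "__main__":
    # Placeholder fragment strings, not real SIME output.
    demo_fragments = ["N", "O", "C", "CC"]
    plans = enumerate_assignments(demo_fragments, max_points=2)
    for batch in iter_batches(plans, batch_size=128):
        # In the real pipeline each plan would be spliced, validated,
        # and passed to the predictor here.
        print(len(batch), batch[0])
```

With N fragments and three positions, the number of plans is 3N for single-point, 3N² for two-point, and N³ for three-point modification, so the full space contains (N + 1)³ − 1 candidates. This growth is why generating plans lazily and scoring them in batches is likely preferable to materializing the whole library in memory.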