Tylosin High-Throughput Splicing & Screening System Design
1. System Overview
The Tylosin Splicer is a combinatorial chemistry engine designed to optimize the Tylosin scaffold. It systematically modifies positions 7, 15, and 16 of the macrolactone ring by splicing high-potential fragments identified by the SIME platform, then immediately evaluating their predicted antibacterial activity.
2. Component Architecture
componentDiagram
package "Inputs" {
[Tylosin SMILES] as InputCore
[Fragment CSVs] as InputFrags
note right of InputFrags: SIME predicted\nhigh-activity fragments
}
package "Core Preparation" {
[Scaffold Preparer] as CorePrep
[Ring Numbering] as RingNum
note right of CorePrep: Identifies 7, 15, 16\nReplaces groups with anchors
}
package "Fragment Processing" {
[Fragment Loader] as FragLoad
[Attachment Point Selector] as AttachSel
note right of AttachSel: Heuristic rules to\nfind connection points
}
package "Splicing Engine" {
[Combinatorial Splicer] as Splicer
[Conformer Validator] as Validator
note right of Splicer: RDKit ChemicalReaction\nor ReplaceSubstructs
}
package "Evaluation (SIME)" {
[Activity Predictor] as Predictor
[Broad Spectrum Model] as Model
}
package "Outputs" {
[Ranked Results CSV] as Output
}
InputCore --> CorePrep
RingNum -.-> CorePrep : "Locate positions"
InputFrags --> FragLoad
FragLoad --> AttachSel
CorePrep --> Splicer : "Scaffold with Anchors (*)"
AttachSel --> Splicer : "Activated Fragments (R-Groups)"
Splicer --> Validator : "Raw Candidates"
Validator --> Predictor : "Valid 3D Structures"
Predictor --> Model : "Inference"
Model --> Output : "Scores & Rankings"
3. Data Flow Strategy
Step 1: Scaffold Preparation (CorePrep)
- Input: Tylosin SMILES.
- Action:
- Parse SMILES using macro_split utils.
- Use RingNumbering to identify the atoms at ring positions 7, 15, and 16.
- Perform "surgical removal": break the bonds to the existing side chains at these positions.
- Attach "Anchor Atoms" (isotopes or dummy atoms [*:1], [*:2], [*:3]) to the ring carbons.
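The anchoring step above can be sketched with RDKit on a toy molecule. This is only an illustration under stated assumptions, not the project's CorePrep: methylcyclohexane stands in for Tylosin, the atom indices are chosen by hand instead of by RingNumbering, and `anchor_site` is a hypothetical helper name.

```python
from rdkit import Chem


def anchor_site(mol, ring_atom_idx, side_atom_idx, label):
    """Cut one ring-to-side-chain bond and return the ring piece capped with a dummy atom."""
    bond = mol.GetBondBetweenAtoms(ring_atom_idx, side_atom_idx)
    if bond is None:
        raise ValueError("atoms are not bonded")
    if bond.IsInRing():
        raise ValueError("cutting a ring bond would not detach a side chain")
    # FragmentOnBonds breaks the bond and caps both ends with dummy atoms;
    # dummyLabels sets the isotope label carried by the two new dummies.
    cut = Chem.FragmentOnBonds(mol, [bond.GetIdx()], dummyLabels=[(label, label)])
    index_sets = Chem.GetMolFrags(cut)            # atom indices per piece
    pieces = Chem.GetMolFrags(cut, asMols=True)   # same pieces, same order, as molecules
    for indices, piece in zip(index_sets, pieces):
        if ring_atom_idx in indices:              # keep the piece holding the ring atom
            return piece
    raise ValueError("ring atom not found after fragmentation")


# Toy stand-in: atom 0 is the methyl side chain, atom 1 is the ring carbon.
toy = Chem.MolFromSmiles("CC1CCCCC1")
scaffold = anchor_site(toy, ring_atom_idx=1, side_atom_idx=0, label=1)
anchors = [a for a in scaffold.GetAtoms() if a.GetAtomicNum() == 0]
print(Chem.MolToSmiles(scaffold), len(anchors))
```

Repeating the cut with labels 1, 2, and 3 would give one distinguishable anchor per position, which is what lets the splicer address each site separately.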
Step 2: Fragment Activation (AttachSel)
- Input: Fragment SMILES from SIME CSVs.
- Action: Convert a standalone molecule into a substituent (R-Group).
- Strategy A (Smart): Identify heteroatoms (-NH2, -OH) as attachment points.
- Strategy B (Random): Randomly replace a Hydrogen with an attachment point.
- Strategy C (Linker): Add a small linker (e.g., -CH2-) if needed.
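Strategy A could look roughly like the sketch below. The SMARTS pattern, the choice of the first matching site, and the helper name `activate_fragment` are assumptions made for illustration; the actual heuristic rules in AttachSel may differ (for example, this pattern also matches amide N-H).

```python
from rdkit import Chem

# Assumed pattern for Strategy A: an N with one or two hydrogens, or a hydroxyl O.
HETERO_SITE = Chem.MolFromSmarts("[$([NX3;H2,H1]),$([OX2;H1])]")


def activate_fragment(smiles, map_num=1):
    """Return the fragment with a mapped dummy atom bonded to its first N-H/O-H site.

    Returns None if the SMILES is invalid or no site matches, so a caller
    could fall back to Strategy B or C.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    matches = mol.GetSubstructMatches(HETERO_SITE)
    if not matches:
        return None
    site_idx = matches[0][0]
    rw = Chem.RWMol(mol)
    dummy = Chem.Atom(0)              # atomic number 0 is the dummy atom "*"
    dummy.SetAtomMapNum(map_num)      # written as [*:1] in SMILES
    dummy_idx = rw.AddAtom(dummy)
    # The new bond takes the place of one implicit hydrogen on the heteroatom.
    rw.AddBond(site_idx, dummy_idx, Chem.BondType.SINGLE)
    Chem.SanitizeMol(rw)
    return rw.GetMol()


amine = activate_fragment("CCN")          # ethylamine as a toy fragment
print(Chem.MolToSmiles(amine))
print(activate_fragment("c1ccccc1"))      # benzene has no N-H/O-H site
```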
Step 3: Combinatorial Splicing (Splicer)
- Input: 1 Scaffold + N Fragments.
- Action:
- Single Point: Modify only one position (7, 15, or 16).
- Multi Point: Combinatorial modification (e.g., 7+15).
- Reaction: Use rdkit.Chem.rdChemReactions or ReplaceSubstructs.
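Before any chemistry runs, it helps to see how fast the candidate set grows. The sketch below enumerates the single-point and multi-point plans as position-to-fragment mappings; the function name, the `max_sites` parameter, and the placeholder fragment names are hypothetical. With N fragments and up to k modified positions, the count is the sum over j = 1..k of C(3, j) * N^j, which reaches (N + 1)^3 - 1 when all three positions may change.

```python
from itertools import combinations, product

SITES = (7, 15, 16)  # ring positions named in the design


def enumerate_assignments(fragments, max_sites=2):
    """Yield {position: fragment} dicts for every way to modify 1..max_sites positions."""
    for k in range(1, max_sites + 1):
        for sites in combinations(SITES, k):           # which positions change
            for chosen in product(fragments, repeat=k):  # which fragment goes where
                yield dict(zip(sites, chosen))


# Placeholder fragment names, not real SIME output.
plans = list(enumerate_assignments(["fragA", "fragB"], max_sites=2))
print(len(plans))   # 3*2 single-point + 3*4 two-point plans
print(plans[0])
```

Because the count is cubic in the number of fragments, capping `max_sites` or pre-filtering fragments is one way to keep the downstream prediction step tractable.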
Step 4: High-Throughput Prediction (Predictor)
- Integration: Import SIME.utils.mole_predictor.
- Batching: Collect valid spliced molecules into batches of 128/256.
- Scoring: Run ParallelBroadSpectrumPredictor.
- Filtering: Keep only molecules with broad_spectrum == True or high inhibition scores.
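The batching and filtering logic can be sketched independently of the model. The interface of ParallelBroadSpectrumPredictor is not shown in this document, so the example takes the predictor as a plain callable; the record keys other than `broad_spectrum`, the `min_inhibition` threshold, and `fake_predict` are all assumptions used only to make the sketch runnable.

```python
from itertools import islice


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def screen(smiles_list, predict_batch, batch_size=128, min_inhibition=0.8):
    """Score molecules batch by batch; keep broad-spectrum or high-inhibition hits."""
    kept = []
    for batch in batched(smiles_list, batch_size):
        for record in predict_batch(batch):
            if record["broad_spectrum"] or record["inhibition"] >= min_inhibition:
                kept.append(record)
    # Rank the survivors by inhibition score, best first (stable for ties).
    kept.sort(key=lambda r: r["inhibition"], reverse=True)
    return kept


def fake_predict(batch):
    """Placeholder predictor with made-up outputs; NOT a real model."""
    return [
        {"smiles": s, "broad_spectrum": s.endswith("N"), "inhibition": 0.5}
        for s in batch
    ]


hits = screen(["CCN", "CCO", "CN"], fake_predict, batch_size=2)
print([r["smiles"] for r in hits])
```

Passing the predictor in as a callable keeps the screening loop testable without loading the PyTorch models.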
4. Technology Stack
- Core Logic: Python 3.9+
- Chemistry Engine: RDKit
- Data Handling: Pandas, NumPy
- ML Inference: PyTorch (via SIME models)
- Parallelization: Python multiprocessing (via SIME batch predictor)