# Tylosin High-Throughput Splicing & Screening System Design

## 1. System Overview

The **Tylosin Splicer** is a combinatorial chemistry engine designed to optimize the Tylosin scaffold. It systematically modifies positions 7, 15, and 16 of the macrolactone ring by splicing in high-potential fragments identified by the SIME platform, then immediately evaluates the predicted antibacterial activity of each resulting molecule.

## 2. Component Architecture

```mermaid
flowchart TD
    subgraph Inputs
        InputCore["Tylosin SMILES"]
        InputFrags["Fragment CSVs<br/>SIME predicted<br/>high-activity fragments"]
    end

    subgraph CorePreparation["Core Preparation"]
        CorePrep["Scaffold Preparer<br/>Identifies 7, 15, 16<br/>Replaces groups with anchors"]
        RingNum["Ring Numbering"]
    end

    subgraph FragmentProcessing["Fragment Processing"]
        FragLoad["Fragment Loader"]
        AttachSel["Attachment Point Selector<br/>Heuristic rules to<br/>find connection points"]
    end

    subgraph SplicingEngine["Splicing Engine"]
        Splicer["Combinatorial Splicer<br/>RDKit ChemicalReaction<br/>or ReplaceSubstructs"]
        Validator["Conformer Validator"]
    end

    subgraph Evaluation["Evaluation (SIME)"]
        Predictor["Activity Predictor"]
        Model["Broad Spectrum Model"]
    end

    subgraph Outputs
        Output["Ranked Results CSV"]
    end

    InputCore --> CorePrep
    RingNum -.->|"Locate positions"| CorePrep

    InputFrags --> FragLoad
    FragLoad --> AttachSel

    CorePrep -->|"Scaffold with Anchors (*)"| Splicer
    AttachSel -->|"Activated Fragments (R-Groups)"| Splicer

    Splicer -->|"Raw Candidates"| Validator
    Validator -->|"Valid 3D Structures"| Predictor

    Predictor -->|"Inference"| Model
    Model -->|"Scores & Rankings"| Output
```

## 3. Data Flow Strategy

### Step 1: Scaffold Preparation (`CorePrep`)

- **Input**: Tylosin SMILES.
- **Action**:
    1. Parse the SMILES using the `macro_split` utilities.
    2. Use `RingNumbering` to identify the ring atoms at positions 7, 15, and 16.
    3. Perform "surgical removal": break the bonds to the existing side chains at these positions.
    4. Attach anchor atoms (isotope-labelled or atom-mapped dummy atoms `[*:1]`, `[*:2]`, `[*:3]`) to the ring carbons.

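The removal-and-anchoring logic in Step 1 can be illustrated on a plain graph, without any chemistry toolkit. The sketch below is not the `CorePrep` implementation: it assumes a hypothetical adjacency-dict representation (`Graph`), a hypothetical function name `prepare_scaffold`, and toy atom labels. It also assumes anchors are numbered in ascending ring-position order (7 gets `*:1`, 15 gets `*:2`, 16 gets `*:3`), which the design does not state.

```python
from typing import Dict, Set

# Hypothetical toy representation: atom label -> labels of bonded neighbours.
Graph = Dict[str, Set[str]]


def prepare_scaffold(graph: Graph, ring: Set[str], targets: Dict[int, str]) -> Graph:
    """Return a copy of `graph` in which each target ring atom carries an anchor.

    `targets` maps a ring position (e.g. 7) to the label of the ring atom there.
    """
    # Work on a copy so the input graph is left untouched.
    g = {atom: set(nbrs) for atom, nbrs in graph.items()}
    for k, (_position, ring_atom) in enumerate(sorted(targets.items()), start=1):
        # Collect every side-chain atom reachable from this ring atom
        # without stepping back onto the ring.
        stack = [n for n in g[ring_atom] if n not in ring]
        side_chain: Set[str] = set()
        while stack:
            atom = stack.pop()
            if atom in side_chain:
                continue
            side_chain.add(atom)
            stack.extend(n for n in g[atom] if n not in ring and n not in side_chain)
        # "Surgical removal": drop the side-chain atoms and their bonds.
        for atom in side_chain:
            for nbr in g.pop(atom):
                if nbr in g:
                    g[nbr].discard(atom)
        # Attach a numbered anchor where the side chain used to be.
        anchor = f"*:{k}"
        g[anchor] = {ring_atom}
        g[ring_atom].add(anchor)
    return g


def bond(g: Graph, a: str, b: str) -> None:
    """Add an undirected bond between atoms `a` and `b`."""
    g.setdefault(a, set()).add(b)
    g.setdefault(b, set()).add(a)


# Toy 16-membered ring r1..r16 with side chains at positions 3, 7, 15 and 16.
ring_atoms = [f"r{i}" for i in range(1, 17)]
toy: Graph = {}
for i in range(16):
    bond(toy, ring_atoms[i], ring_atoms[(i + 1) % 16])
for a, b in [("r7", "a"), ("a", "b"), ("r15", "c"), ("r16", "d"), ("r3", "e")]:
    bond(toy, a, b)

scaffold = prepare_scaffold(toy, set(ring_atoms), {7: "r7", 15: "r15", 16: "r16"})
```

In this toy case the two-atom chain on `r7` and the single atoms on `r15` and `r16` are removed and replaced by anchors, while the side chain on `r3` is left alone. A real implementation would perform the same steps on an RDKit molecule and would also need to handle stereochemistry and side chains that bridge back to the ring.
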
### Step 2: Fragment Activation (`AttachSel`)

- **Input**: Fragment SMILES from SIME CSVs.
- **Action**: Convert a standalone molecule into a substituent (R-group).
    - **Strategy A (Smart)**: Use heteroatom groups (-NH2, -OH) as attachment points.
    - **Strategy B (Random)**: Replace a randomly chosen hydrogen with an attachment point.
    - **Strategy C (Linker)**: Add a small linker (e.g., -CH2-) if needed.

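One way to combine Strategies A and B is as a fallback chain: try the heteroatom rule first and fall back to a random hydrogen only when no -NH2 or -OH is available. The design does not specify this ordering, so treat it as an assumption. The sketch uses a simplified, hypothetical `Atom` record (element symbol plus hydrogen count) instead of a real molecule object, and leaves out Strategy C.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Atom:
    symbol: str  # element symbol, e.g. "C", "N", "O"
    num_h: int   # number of hydrogens attached to this atom


def select_attachment(atoms: List[Atom], rng: random.Random) -> Optional[Tuple[int, str]]:
    """Return (atom index, strategy name), or None if no hydrogen can be replaced."""
    # Strategy A (smart): prefer an N or O that still carries a hydrogen.
    for i, atom in enumerate(atoms):
        if atom.symbol in ("N", "O") and atom.num_h > 0:
            return i, "A"
    # Strategy B (random): otherwise pick any atom that has a hydrogen to give up.
    candidates = [i for i, atom in enumerate(atoms) if atom.num_h > 0]
    if candidates:
        return rng.choice(candidates), "B"
    return None


# Heavy atoms of ethanolamine (NCCO), propane (CCC) and carbon tetrachloride.
ethanolamine = [Atom("N", 2), Atom("C", 2), Atom("C", 2), Atom("O", 1)]
propane = [Atom("C", 3), Atom("C", 2), Atom("C", 3)]
carbon_tetrachloride = [Atom("C", 0)] + [Atom("Cl", 0) for _ in range(4)]

rng = random.Random(0)  # seeded so Strategy B is reproducible
choice_a = select_attachment(ethanolamine, rng)
choice_b = select_attachment(propane, rng)
choice_none = select_attachment(carbon_tetrachloride, rng)
```

Passing the random generator in explicitly keeps Strategy B reproducible across runs, which matters when a ranked result has to be traced back to the exact fragment variant that produced it.
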
### Step 3: Combinatorial Splicing (`Splicer`)

- **Input**: 1 scaffold + N fragments.
- **Action**:
    - **Single Point**: Modify only one of positions 7, 15, or 16.
    - **Multi Point**: Modify several positions in combination (e.g., 7+15).
    - **Reaction**: Use `rdkit.Chem.rdChemReactions` or `ReplaceSubstructs`.

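The size of the search space follows directly from the single-point and multi-point modes: modifying exactly k of the 3 positions with N fragments gives C(3, k) · N^k candidates. The sketch below enumerates those position-to-fragment assignments with `itertools`. The function name `enumerate_assignments` and the fragment placeholders are hypothetical, and the actual bond formation is left to the RDKit step.

```python
from itertools import combinations, product
from typing import Dict, Iterator, Sequence


def enumerate_assignments(
    positions: Sequence[int],
    fragments: Sequence[str],
    max_points: int = 1,
) -> Iterator[Dict[int, str]]:
    """Yield every {position: fragment} assignment touching 1..max_points positions."""
    for k in range(1, max_points + 1):
        # Choose which k ring positions are modified ...
        for chosen in combinations(positions, k):
            # ... then assign one fragment (repeats allowed) to each of them.
            for frags in product(fragments, repeat=k):
                yield dict(zip(chosen, frags))


positions = (7, 15, 16)
fragments = ["fragA", "fragB"]  # placeholders for activated fragment SMILES

single_point = list(enumerate_assignments(positions, fragments, max_points=1))
up_to_double = list(enumerate_assignments(positions, fragments, max_points=2))
up_to_triple = list(enumerate_assignments(positions, fragments, max_points=3))
```

With 2 fragments this yields 6 single-point assignments, 18 when double modifications are included, and 26 when all three positions may change. The count grows as N^k, so for realistic fragment libraries multi-point splicing usually needs a cap on k or a pre-filtered fragment list.
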
### Step 4: High-Throughput Prediction (`Predictor`)

- **Integration**: Import `SIME.utils.mole_predictor`.
- **Batching**: Collect valid spliced molecules into batches of 128 or 256.
- **Scoring**: Run `ParallelBroadSpectrumPredictor`.
- **Filtering**: Keep only molecules with `broad_spectrum == True` or high inhibition scores.

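The batching and filtering steps can be sketched independently of the SIME models. The interface of `ParallelBroadSpectrumPredictor` is not described here, so the example stands in a hypothetical `predict_batch` callable that returns one `{"score", "broad_spectrum"}` dict per input. The `min_score` threshold of 0.8 and the fake scoring rule are likewise illustrative assumptions.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def screen(
    smiles: Iterable[str],
    predict_batch: Callable[[List[str]], List[Dict]],
    batch_size: int = 128,
    min_score: float = 0.8,
) -> List[Dict]:
    """Score molecules batch by batch and return the kept ones, best first."""
    kept = []
    for batch in batched(smiles, batch_size):
        for smi, result in zip(batch, predict_batch(batch)):
            # Keep broad-spectrum hits or anything above the score threshold.
            if result["broad_spectrum"] or result["score"] >= min_score:
                kept.append({"smiles": smi, **result})
    kept.sort(key=lambda row: row["score"], reverse=True)
    return kept


seen_batch_sizes: List[int] = []


def fake_predict_batch(batch: List[str]) -> List[Dict]:
    """Stand-in predictor: derives a deterministic fake score from string length."""
    seen_batch_sizes.append(len(batch))
    results = []
    for smi in batch:
        score = (len(smi) % 10) / 10
        results.append({"score": score, "broad_spectrum": score >= 0.9})
    return results


# 300 placeholder strings stand in for validated candidate SMILES.
candidates = ["C" * n for n in range(1, 301)]
ranked = screen(candidates, fake_predict_batch, batch_size=128, min_score=0.8)
```

Because `screen` only depends on the callable's return shape, the stub can be swapped for a thin wrapper around the real predictor, and the sorted rows map directly onto the ranked results CSV.
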
## 4. Technology Stack

- **Core Logic**: Python 3.9+
- **Chemistry Engine**: RDKit
- **Data Handling**: Pandas, NumPy
- **ML Inference**: PyTorch (via SIME models)
- **Parallelization**: Python `multiprocessing` (via SIME batch predictor)