# Tylosin Splicing System Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Build a pipeline that splices SIME-identified fragments onto the Tylosin scaffold at positions 7, 15, and 16 and predicts the antibacterial activity of the resulting molecules.

**Architecture:** A Python-based ETL pipeline using RDKit for structural manipulation (`macro_split`) and PyTorch for activity prediction (`SIME`).

**Tech Stack:** Python, RDKit, Pandas, PyTorch (SIME), Pytest.

---
### Task 1: Environment & Project Structure Setup

**Files:**

- Create: `scripts/tylosin_splicer.py` (main entry point stub)
- Create: `src/splicing/__init__.py`
- Create: `src/splicing/scaffold_prep.py`
- Create: `tests/test_splicing.py`
- Create: `tests/test_env_integration.py` (environment check from Step 2)

**Step 1: Create directory structure**

```bash
mkdir -p src/splicing
touch src/splicing/__init__.py
```

**Step 2: Create a basic test to verify the environment**

Write a test that imports from both the `macro_split` and `SIME` codebases to confirm that the workspace resolves their imports correctly.

```python
# tests/test_env_integration.py
import sys

# Temporary hack: expose the sibling repositories on sys.path (to be cleaned up later)
sys.path.append("/home/zly/project/SIME")
sys.path.append("/home/zly/project/merge/macro_split")


def test_imports():
    # macro_split: macrolactone ring numbering
    from src.ring_numbering import get_macrolactone_numbering
    # SIME: broad-spectrum activity predictor
    from utils.mole_predictor import ParallelBroadSpectrumPredictor

    # Both imports resolved if we get here
    assert get_macrolactone_numbering is not None
    assert ParallelBroadSpectrumPredictor is not None
```

**Step 3: Run test**

`pixi run pytest tests/test_env_integration.py`
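
The hard-coded `sys.path.append` calls in the test above are machine-specific. One possible cleanup is a `tests/conftest.py` that reads the two repository roots from environment variables and falls back to the current paths. This is only a sketch: `SIME_ROOT`, `MACRO_SPLIT_ROOT`, and `add_external_repos` are hypothetical names, not part of SIME or macro_split. One caveat: this project (`src.splicing`) and macro_split (`src.ring_numbering`) both expose a top-level `src` package, so depending on whether each has a `src/__init__.py`, one may shadow the other on `sys.path`.

```python
# tests/conftest.py (sketch): pytest loads this file automatically before collecting tests.
import os
import sys

# Hypothetical environment variables; defaults are the paths currently hard-coded in the plan.
EXTERNAL_REPOS = {
    "SIME_ROOT": "/home/zly/project/SIME",
    "MACRO_SPLIT_ROOT": "/home/zly/project/merge/macro_split",
}


def add_external_repos(env=None, path=None):
    """Append each external repo root to `path` once; return the roots that were added."""
    env = os.environ if env is None else env
    path = sys.path if path is None else path
    added = []
    for var, default in EXTERNAL_REPOS.items():
        root = env.get(var, default)
        if root not in path:
            path.append(root)
            added.append(root)
    return added


add_external_repos()
```

With a conftest like this, the test module could drop its own `sys.path.append` lines, and another checkout could be selected with e.g. `SIME_ROOT=/path/to/SIME pixi run pytest tests/test_env_integration.py`.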

---
### Task 2: Scaffold Preparation (The "Socket")

**Files:**

- Modify: `src/splicing/scaffold_prep.py`
- Test: `tests/test_scaffold_prep.py`

**Step 1: Write a failing test**

Test that `prepare_tylosin_scaffold` returns a molecule with dummy atoms at positions 7, 15, and 16, together with a mapping that contains each of those positions.

```python
# tests/test_scaffold_prep.py
from rdkit import Chem

from src.splicing.scaffold_prep import prepare_tylosin_scaffold

# Simplified example SMILES, not the full tylosin structure
TYLOSIN_SMILES = "CCC1OC(=O)C(C)C(O)C(C)C(O)C(C)C(OC2CC(C)(O)C(O)C(C)O2)CC(C)C(=O)C=CC=C1COC3OS(C)C(O)C(N(C)C)C3O"


def test_scaffold_prep():
    scaffold, mapping = prepare_tylosin_scaffold(TYLOSIN_SMILES, positions=[7, 15, 16])
    assert isinstance(scaffold, Chem.Mol)

    # Every requested position should appear in the returned mapping
    for pos in (7, 15, 16):
        assert pos in mapping

    # One dummy atom (atomic number 0) should replace the side chain at each position
    dummies = [atom for atom in scaffold.GetAtoms() if atom.GetAtomicNum() == 0]
    assert len(dummies) == 3
```

**Step 2: Implement `prepare_tylosin_scaffold`**

Use `get_macrolactone_numbering` to find the atom indices of the target ring positions, then use `RWMol` to replace the side chain at each of those indices with a dummy atom (atomic number 0, optionally labelled with an isotope).
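
A minimal sketch of the side-chain replacement, under several assumptions. The position-to-atom-index mapping is passed in as a plain dict (`position_to_atom`), because the exact signature and return value of `get_macrolactone_numbering` are not given in this plan. Instead of editing an `RWMol` atom by atom, it uses RDKit's `Chem.FragmentOnBonds` to cut the ring-to-side-chain bond and cap the cut with dummy atoms whose isotope label is the ring position (so position 7 becomes `[7*]`). It also assumes that the largest ring is the macrolactone and that the largest single-bonded substituent at each position is the one to replace; `prepare_scaffold_sketch` and `_branch_size` are hypothetical names.

```python
from collections import deque

from rdkit import Chem


def _branch_size(mol, ring_idx, start_idx):
    """Count heavy atoms reachable from start_idx without stepping back onto ring_idx."""
    seen = {ring_idx, start_idx}
    queue = deque([start_idx])
    while queue:
        for nbr in mol.GetAtomWithIdx(queue.popleft()).GetNeighbors():
            if nbr.GetIdx() not in seen:
                seen.add(nbr.GetIdx())
                queue.append(nbr.GetIdx())
    return len(seen) - 1  # exclude ring_idx itself


def prepare_scaffold_sketch(smiles, position_to_atom):
    """Replace one side chain per position with an isotope-labelled dummy atom."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    # Assumption: the largest ring is the macrolactone.
    macrocycle = set(max(mol.GetRingInfo().AtomRings(), key=len))

    bonds, labels = [], []
    for position, ring_idx in position_to_atom.items():
        candidates = [
            nbr.GetIdx()
            for nbr in mol.GetAtomWithIdx(ring_idx).GetNeighbors()
            if nbr.GetIdx() not in macrocycle
            and mol.GetBondBetweenAtoms(ring_idx, nbr.GetIdx()).GetBondType() == Chem.BondType.SINGLE
        ]
        if not candidates:
            raise ValueError(f"No single-bonded side chain at position {position}")
        # Assumption: replace the largest substituent (e.g. a sugar rather than a methyl).
        side = max(candidates, key=lambda idx: _branch_size(mol, ring_idx, idx))
        bonds.append(mol.GetBondBetweenAtoms(ring_idx, side).GetIdx())
        labels.append((position, position))  # both cut ends get isotope = position

    # Cut the chosen bonds; original atom indices are kept and the dummies are appended.
    cut = Chem.FragmentOnBonds(mol, bonds, addDummies=True, dummyLabels=labels)
    # Keep only the connected piece that contains the macrocycle.
    ring_atom = next(iter(macrocycle))
    keep = set(next(frag for frag in Chem.GetMolFrags(cut) if ring_atom in frag))
    core = Chem.RWMol(cut)
    for idx in sorted(set(range(cut.GetNumAtoms())) - keep, reverse=True):
        core.RemoveAtom(idx)  # descending order keeps lower indices valid
    core = core.GetMol()
    Chem.SanitizeMol(core)

    mapping = {a.GetIsotope(): a.GetIdx() for a in core.GetAtoms() if a.GetAtomicNum() == 0}
    return core, mapping
```

For example, `prepare_scaffold_sketch("C1CCCCCC(CCO)CCCCC1", {7: 6})` replaces the hydroxyethyl group on a toy 12-membered ring with `[7*]` and returns the mapping `{7: 12}`. On the real tylosin structure, check a depiction to confirm that the largest-substituent heuristic picks the intended side chain at positions 7, 15, and 16.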

**Step 3: Run tests**

`pixi run pytest tests/test_scaffold_prep.py`

---
### Task 3: Fragment Activation (The "Plug")

**Files:**

- Create: `src/splicing/fragment_prep.py`
- Test: `tests/test_fragment_prep.py`

**Step 1: Write a failing test**

Test that `activate_fragment` takes a SMILES string and returns a molecule with exactly *one* attachment point.

```python
# tests/test_fragment_prep.py
from rdkit import Chem

from src.splicing.fragment_prep import activate_fragment


def test_activate_fragment_smart():
    # Fragment with an -OH handle
    frag_smiles = "CCO"
    activated = activate_fragment(frag_smiles, strategy="smart")
    # The hydroxyl H should be replaced by a single attachment point
    assert Chem.MolToSmiles(activated).count("*") == 1


def test_activate_fragment_random():
    frag_smiles = "CCCCC"
    activated = activate_fragment(frag_smiles, strategy="random")
    assert Chem.MolToSmiles(activated).count("*") == 1
```

**Step 2: Implement `activate_fragment`**

- **Smart**: Use SMARTS patterns to find -NH2, -OH, or -SH groups and replace one of their hydrogens with `*`.
- **Random**: Pick a random carbon that carries at least one hydrogen and replace one H with `*`.
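
A minimal sketch of both strategies, assuming hypothetical names (`activate_fragment_sketch`, `HANDLE_SMARTS`) and a `seed` argument that the plan does not specify. To replace a hydrogen with `*` literally, it adds explicit hydrogens with `Chem.AddHs`, turns one hydrogen on the chosen atom into a dummy atom (atomic number 0), and removes the remaining explicit hydrogens with `Chem.RemoveHs`. The smart strategy tries -NH2, then -OH, then -SH and raises `ValueError` if none is present; whether to fall back to the random strategy in that case is left to the caller.

```python
import random

from rdkit import Chem

# Attachment handles tried in order by the "smart" strategy: -NH2, -OH, -SH.
HANDLE_SMARTS = [Chem.MolFromSmarts(s) for s in ("[NX3;H2]", "[OX2;H1]", "[SX2;H1]")]


def _swap_h_for_dummy(mol, atom_idx):
    """Turn one hydrogen on atom atom_idx into a '*' dummy atom."""
    with_h = Chem.RWMol(Chem.AddHs(mol))  # AddHs appends H atoms; heavy-atom indices are unchanged
    h_idx = next(
        nbr.GetIdx() for nbr in with_h.GetAtomWithIdx(atom_idx).GetNeighbors() if nbr.GetAtomicNum() == 1
    )
    with_h.GetAtomWithIdx(h_idx).SetAtomicNum(0)  # the former H becomes the attachment point
    return Chem.RemoveHs(with_h.GetMol())  # drop the remaining explicit Hs; the dummy stays


def activate_fragment_sketch(smiles, strategy="smart", seed=None):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    if strategy == "smart":
        for pattern in HANDLE_SMARTS:
            matches = mol.GetSubstructMatches(pattern)
            if matches:
                return _swap_h_for_dummy(mol, matches[0][0])
        raise ValueError(f"No -NH2, -OH or -SH group in {smiles}")
    if strategy == "random":
        carbons = [a.GetIdx() for a in mol.GetAtoms() if a.GetAtomicNum() == 6 and a.GetTotalNumHs() > 0]
        if not carbons:
            raise ValueError(f"No carbon with a hydrogen in {smiles}")
        return _swap_h_for_dummy(mol, random.Random(seed).choice(carbons))
    raise ValueError(f"Unknown strategy: {strategy}")
```

Note that the `[NX3;H2]` pattern also matches amide -NH2 groups; tighten it (for example to `[NX3;H2;!$(NC=O)]`) if amides should not be used as attachment points.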

**Step 3: Run tests**

`pixi run pytest tests/test_fragment_prep.py`

---
### Task 4: Splicing Engine (The Assembly)
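
**Files:**

- Create: `src/splicing/engine.py`
- Test: `tests/test_splicing_engine.py`

**Step 1: Write a failing test**

Test connecting an activated fragment to the prepared scaffold.

```python
# tests/test_splicing_engine.py
from rdkit import Chem

from src.splicing.engine import splice_molecule
from src.splicing.fragment_prep import activate_fragment
from src.splicing.scaffold_prep import prepare_tylosin_scaffold
from test_scaffold_prep import TYLOSIN_SMILES  # tests/ is on sys.path under pytest's default import mode


def test_splice_molecules():
    scaffold, _ = prepare_tylosin_scaffold(TYLOSIN_SMILES, positions=[7])  # prepared scaffold
    fragment = activate_fragment("CCO", strategy="smart")  # activated fragment
    product = splice_molecule(scaffold, fragment, position=7)
    assert product is not None
    assert Chem.MolToSmiles(product) != Chem.MolToSmiles(scaffold)
```

**Step 2: Implement `splice_molecule`**

Use `Chem.ReplaceSubstructs` or `rdkit.Chem.rdChemReactions` to join the fragment's attachment point to the scaffold's dummy atom at the requested position.

Sanitize the product to make sure the connection is chemically valid (e.g., no valence violations).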
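
A minimal sketch of `splice_molecule` that swaps in RDKit's `Chem.molzip` for `ReplaceSubstructs` or reaction SMARTS: `molzip` bonds together the atoms attached to two dummy atoms that share an atom-map number and then drops the dummies. It assumes that each scaffold socket is a dummy atom whose isotope equals its ring position (e.g. `[7*]`) and that an activated fragment has exactly one dummy atom; both are assumptions about how Tasks 2 and 3 are implemented. Validity is checked with `Chem.SanitizeMol(..., catchErrors=True)`, which reports the failed sanitization step instead of raising. `splice_molecule_sketch` and `_tag_dummy` are hypothetical names.

```python
from rdkit import Chem


def _tag_dummy(mol, wanted, map_num):
    """Return a copy of mol whose first dummy atom satisfying `wanted` gets atom-map number map_num."""
    mol = Chem.Mol(mol)  # copy, so the caller's molecule is left untouched
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() == 0 and wanted(atom):
            atom.SetAtomMapNum(map_num)
            return mol
    raise ValueError("No matching dummy atom")


def splice_molecule_sketch(scaffold, fragment, position):
    # Assumption: the scaffold socket for `position` is a dummy with isotope == position.
    scaffold = _tag_dummy(scaffold, lambda a: a.GetIsotope() == position, position)
    # Assumption: the activated fragment has exactly one dummy atom (its plug).
    fragment = _tag_dummy(fragment, lambda a: True, position)
    # molzip bonds the neighbours of the two tagged dummies and removes the dummies.
    product = Chem.molzip(scaffold, fragment)
    # Reject chemically invalid products (e.g. valence errors) instead of raising.
    if Chem.SanitizeMol(product, catchErrors=True) != Chem.SanitizeFlags.SANITIZE_NONE:
        return None
    return product
```

Returning `None` for an invalid product lets the pipeline skip it, and because both inputs are copied before tagging, one prepared scaffold can be reused for every fragment.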

**Step 3: Run tests**

`pixi run pytest tests/test_splicing_engine.py`

---
### Task 5: Prediction Pipeline Integration
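
**Files:**

- Create: `src/splicing/pipeline.py`
- Test: `tests/test_pipeline.py`

**Step 1: Write a failing test (mocked)**

Mock the SIME predictor so that unit tests do not load the heavy models.

```python
# tests/test_pipeline.py
from src.splicing.pipeline import run_splicing_pipeline
from test_scaffold_prep import TYLOSIN_SMILES  # tests/ is on sys.path under pytest's default import mode


def test_pipeline_flow(mocker):  # `mocker` is provided by the pytest-mock plugin
    # Patch the name where pipeline.py looks it up (assuming it does
    # `from utils.mole_predictor import ParallelBroadSpectrumPredictor`), not where it is defined.
    mocker.patch("src.splicing.pipeline.ParallelBroadSpectrumPredictor")

    frags = ["CCO", "CCN"]
    results = run_splicing_pipeline(TYLOSIN_SMILES, frags, positions=[7])
    assert len(results) > 0
```

**Step 2: Implement `run_splicing_pipeline`**

1. Prepare the scaffold.
2. Loop over the fragments: activate each one, then splice it onto the scaffold.
3. Convert the spliced products to SMILES in one batch.
4. Call `ParallelBroadSpectrumPredictor` on the batch.
5. Return the results.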
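
The five steps above can be sketched as a plain-Python orchestration function. To keep it testable without SIME, every stage is passed in as a callable; `predict` stands for a hypothetical batch wrapper around `ParallelBroadSpectrumPredictor`, whose real interface is not described in this plan. The sketch also makes two design choices that go beyond the plan: it prepares one scaffold copy per position, so each product carries only the socket it was spliced at, and it skips fragments that cannot be activated (`ValueError`) or products rejected by the splicer (`None`).

```python
from typing import Callable, Dict, Iterable, List, Sequence


def run_splicing_pipeline_sketch(
    scaffold_smiles: str,
    fragments: Iterable[str],
    positions: Sequence[int],
    *,
    prepare: Callable,  # e.g. prepare_tylosin_scaffold
    activate: Callable,  # e.g. activate_fragment
    splice: Callable,  # e.g. splice_molecule
    to_smiles: Callable,  # e.g. rdkit.Chem.MolToSmiles
    predict: Callable[[List[str]], List[float]],  # hypothetical batch wrapper around the SIME predictor
) -> List[Dict]:
    # 1. One scaffold copy per position, so each product carries only the socket it was spliced at.
    scaffolds = {pos: prepare(scaffold_smiles, positions=[pos])[0] for pos in positions}

    # 2. Activate each fragment once, then splice it at every requested position.
    records = []
    for frag in fragments:
        try:
            plug = activate(frag, strategy="smart")
        except ValueError:
            continue  # no usable attachment handle: skip this fragment
        for pos in positions:
            product = splice(scaffolds[pos], plug, position=pos)
            if product is not None:  # None means the splice was rejected as invalid
                records.append({"fragment": frag, "position": pos, "smiles": to_smiles(product)})

    # 3-4. Score all products with a single batched call instead of one call per molecule.
    scores = predict([r["smiles"] for r in records]) if records else []
    for record, score in zip(records, scores):
        record["score"] = score

    # 5. Return one record per successful splice.
    return records
```

Injecting the stages this way is an alternative to `mocker.patch`: a unit test can pass a stub `predict`, while production code binds the real functions, for example with `functools.partial`.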

**Step 3: Run tests**

`pixi run pytest tests/test_pipeline.py`

---
### Task 6: CLI and Final Execution

**Files:**

- Create: `scripts/run_tylosin_optimization.py`

**Step 1: Implement CLI**

Arguments: `--input-scaffold`, `--fragment-csv`, `--positions`, `--output`.
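
A minimal `argparse` sketch of the four arguments. It assumes that `--input-scaffold` takes a SMILES string (the plan does not say whether it is a SMILES string or a file) and defaults `--positions` to the three target positions; reading the CSV, calling the pipeline, and writing `--output` are left out, and `build_parser`/`main` are hypothetical names.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Splice fragments onto the tylosin scaffold and predict their activity."
    )
    # Assumption: the scaffold is given as a SMILES string.
    parser.add_argument("--input-scaffold", required=True, help="Scaffold SMILES")
    parser.add_argument("--fragment-csv", required=True, help="CSV file with fragment SMILES")
    parser.add_argument(
        "--positions",
        type=int,
        nargs="+",
        default=[7, 15, 16],
        help="Ring positions to splice at (default: 7 15 16)",
    )
    parser.add_argument("--output", required=True, help="Path for the scored results")
    return parser


def main(argv=None):
    """Parse arguments; the pipeline call itself is omitted in this sketch."""
    args = build_parser().parse_args(argv)
    # Hypothetical next steps: load fragments from args.fragment_csv,
    # run the splicing pipeline at args.positions, and write args.output.
    return args
```

In the script, `main()` would be called under an `if __name__ == "__main__":` guard, e.g. `pixi run python scripts/run_tylosin_optimization.py --input-scaffold "<SMILES>" --fragment-csv sample.csv --positions 7 15 --output results.csv`.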

**Step 2: Integration Test**

Run the CLI on a small subset of the fragment CSV, e.g. its first 10 lines (`head -n 10`; if the CSV has a header row, that count includes it).