# Tylosin Splicing System Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Build a pipeline to splice SIME-identified fragments onto the Tylosin scaffold at positions 7, 15, and 16, and predict the antibacterial activity of the resulting molecules.

**Architecture:** A Python-based ETL pipeline using RDKit for structural manipulation (`macro_split`) and PyTorch for activity prediction (`SIME`).

**Tech Stack:** Python, RDKit, Pandas, PyTorch (SIME), Pytest.

---

### Task 1: Environment & Project Structure Setup

**Files:**
- Create: `scripts/tylosin_splicer.py` (Main entry point stub)
- Create: `src/splicing/__init__.py`
- Create: `src/splicing/scaffold_prep.py`
- Create: `tests/test_splicing.py`
- Create: `tests/test_env_integration.py`

**Step 1: Create directory structure**

```bash
mkdir -p src/splicing scripts tests
touch src/splicing/__init__.py src/splicing/scaffold_prep.py
touch scripts/tylosin_splicer.py tests/test_splicing.py
```

**Step 2: Create a basic test to verify environment**

Write a test that imports both the `macro_split` and `SIME` modules to ensure the workspace resolves imports correctly.

```python
# tests/test_env_integration.py
import sys

# Hack for now, will clean up later: put both sibling projects on the import path.
sys.path.append("/home/zly/project/SIME")
sys.path.append("/home/zly/project/merge/macro_split")


def test_imports():
    # The test passes if both imports resolve; an ImportError fails it.
    from src.ring_numbering import get_macrolactone_numbering  # macro_split
    from utils.mole_predictor import ParallelBroadSpectrumPredictor  # SIME
```

**Step 3: Run test**

`pixi run pytest tests/test_env_integration.py`

---

### Task 2: Scaffold Preparation (The "Socket")

**Files:**
- Modify: `src/splicing/scaffold_prep.py`
- Test: `tests/test_scaffold_prep.py`

**Step 1: Write failing test**

Test that `prepare_tylosin_scaffold` returns a molecule with dummy atoms at positions 7, 15, and 16.

```python
# tests/test_scaffold_prep.py
from rdkit import Chem
from src.splicing.scaffold_prep import prepare_tylosin_scaffold

TYLOSIN_SMILES = "CCC1OC(=O)C(C)C(O)C(C)C(O)C(C)C(OC2CC(C)(O)C(O)C(C)O2)CC(C)C(=O)C=CC=C1COC3OS(C)C(O)C(N(C)C)C3O"  # Simplified/Example


def test_scaffold_prep():
    scaffold, mapping = prepare_tylosin_scaffold(TYLOSIN_SMILES, positions=[7, 15, 16])
    # Check that every requested position is mapped
    assert 7 in mapping
    assert 15 in mapping
    assert 16 in mapping
    # TODO: check that the mapped atoms are dummy atoms or carry specific isotopes
```

**Step 2: Implement `prepare_tylosin_scaffold`**

Use `get_macrolactone_numbering` to find the atom indices. Use `RWMol` to replace the side chains at those indices with a dummy atom (e.g., atomic number 0, or an isotope label).

**Step 3: Run tests**

`pixi run pytest tests/test_scaffold_prep.py`

---

### Task 3: Fragment Activation (The "Plug")

**Files:**
- Create: `src/splicing/fragment_prep.py`
- Test: `tests/test_fragment_prep.py`

**Step 1: Write failing test**

Test that `activate_fragment` takes a SMILES string and returns a molecule with *one* attachment point.

```python
# tests/test_fragment_prep.py
from rdkit import Chem
from src.splicing.fragment_prep import activate_fragment


def test_activate_fragment_smart():
    # Fragment with -OH
    frag_smiles = "CCO"
    activated = activate_fragment(frag_smiles, strategy="smart")
    # Should find the O and replace its H with an attachment point
    assert "*" in Chem.MolToSmiles(activated)


def test_activate_fragment_random():
    frag_smiles = "CCCCC"
    activated = activate_fragment(frag_smiles, strategy="random")
    assert "*" in Chem.MolToSmiles(activated)
```

**Step 2: Implement `activate_fragment`**

- **Smart**: Look for -NH2, -OH, -SH. Use SMARTS to find them, then replace one H with `*`.
- **Random**: Pick a random carbon and replace one of its H atoms with `*`.

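
The two strategies above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the plan's final implementation: the SMARTS patterns, the `seed` parameter, the rule of taking the first matching heteroatom, and raising `ValueError` when nothing matches are all hypothetical choices.

```python
import random

from rdkit import Chem

# Assumed handle patterns, tried in this order: primary amine, hydroxyl, thiol.
SMART_PATTERNS = ["[NX3;H2]", "[OX2;H1]", "[SX2;H1]"]


def activate_fragment(smiles, strategy="smart", seed=None):
    """Return a copy of the fragment with one H replaced by a dummy atom (*)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles!r}")

    if strategy == "smart":
        candidates = []
        for smarts in SMART_PATTERNS:
            matches = mol.GetSubstructMatches(Chem.MolFromSmarts(smarts))
            if matches:
                candidates = [match[0] for match in matches]
                break
        # Assumption: take the first matching heteroatom.
        idx = candidates[0] if candidates else None
    elif strategy == "random":
        # Only carbons that still carry a hydrogen can accept a new bond.
        candidates = [
            atom.GetIdx()
            for atom in mol.GetAtoms()
            if atom.GetSymbol() == "C" and atom.GetTotalNumHs() > 0
        ]
        idx = random.Random(seed).choice(candidates) if candidates else None
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    if idx is None:
        raise ValueError(f"no attachment site found in {smiles!r}")

    rw = Chem.RWMol(mol)
    atom = rw.GetAtomWithIdx(idx)
    # Bracket atoms such as [NH2] store explicit H counts; drop one by hand.
    if atom.GetNumExplicitHs() > 0:
        atom.SetNumExplicitHs(atom.GetNumExplicitHs() - 1)
    dummy_idx = rw.AddAtom(Chem.Atom(0))  # atomic number 0 is written as '*'
    rw.AddBond(idx, dummy_idx, Chem.BondType.SINGLE)
    Chem.SanitizeMol(rw)  # recomputes implicit H counts after the new bond
    return rw.GetMol()
```

Adding the bond and re-sanitizing lets RDKit lower the implicit hydrogen count on the chosen atom, which is what "replace one H with `*`" amounts to for SMILES input without explicit hydrogens. A caller could catch the `ValueError` from the smart strategy and retry with `strategy="random"`.
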

**Step 3: Run tests**

`pixi run pytest tests/test_fragment_prep.py`

---

### Task 4: Splicing Engine (The Assembly)

**Files:**
- Create: `src/splicing/engine.py`
- Test: `tests/test_splicing_engine.py`

**Step 1: Write failing test**

Test connecting an activated fragment to the scaffold.

```python
# tests/test_splicing_engine.py
from rdkit import Chem
from src.splicing.engine import splice_molecule


def test_splice_molecules():
    scaffold = ...  # prepared scaffold
    fragment = ...  # activated fragment
    product = splice_molecule(scaffold, fragment, position=7)
    assert product is not None
    assert Chem.MolToSmiles(product) != Chem.MolToSmiles(scaffold)
```

**Step 2: Implement `splice_molecule`**

Use `Chem.ReplaceSubstructs` or `Chem.rdChemReactions`. Ensure the connection is chemically valid (for example, the product passes `Chem.SanitizeMol`).

**Step 3: Run tests**

`pixi run pytest tests/test_splicing_engine.py`

---

### Task 5: Prediction Pipeline Integration

**Files:**
- Create: `src/splicing/pipeline.py`
- Test: `tests/test_pipeline.py`

**Step 1: Write failing test (Mocked)**

Mock the SIME predictor to avoid loading heavy models during unit tests.

```python
# tests/test_pipeline.py
from src.splicing.pipeline import run_splicing_pipeline

# TYLOSIN_SMILES: the same example constant defined in tests/test_scaffold_prep.py


def test_pipeline_flow(mocker):  # `mocker` is the pytest-mock fixture
    # Mock predictor
    mocker.patch('utils.mole_predictor.ParallelBroadSpectrumPredictor')
    frags = ["CCO", "CCN"]
    results = run_splicing_pipeline(TYLOSIN_SMILES, frags, positions=[7])
    assert len(results) > 0
```

**Step 2: Implement `run_splicing_pipeline`**

1. Prepare the scaffold.
2. Loop over the fragments: activate each one, then splice it onto the scaffold.
3. Generate the product SMILES as a batch.
4. Call `ParallelBroadSpectrumPredictor`.
5. Return the results.

**Step 3: Run tests**

`pixi run pytest tests/test_pipeline.py`

---

### Task 6: CLI and Final Execution

**Files:**
- Create: `scripts/run_tylosin_optimization.py`

**Step 1: Implement CLI**

Arguments: `--input-scaffold`, `--fragment-csv`, `--positions`, `--output`.

**Step 2: Integration Test**

Run with a small subset of the fragment CSV (`head -n 10`).
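
The four CLI arguments from Task 6 could be parsed along these lines. This is only a sketch: the plan does not specify argument formats, so treating `--positions` as a comma-separated list, defaulting it to `7,15,16`, reading `--input-scaffold` as a SMILES string, and the helper names `parse_positions` and `build_parser` are all assumptions made for illustration.

```python
import argparse


def parse_positions(text):
    """Parse a comma-separated list such as "7,15,16" into a list of ints."""
    return [int(part) for part in text.split(",") if part.strip()]


def build_parser():
    parser = argparse.ArgumentParser(
        description="Splice fragments onto a scaffold and score the products."
    )
    # Assumption: the scaffold is passed directly as a SMILES string.
    parser.add_argument("--input-scaffold", required=True, help="scaffold SMILES")
    parser.add_argument("--fragment-csv", required=True, help="CSV of fragment SMILES")
    # Assumption: comma-separated ring positions, defaulting to the plan's three sites.
    parser.add_argument(
        "--positions",
        type=parse_positions,
        default=[7, 15, 16],
        help="comma-separated ring positions, e.g. 7,15,16",
    )
    parser.add_argument("--output", required=True, help="path for the results file")
    return parser
```

In `scripts/run_tylosin_optimization.py`, the parsed namespace would then feed `run_splicing_pipeline`, with `args.input_scaffold`, the fragments read from `args.fragment_csv`, and `args.positions` as its inputs.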