macro_split/docs/plans/2026-01-23-tylosin-splicing-implementation.md
hotwa a768d26e47 Move project docs to docs/project-docs and update references
- Move AGENTS.md, CLEANUP_SUMMARY.md, DOCUMENTATION_GUIDE.md,
  IMPLEMENTATION_SUMMARY.md, QUICK_COMMANDS.md to docs/project-docs/
- Update AGENTS.md to include splicing module documentation
- Update mkdocs.yml navigation to include project-docs section
- Update .gitignore to track docs/ directory
- Add docs/plans/ splicing design documents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 17:56:03 +08:00


Tylosin Splicing System Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Build a pipeline to splice SIME-identified fragments onto the Tylosin scaffold at positions 7, 15, and 16, and predict their antibacterial activity.

Architecture: A Python-based ETL pipeline using RDKit for structural manipulation (macro_split) and PyTorch for activity prediction (SIME).

Tech Stack: Python, RDKit, Pandas, PyTorch (SIME), Pytest.


Task 1: Environment & Project Structure Setup

Files:

  • Create: scripts/tylosin_splicer.py (Main entry point stub)
  • Create: src/splicing/__init__.py
  • Create: src/splicing/scaffold_prep.py
  • Create: tests/test_splicing.py

Step 1: Create directory structure

mkdir -p src/splicing
touch src/splicing/__init__.py

Step 2: Create a basic test to verify the environment

Write a test that imports both the macro_split and SIME modules to ensure the workspace resolves imports correctly.

# tests/test_env_integration.py
import sys

# Temporary hack: hard-coded project roots, to be cleaned up later
sys.path.append("/home/zly/project/SIME")
sys.path.append("/home/zly/project/merge/macro_split")

def test_imports():
    # The imports themselves are the check; an ImportError fails the test
    from src.ring_numbering import get_macrolactone_numbering
    from utils.mole_predictor import ParallelBroadSpectrumPredictor
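One possible way to clean up the hard-coded paths later is to read the project roots from environment variables in a conftest.py. This is a minimal sketch only: the variable names SIME_ROOT and MACRO_SPLIT_ROOT and the helper add_project_roots are hypothetical and are not defined anywhere in this plan.

```python
# conftest.py (hypothetical location: repository root)
import os
import sys
from pathlib import Path

# Hypothetical environment variable names; the plan does not define them.
ENV_VARS = ("SIME_ROOT", "MACRO_SPLIT_ROOT")


def add_project_roots(environ=os.environ, path_list=sys.path):
    """Prepend each configured project root to path_list, once, if it exists."""
    added = []
    for name in ENV_VARS:
        root = environ.get(name)
        if not root:
            continue  # variable not set: nothing to add
        resolved = str(Path(root).expanduser().resolve())
        if Path(resolved).is_dir() and resolved not in path_list:
            path_list.insert(0, resolved)
            added.append(resolved)
    return added


# Runs once when pytest loads conftest.py.
add_project_roots()
```

With something like this in place, the test file would no longer need to touch sys.path itself.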

Step 3: Run test

pixi run pytest tests/test_env_integration.py


Task 2: Scaffold Preparation (The "Socket")

Files:

  • Modify: src/splicing/scaffold_prep.py
  • Test: tests/test_scaffold_prep.py

Step 1: Write failing test

Test that prepare_tylosin_scaffold returns a molecule with dummy atoms at positions 7, 15, and 16.

# tests/test_scaffold_prep.py
from rdkit import Chem
from src.splicing.scaffold_prep import prepare_tylosin_scaffold

TYLOSIN_SMILES = "CCC1OC(=O)C(C)C(O)C(C)C(O)C(C)C(OC2CC(C)(O)C(O)C(C)O2)CC(C)C(=O)C=CC=C1COC3OC(C)C(O)C(N(C)C)C3O" # Simplified/Example (tylosin contains no sulfur)

def test_scaffold_prep():
    scaffold, mapping = prepare_tylosin_scaffold(TYLOSIN_SMILES, positions=[7, 15, 16])
    # Check if we have mapped atoms
    assert 7 in mapping
    assert 15 in mapping
    assert 16 in mapping
    # Check if they are dummy atoms or have specific isotopes

Step 2: Implement prepare_tylosin_scaffold

Use get_macrolactone_numbering to find the atom indices for positions 7, 15, and 16. Use RWMol to replace the side chain at each of those atoms with a dummy atom (e.g., atomic number 0, optionally tagged with an isotope label that records the position).
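The side-chain replacement could be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: it uses Chem.FragmentOnBonds rather than manual RWMol edits, the helper name cap_side_chain is hypothetical, and the atom indices are passed in explicitly because the return format of get_macrolactone_numbering is not specified here. It also assumes the side chain is attached by a single exocyclic bond.

```python
from rdkit import Chem


def cap_side_chain(mol, ring_atom_idx, side_atom_idx, label):
    """Cut the ring_atom-side_atom bond and return the ring-side piece,
    capped with a dummy atom whose isotope equals `label`."""
    bond = mol.GetBondBetweenAtoms(ring_atom_idx, side_atom_idx)
    if bond is None or bond.IsInRing():
        raise ValueError("expected an exocyclic bond between the two atoms")

    # FragmentOnBonds keeps the original atom indices and appends dummy atoms;
    # dummyLabels sets the isotope of the dummies on both sides of the cut.
    cut = Chem.FragmentOnBonds(
        mol, [bond.GetIdx()], addDummies=True, dummyLabels=[(label, label)]
    )

    # Both calls list the fragments in the same order: first as atom index
    # tuples, then as sanitized molecules.
    atom_groups = Chem.GetMolFrags(cut)
    pieces = Chem.GetMolFrags(cut, asMols=True)
    for atoms, piece in zip(atom_groups, pieces):
        if ring_atom_idx in atoms:
            return piece
    raise RuntimeError("ring atom not found in any fragment")
```

For methylcyclohexane ("CC1CCCCC1"), cutting the bond between atom 1 (ring carbon) and atom 0 (methyl) with label 7 leaves cyclohexane carrying one dummy atom of isotope 7. Note that atom indices in the returned piece differ from those in the input molecule, so a real implementation would need to track the mapping that the test expects.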

Step 3: Run tests

pixi run pytest tests/test_scaffold_prep.py


Task 3: Fragment Activation (The "Plug")

Files:

  • Create: src/splicing/fragment_prep.py
  • Test: tests/test_fragment_prep.py

Step 1: Write failing test

Test that activate_fragment takes a SMILES string and returns a molecule with one attachment point.

# tests/test_fragment_prep.py
from rdkit import Chem  # needed for Chem.MolToSmiles below
from src.splicing.fragment_prep import activate_fragment

def test_activate_fragment_smart():
    # Fragment with -OH
    frag_smiles = "CCO"
    activated = activate_fragment(frag_smiles, strategy="smart")
    # Should find the O and replace H with attachment point
    assert "*" in Chem.MolToSmiles(activated)

def test_activate_fragment_random():
    frag_smiles = "CCCCC"
    activated = activate_fragment(frag_smiles, strategy="random")
    assert "*" in Chem.MolToSmiles(activated)

Step 2: Implement activate_fragment

  • Smart: Look for -NH2, -OH, -SH. Use SMARTS to find them, replace a H with *.
  • Random: Pick a random Carbon, replace a H with *.
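The two strategies above might look roughly like this. It is a minimal sketch, not the planned implementation: the function name activate_fragment_sketch, the seed parameter, and the exact SMARTS patterns are assumptions, and it presumes hydrogens are implicit in the input SMILES.

```python
import random

from rdkit import Chem

# Assumed SMARTS for primary amine, hydroxyl and thiol groups.
_SMART_PATTERNS = [
    Chem.MolFromSmarts(s) for s in ("[NX3;H2]", "[OX2;H1]", "[SX2;H1]")
]


def _attach_dummy(mol, atom_idx):
    """Bond a dummy atom (*) to atom_idx; one implicit H is dropped on sanitize."""
    rw = Chem.RWMol(mol)
    dummy_idx = rw.AddAtom(Chem.Atom(0))
    rw.AddBond(atom_idx, dummy_idx, Chem.BondType.SINGLE)
    Chem.SanitizeMol(rw)  # recomputes implicit hydrogens and checks valences
    return rw.GetMol()


def activate_fragment_sketch(smiles, strategy="smart", seed=0):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    if strategy == "smart":
        # Use the first -NH2, -OH or -SH group found, in that order.
        for pattern in _SMART_PATTERNS:
            matches = mol.GetSubstructMatches(pattern)
            if matches:
                return _attach_dummy(mol, matches[0][0])
        raise ValueError("no -NH2, -OH or -SH group found")

    if strategy == "random":
        # Any carbon that still carries at least one hydrogen is a candidate.
        carbons = [
            a.GetIdx()
            for a in mol.GetAtoms()
            if a.GetAtomicNum() == 6 and a.GetTotalNumHs() > 0
        ]
        if not carbons:
            raise ValueError("no carbon with a replaceable hydrogen")
        return _attach_dummy(mol, random.Random(seed).choice(carbons))

    raise ValueError(f"unknown strategy: {strategy!r}")
```

Passing a seed keeps the random strategy reproducible in tests, which is a design choice of this sketch rather than a requirement of the plan.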

Step 3: Run tests

pixi run pytest tests/test_fragment_prep.py


Task 4: Splicing Engine (The Assembly)

Files:

  • Create: src/splicing/engine.py
  • Test: tests/test_splicing_engine.py

Step 1: Write failing test

Test connecting an activated fragment to the scaffold.

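# tests/test_splicing_engine.py
from rdkit import Chem
from src.splicing.engine import splice_molecule

def test_splice_molecules():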
    scaffold = ... # prepared scaffold
    fragment = ... # activated fragment
    product = splice_molecule(scaffold, fragment, position=7)
    assert product is not None
    assert Chem.MolToSmiles(product) != Chem.MolToSmiles(scaffold)

Step 2: Implement splice_molecule

Use Chem.ReplaceSubstructs or a reaction from rdkit.Chem.rdChemReactions. Ensure the connection is chemically valid, for example by running Chem.SanitizeMol on the product.
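As an alternative to the two RDKit routes named above, the join can also be done by direct graph editing. The sketch below is illustrative only: it assumes the scaffold's attachment point is a dummy atom whose isotope equals the position, that the fragment carries one unlabeled dummy atom, and that both are attached by single bonds. The name splice_sketch is hypothetical.

```python
from rdkit import Chem


def _find_dummy(mol, start, stop, isotope):
    """Return the index of the first dummy atom with this isotope in [start, stop)."""
    for idx in range(start, stop):
        atom = mol.GetAtomWithIdx(idx)
        if atom.GetAtomicNum() == 0 and atom.GetIsotope() == isotope:
            return idx
    raise ValueError(f"no dummy atom with isotope {isotope} found")


def splice_sketch(scaffold, fragment, position):
    n_scaffold = scaffold.GetNumAtoms()
    # CombineMols keeps scaffold atoms first, followed by fragment atoms.
    combo = Chem.RWMol(Chem.CombineMols(scaffold, fragment))

    s_dummy = _find_dummy(combo, 0, n_scaffold, position)
    f_dummy = _find_dummy(combo, n_scaffold, combo.GetNumAtoms(), 0)

    # The real atoms that the two dummies are attached to.
    s_anchor = combo.GetAtomWithIdx(s_dummy).GetNeighbors()[0].GetIdx()
    f_anchor = combo.GetAtomWithIdx(f_dummy).GetNeighbors()[0].GetIdx()

    combo.AddBond(s_anchor, f_anchor, Chem.BondType.SINGLE)
    # Remove the higher index first so the lower index stays valid.
    for idx in sorted((s_dummy, f_dummy), reverse=True):
        combo.RemoveAtom(idx)

    Chem.SanitizeMol(combo)  # raises if the product is not chemically valid
    return combo.GetMol()
```

For example, splicing the fragment "*OCC" onto "[7*]C1CCCCC1" at position 7 should give ethoxycyclohexane. Other labeled dummy atoms on the scaffold are left untouched, so the remaining positions stay available.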

Step 3: Run tests

pixi run pytest tests/test_splicing_engine.py


Task 5: Prediction Pipeline Integration

Files:

  • Create: src/splicing/pipeline.py
  • Test: tests/test_pipeline.py

Step 1: Write failing test (Mocked)

Mock the SIME predictor to avoid loading heavy models during unit tests.

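# tests/test_pipeline.py
from src.splicing.pipeline import run_splicing_pipeline

# TYLOSIN_SMILES: the example scaffold SMILES from tests/test_scaffold_prep.py

def test_pipeline_flow(mocker):  # `mocker` is the pytest-mock fixture
    # Mock predictor. If pipeline.py imports the class by name, patch it where
    # it is looked up: 'src.splicing.pipeline.ParallelBroadSpectrumPredictor'
    mocker.patch('utils.mole_predictor.ParallelBroadSpectrumPredictor')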

    frags = ["CCO", "CCN"]
    results = run_splicing_pipeline(TYLOSIN_SMILES, frags, positions=[7])
    assert len(results) > 0

Step 2: Implement run_splicing_pipeline

  1. Prep scaffold.
  2. Loop fragments -> activate -> splice.
  3. Batch generate SMILES.
  4. Call ParallelBroadSpectrumPredictor.
  5. Return results.
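The five steps above can be sketched with the chemistry and the predictor injected as callables, which also makes the flow testable without loading any model. This is a hedged outline: run_pipeline_sketch and its keyword parameters are hypothetical, and predict stands in for whatever call ParallelBroadSpectrumPredictor actually exposes, which this plan does not specify.

```python
def run_pipeline_sketch(
    scaffold_smiles,
    fragment_smiles,
    positions,
    *,
    prepare,    # (scaffold_smiles, positions) -> prepared scaffold
    activate,   # (fragment_smiles) -> activated fragment, ValueError if impossible
    splice,     # (scaffold, fragment, position) -> product or None
    to_smiles,  # (product) -> SMILES string
    predict,    # (list of SMILES) -> list of predictions, same order
):
    # 1. Prep scaffold once.
    scaffold = prepare(scaffold_smiles, positions)

    # 2. Loop fragments -> activate -> splice at every requested position.
    records = []
    for frag in fragment_smiles:
        try:
            activated = activate(frag)
        except ValueError:
            continue  # skip fragments that cannot be activated
        for pos in positions:
            product = splice(scaffold, activated, pos)
            if product is None:
                continue
            # 3. Collect SMILES for one batched prediction call.
            records.append(
                {"fragment": frag, "position": pos, "smiles": to_smiles(product)}
            )

    if not records:
        return []

    # 4. Call the predictor once on the whole batch.
    scores = predict([r["smiles"] for r in records])

    # 5. Attach predictions and return.
    for record, score in zip(records, scores):
        record["prediction"] = score
    return records
```

Injecting the predictor this way is one alternative to patching it with a mock; the real run_splicing_pipeline may well construct the predictor internally instead.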

Step 3: Run tests

pixi run pytest tests/test_pipeline.py


Task 6: CLI and Final Execution

Files:

  • Create: scripts/run_tylosin_optimization.py

Step 1: Implement CLI

Arguments: --input-scaffold, --fragment-csv, --positions, --output.
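A minimal argparse sketch for these four arguments might look like the following. Several details are assumptions: that --positions takes a comma-separated list, that it defaults to 7, 15 and 16 (the positions named in the goal), and that the other three arguments are required. Whether --input-scaffold holds a SMILES string or a file path is left open here.

```python
import argparse


def parse_positions(text):
    """Parse a comma-separated list such as '7,15,16' into [7, 15, 16]."""
    try:
        return [int(part) for part in text.split(",") if part.strip()]
    except ValueError as exc:
        raise argparse.ArgumentTypeError(
            f"invalid position list: {text!r}"
        ) from exc


def build_parser():
    parser = argparse.ArgumentParser(
        description="Splice fragments onto the Tylosin scaffold and score them."
    )
    parser.add_argument("--input-scaffold", required=True,
                        help="scaffold input (format is an open choice)")
    parser.add_argument("--fragment-csv", required=True,
                        help="CSV file listing the fragments")
    parser.add_argument("--positions", type=parse_positions,
                        default=[7, 15, 16],
                        help="comma-separated ring positions, e.g. 7,15,16")
    parser.add_argument("--output", required=True,
                        help="path for the results file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

argparse exposes the dashed option names with underscores, so the values are read as args.input_scaffold and args.fragment_csv.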

Step 2: Integration Test

Run with a small subset of the fragment CSV (for example, the output of head -n 10).
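If the fragment CSV contains quoted fields with embedded newlines, a CSV-aware subset is safer than a line-based one. The helper below is a small sketch with a hypothetical name, write_csv_head; its default of 9 data rows mirrors head -n 10 on a file with a single header line.

```python
import csv
from itertools import islice


def write_csv_head(src_path, dst_path, n_rows=9):
    """Copy the header and the first n_rows data rows; return rows written."""
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        header = next(reader, None)
        if header is None:
            return 0  # empty input file: nothing to copy
        writer.writerow(header)

        count = 0
        for row in islice(reader, n_rows):
            writer.writerow(row)
            count += 1
        return count
```

The resulting file can then be passed to the CLI through --fragment-csv for the integration run.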