From a768d26e47e4ec7870388377bf1b40533dbe1d01 Mon Sep 17 00:00:00 2001
From: hotwa
Date: Wed, 18 Mar 2026 17:56:03 +0800
Subject: [PATCH] Move project docs to docs/project-docs and update references

- Move AGENTS.md, CLEANUP_SUMMARY.md, DOCUMENTATION_GUIDE.md, IMPLEMENTATION_SUMMARY.md, QUICK_COMMANDS.md to docs/project-docs/
- Update AGENTS.md to include splicing module documentation
- Update mkdocs.yml navigation to include project-docs section
- Update .gitignore to track docs/ directory
- Add docs/plans/ splicing design documents

Co-Authored-By: Claude Opus 4.6
---
 .gitignore                                    |   2 +-
 docs/SUMMARY.md                               | 243 ++++++++++++++++++
 .../2026-01-23-tylosin-splicing-design.md     |  95 +++++++
 ...6-01-23-tylosin-splicing-implementation.md | 183 +++++++++++++
 AGENTS.md => docs/project-docs/AGENTS.md      |  22 +-
 .../project-docs/CLEANUP_SUMMARY.md           |   0
 .../project-docs/DOCUMENTATION_GUIDE.md       |   0
 .../project-docs/IMPLEMENTATION_SUMMARY.md    |   0
 .../project-docs/QUICK_COMMANDS.md            |   0
 mkdocs.yml                                    |  17 +-
 10 files changed, 555 insertions(+), 7 deletions(-)
 create mode 100644 docs/SUMMARY.md
 create mode 100644 docs/plans/2026-01-23-tylosin-splicing-design.md
 create mode 100644 docs/plans/2026-01-23-tylosin-splicing-implementation.md
 rename AGENTS.md => docs/project-docs/AGENTS.md (87%)
 rename CLEANUP_SUMMARY.md => docs/project-docs/CLEANUP_SUMMARY.md (100%)
 rename DOCUMENTATION_GUIDE.md => docs/project-docs/DOCUMENTATION_GUIDE.md (100%)
 rename IMPLEMENTATION_SUMMARY.md => docs/project-docs/IMPLEMENTATION_SUMMARY.md (100%)
 rename QUICK_COMMANDS.md => docs/project-docs/QUICK_COMMANDS.md (100%)

diff --git a/.gitignore b/.gitignore
index 55e98f6..61db48f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -66,4 +66,4 @@ data/
 *.png
 output/
 site/
-docs/
\ No newline at end of file
+# docs/ source files should be tracked, only ignore generated site/
\ No newline at end of file
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
new file mode 100644
index 0000000..987c6e7
--- /dev/null
+++ b/docs/SUMMARY.md
@@ 
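-0,0 +1,243 @@
+# Macro Split Project Documentation Summary
+
+This document summarizes the contents of every Markdown file in the repository.
+
+---
+
+## 1. README.md (Main Project Document)
+
+**Location**: `/README.md`
+
+### Project Overview
+Macrolactone Fragmenter is a specialized tool for cleaving and analyzing the side chains of macrolactones (12- to 20-membered rings).
+
+### Key Features
+- **Smart ring-atom numbering** - supports 12- to 20-membered rings, using a fixed numbering scheme based on the lactone structure
+- **Automatic side-chain cleavage** - identifies and cleaves all side chains
+- **Powerful visualization** - SVG + PNG output
+- **Multiple export formats** - JSON, CSV, DataFrame
+- **Batch processing** - supports large-scale analysis of 2000+ molecules
+
+### Installation
+```bash
+# Using Pixi (recommended)
+pixi install && pixi shell
+
+# Using Pip
+conda install -c conda-forge rdkit
+pip install -e .
+```
+
+### Basic Usage
+```python
+from src.macrolactone_fragmenter import MacrolactoneFragmenter
+fragmenter = MacrolactoneFragmenter(ring_size=16)
+result = fragmenter.process_molecule(smiles, parent_id="mol_001")
+```
+
+---
+
+## 2. CLEANUP_SUMMARY.md (Cleanup Summary)
+
+**Location**: `/CLEANUP_SUMMARY.md`
+
+### Overview
+Records the cleanup of the project root directory:
+- **Files kept**: README.md, DOCUMENTATION_GUIDE.md, QUICK_COMMANDS.md
+- **Files archived**: 14 historical documents moved to the `archive/` directory
+- **Before cleanup**: 17 MD files, about 120KB
+- **After cleanup**: 3 core MD files + 30+ documentation-system files
+
+---
+
+## 3. DOCUMENTATION_GUIDE.md (Documentation System Guide)
+
+**Location**: `/DOCUMENTATION_GUIDE.md`
+
+### Documentation System Features
+- Built with **MkDocs + the Material theme + mkdocstrings**
+- Supports Chinese and dark/light modes
+- Generates API documentation automatically from the code
+- Supports math formulas (MathJax)
+
+### Common Commands
+```bash
+# Local preview
+pixi run mkdocs serve
+
+# Build the static site
+pixi run mkdocs build
+
+# Deploy to GitHub Pages
+pixi run mkdocs gh-deploy
+```
+
+### Steps to Add a New Document
+1. Create a `.md` file in `docs/`
+2. Edit the content
+3. Add a link in the `nav` section of `mkdocs.yml`
+4. Run the preview to verify
+
+---
+
+## 4. 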
IMPLEMENTATION_SUMMARY.md (Implementation Summary)
+
+**Location**: `/IMPLEMENTATION_SUMMARY.md`
+
+### MacroLactoneAnalyzer Wrapper
+Adds the new `src/macro_lactone_analyzer.py` module, which provides:
+
+#### Static Methods
+- `detect_ring_sizes(mol)` - detect ring sizes
+- `is_valid_macrolactone(mol, size)` - validate a macrolactone
+- `analyze_smiles(smiles)` - single-molecule analysis
+- `dynamic_smarts_match(smiles, ring_size)` - dynamic SMARTS matching
+
+#### Instance Methods
+- `get_single_ring_info(smiles)` - detailed information for a single molecule
+- `analyze_list(smiles_list)` - batch analysis
+- `classify_molecules(df)` - DataFrame classification
+
+### Features
+- Highly reusable, type-safe, with detailed error handling
+- Supports analysis of 12- to 20-membered rings
+- Version number bumped to 2.0.0
+
+---
+
+## 5. QUICK_COMMANDS.md (Quick Command Reference)
+
+**Location**: `/QUICK_COMMANDS.md`
+
+### Documentation Commands
+```bash
+pixi run mkdocs serve      # Start the documentation server
+pixi run mkdocs build      # Build the static documentation
+pixi run mkdocs gh-deploy  # Deploy to GitHub Pages
+```
+
+### Installation Commands
+```bash
+pixi install && pixi shell  # Pixi
+pip install -e .            # Development mode
+```
+
+### Development Tools
+```bash
+pixi run black src/   # Format the code
+pixi run flake8 src/  # Check code quality
+pixi run pytest       # Run the tests
+```
+
+---
+
+## 6. notebooks/README_analyze_ring16.md (Notebook Notes)
+
+**Location**: `/notebooks/README_analyze_ring16.md`
+
+### Files
+- **Notebook**: `analyze_ring16_molecules.ipynb`
+- **Input**: `../output/ring16_match_smarts.csv` (307 molecules)
+
+### Analysis Contents
+1. **Basic molecular properties**: molecular weight, LogP, QED, TPSA, etc.
+2. **Side-chain cleavage analysis**: uses the MacrolactoneFragmenter class
+3. **Distribution plots**: 4x4 subplot layout, distributions for positions 3-16
+
+### Output Files
+- `ring16_molecular_properties_distribution.png`
+- `atom_count_distribution_ring16.png`
+- `molecular_weight_distribution_ring16.png`
+- `ring16_fragments_analysis.csv`
+
+### Suggested Follow-up Analyses
+- LogP/QED/TPSA analysis
+- SAR analysis (if activity data are available)
+- Fragment diversity analysis
+- Cluster analysis
+
+---
+
+## 7. 
scripts/README.md (Script Usage)
+
+**Location**: `/scripts/README.md`
+
+### Script List
+
+#### batch_process_ring16.py
+- Processes the 16-membered-ring molecules (1241 molecules)
+- Input: `ring16/temp_filtered_complete.csv`
+- Output: `output/ring16_fragments/`
+
+#### batch_process_multi_rings.py
+- Processes all molecules with 12- to 20-membered rings
+- Classifies them automatically by ring size
+- Detects and excludes molecules that contain multiple lactone bonds
+
+### Output File Format
+```json
+{
+  "parent_id": "ring16_mol_0",
+  "parent_smiles": "...",
+  "fragments": [
+    {
+      "fragment_smiles": "CC(C)C",
+      "cleavage_position": 5,
+      "atom_count": 4,
+      "molecular_weight": 58.12
+    }
+  ]
+}
+```
+
+### Log Files
+- `processing_log_*.txt` - processing progress
+- `error_log_*.txt` - error records
+- `multiple_lactone_log_*.txt` - molecules with multiple lactone bonds
+
+---
+
+## Project Structure Overview
+
+```
+macro_split/
+├── src/                            # Core source code
+│   ├── macrolactone_fragmenter.py  # High-level wrapper class
+│   ├── macro_lactone_analyzer.py   # Ring-size analyzer
+│   ├── ring_numbering.py           # Ring numbering system
+│   ├── ring_visualization.py       # Visualization tools
+│   └── fragment_dataclass.py       # Fragment data class
+├── notebooks/                      # Jupyter Notebook examples
+├── scripts/                        # Batch-processing scripts
+├── docs/                           # Documentation directory
+├── tests/                          # Unit tests
+├── pyproject.toml                  # Project configuration
+├── setup.py                        # Packaging script
+├── pixi.toml                       # Pixi environment configuration
+└── mkdocs.yml                      # Documentation configuration
+```
+
+---
+
+## Quick Start
+
+1. **Install the environment**
+   ```bash
+   pixi install && pixi shell
+   ```
+
+2. **Test the import**
+   ```python
+   from src.macrolactone_fragmenter import MacrolactoneFragmenter
+   fragmenter = MacrolactoneFragmenter(ring_size=16)
+   ```
+
+3. **View the documentation**
+   ```bash
+   pixi run mkdocs serve
+   # Visit http://localhost:8000
+   ```
+
+---
+
+*Document generated: 2025-01-23*
diff --git a/docs/plans/2026-01-23-tylosin-splicing-design.md b/docs/plans/2026-01-23-tylosin-splicing-design.md
new file mode 100644
index 0000000..ca4a4ea
--- /dev/null
+++ b/docs/plans/2026-01-23-tylosin-splicing-design.md
@@ -0,0 +1,95 @@
+# Tylosin High-Throughput Splicing & Screening System Design
+
+## 1. System Overview
+
+The **Tylosin Splicer** is a combinatorial chemistry engine designed to optimize the Tylosin scaffold. 
It systematically modifies positions 7, 15, and 16 of the macrolactone ring by splicing in high-potential fragments identified by the SIME platform, then immediately evaluates their predicted antibacterial activity.
+
+## 2. Component Architecture
+
+```mermaid
+componentDiagram
+    package "Inputs" {
+        [Tylosin SMILES] as InputCore
+        [Fragment CSVs] as InputFrags
+        note right of InputFrags: SIME predicted\nhigh-activity fragments
+    }
+
+    package "Core Preparation" {
+        [Scaffold Preparer] as CorePrep
+        [Ring Numbering] as RingNum
+        note right of CorePrep: Identifies 7, 15, 16\nReplaces groups with anchors
+    }
+
+    package "Fragment Processing" {
+        [Fragment Loader] as FragLoad
+        [Attachment Point Selector] as AttachSel
+        note right of AttachSel: Heuristic rules to\nfind connection points
+    }
+
+    package "Splicing Engine" {
+        [Combinatorial Splicer] as Splicer
+        [Conformer Validator] as Validator
+        note right of Splicer: RDKit ChemicalReaction\nor ReplaceSubstructs
+    }
+
+    package "Evaluation (SIME)" {
+        [Activity Predictor] as Predictor
+        [Broad Spectrum Model] as Model
+    }
+
+    package "Outputs" {
+        [Ranked Results CSV] as Output
+    }
+
+    InputCore --> CorePrep
+    RingNum -.-> CorePrep : "Locate positions"
+
+    InputFrags --> FragLoad
+    FragLoad --> AttachSel
+
+    CorePrep --> Splicer : "Scaffold with Anchors (*)"
+    AttachSel --> Splicer : "Activated Fragments (R-Groups)"
+
+    Splicer --> Validator : "Raw Candidates"
+    Validator --> Predictor : "Valid 3D Structures"
+
+    Predictor --> Model : "Inference"
+    Model --> Output : "Scores & Rankings"
+```
+
+## 3. Data Flow Strategy
+
+### Step 1: Scaffold Preparation (`CorePrep`)
+- **Input**: Tylosin SMILES.
+- **Action**:
+  1. Parse the SMILES using `macro_split` utils.
+  2. Use `RingNumbering` to identify the atoms at ring positions 7, 15, and 16.
+  3. Perform "surgical removal": break the bonds to existing side chains at these positions.
+  4. Attach "anchor atoms" (isotopes or dummy atoms `[*:1]`, `[*:2]`, `[*:3]`) to the ring carbons.
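
The scaffold-preparation steps above can be sketched with RDKit roughly as follows. This is a minimal sketch under stated assumptions, not the project's `prepare_tylosin_scaffold`: the helper name `cap_ring_positions` is hypothetical, and it assumes the RDKit atom indices of the target ring atoms are already known (in the design they would come from `RingNumbering`). It cuts every exocyclic single bond at those atoms with `Chem.FragmentOnBonds` and keeps the piece that still contains the ring, using isotope-labelled dummy atoms as anchors.

```python
from rdkit import Chem


def cap_ring_positions(mol, ring_atom_indices):
    """Hypothetical helper: cut side chains off the given ring atoms.

    ring_atom_indices are RDKit atom indices (assumed to be known already).
    The n-th listed atom gets dummy atoms carrying isotope label n.
    """
    bond_ids, labels = [], []
    for label, atom_idx in enumerate(ring_atom_indices, start=1):
        atom = mol.GetAtomWithIdx(atom_idx)
        for bond in atom.GetBonds():
            # Keep ring bonds and exocyclic double bonds (e.g. C=O) intact.
            if bond.IsInRing() or bond.GetBondType() != Chem.BondType.SINGLE:
                continue
            bond_ids.append(bond.GetIdx())
            labels.append((label, label))
    if not bond_ids:
        return Chem.Mol(mol)

    # FragmentOnBonds keeps the original atom indices and appends the dummies.
    pieces = Chem.FragmentOnBonds(mol, bond_ids, addDummies=True, dummyLabels=labels)
    atom_groups = Chem.GetMolFrags(pieces)
    frag_mols = Chem.GetMolFrags(pieces, asMols=True)
    for atoms, frag in zip(atom_groups, frag_mols):
        if ring_atom_indices[0] in atoms:
            return frag  # the piece that still contains the ring
    raise RuntimeError("ring fragment not found")


if __name__ == "__main__":
    # Toy stand-in for a macrolactone: a cyclohexane with two substituents.
    toy = Chem.MolFromSmiles("CC1CCCCC1O")
    print(Chem.MolToSmiles(cap_ring_positions(toy, [1, 6])))
```

Cutting only single bonds is a deliberate simplification so that ring carbonyls survive; a real implementation would also have to decide what to do with stereochemistry at the cut atoms.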
+
+### Step 2: Fragment Activation (`AttachSel`)
+- **Input**: Fragment SMILES from SIME CSVs.
+- **Action**: Convert a standalone molecule into a substituent (R-Group).
+  - **Strategy A (Smart)**: Identify heteroatom groups (-NH2, -OH) as attachment points.
+  - **Strategy B (Random)**: Randomly replace a hydrogen with an attachment point.
+  - **Strategy C (Linker)**: Add a small linker (e.g., -CH2-) if needed.
+
+### Step 3: Combinatorial Splicing (`Splicer`)
+- **Input**: 1 Scaffold + N Fragments.
+- **Action**:
+  - **Single Point**: Modify only position 7, 15, or 16.
+  - **Multi Point**: Combinatorial modification (e.g., 7+15).
+  - **Reaction**: Use `rdkit.Chem.rdChemReactions` or `ReplaceSubstructs`.
+
+### Step 4: High-Throughput Prediction (`Predictor`)
+- **Integration**: Import `SIME.utils.mole_predictor`.
+- **Batching**: Collect valid spliced molecules into batches of 128 or 256.
+- **Scoring**: Run `ParallelBroadSpectrumPredictor`.
+- **Filtering**: Keep only molecules with `broad_spectrum == True` or high inhibition scores.
+
+## 4. Technology Stack
+- **Core Logic**: Python 3.9+
+- **Chemistry Engine**: RDKit
+- **Data Handling**: Pandas, NumPy
+- **ML Inference**: PyTorch (via SIME models)
+- **Parallelization**: Python `multiprocessing` (via SIME batch predictor)
diff --git a/docs/plans/2026-01-23-tylosin-splicing-implementation.md b/docs/plans/2026-01-23-tylosin-splicing-implementation.md
new file mode 100644
index 0000000..729cc75
--- /dev/null
+++ b/docs/plans/2026-01-23-tylosin-splicing-implementation.md
@@ -0,0 +1,183 @@
+# Tylosin Splicing System Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Build a pipeline to splice SIME-identified fragments onto the Tylosin scaffold at positions 7, 15, and 16, and predict their antibacterial activity.
+
+**Architecture:** A Python-based ETL pipeline using RDKit for structural manipulation (`macro_split`) and PyTorch for activity prediction (`SIME`).
+
+**Tech Stack:** Python, RDKit, Pandas, PyTorch (SIME), Pytest.
+
+---
+
+### Task 1: Environment & Project Structure Setup
+
+**Files:**
+- Create: `scripts/tylosin_splicer.py` (Main entry point stub)
+- Create: `src/splicing/__init__.py`
+- Create: `src/splicing/scaffold_prep.py`
+- Create: `tests/test_splicing.py`
+
+**Step 1: Create directory structure**
+```bash
+mkdir -p src/splicing
+touch src/splicing/__init__.py
+```
+
+**Step 2: Create a basic test to verify environment**
+Write a test that imports both `macro_split` and `SIME` modules to ensure the workspace handles imports correctly.
+
+```python
+# tests/test_env_integration.py
+import sys
+import os
+sys.path.append("/home/zly/project/SIME") # Hack for now, will clean up later
+sys.path.append("/home/zly/project/merge/macro_split")
+
+def test_imports():
+    from src.ring_numbering import get_macrolactone_numbering
+    from utils.mole_predictor import ParallelBroadSpectrumPredictor
+    assert True
+```
+
+**Step 3: Run test**
+`pixi run pytest tests/test_env_integration.py`
+
+---
+
+### Task 2: Scaffold Preparation (The "Socket")
+
+**Files:**
+- Modify: `src/splicing/scaffold_prep.py`
+- Test: `tests/test_scaffold_prep.py`
+
+**Step 1: Write failing test**
+Test that `prepare_tylosin_scaffold` returns a molecule with dummy atoms at positions 7, 15, and 16.
+
+```python
+# tests/test_scaffold_prep.py
+from rdkit import Chem
+from src.splicing.scaffold_prep import prepare_tylosin_scaffold
+
+TYLOSIN_SMILES = "CCC1OC(=O)C(C)C(O)C(C)C(O)C(C)C(OC2CC(C)(O)C(O)C(C)O2)CC(C)C(=O)C=CC=C1COC3OS(C)C(O)C(N(C)C)C3O" # Simplified/Example
+
+def test_scaffold_prep():
+    scaffold, mapping = prepare_tylosin_scaffold(TYLOSIN_SMILES, positions=[7, 15, 16])
+    # Check if we have mapped atoms
+    assert 7 in mapping
+    assert 15 in mapping
+    assert 16 in mapping
+    # Check if they are dummy atoms or have specific isotopes
+```
+
+**Step 2: Implement `prepare_tylosin_scaffold`**
+Use `get_macrolactone_numbering` to find the atom indices.
+Use `RWMol` to replace the side chains at those indices with a dummy atom (e.g., atomic number 0 or an isotope label).
+
+**Step 3: Run tests**
+`pixi run pytest tests/test_scaffold_prep.py`
+
+---
+
+### Task 3: Fragment Activation (The "Plug")
+
+**Files:**
+- Create: `src/splicing/fragment_prep.py`
+- Test: `tests/test_fragment_prep.py`
+
+**Step 1: Write failing test**
+Test that `activate_fragment` takes a SMILES and returns a molecule with *one* attachment point.
+
+```python
+# tests/test_fragment_prep.py
+from rdkit import Chem  # needed for Chem.MolToSmiles below
+from src.splicing.fragment_prep import activate_fragment
+
+def test_activate_fragment_smart():
+    # Fragment with -OH
+    frag_smiles = "CCO"
+    activated = activate_fragment(frag_smiles, strategy="smart")
+    # Should find the O and replace H with attachment point
+    assert "*" in Chem.MolToSmiles(activated)
+
+def test_activate_fragment_random():
+    frag_smiles = "CCCCC"
+    activated = activate_fragment(frag_smiles, strategy="random")
+    assert "*" in Chem.MolToSmiles(activated)
+```
+
+**Step 2: Implement `activate_fragment`**
+- **Smart**: Look for -NH2, -OH, -SH. Use SMARTS to find them, then replace an H with `*`.
+- **Random**: Pick a random carbon and replace an H with `*`.
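
The two strategies listed above could be sketched with RDKit as shown below. This is one possible shape for `activate_fragment`, written under assumptions rather than taken from the project: the SMARTS patterns, the `_attach_dummy` helper, and the `seed` parameter are illustrative choices, and only the function name and the `strategy` argument come from the plan.

```python
import random

from rdkit import Chem

# Assumed SMARTS for the groups named in the plan: -NH2, -OH, -SH.
HETEROATOM_SMARTS = ("[NX3;H2]", "[OX2;H1]", "[SX2;H1]")


def _attach_dummy(mol, atom_idx):
    """Bond a dummy atom (*) to atom_idx, taking the place of one hydrogen."""
    rw = Chem.RWMol(mol)
    atom = rw.GetAtomWithIdx(atom_idx)
    # Bracket atoms such as [NH2] carry explicit H counts that must be lowered;
    # implicit H counts are recomputed during sanitization.
    if atom.GetNumExplicitHs() > 0:
        atom.SetNumExplicitHs(atom.GetNumExplicitHs() - 1)
    dummy_idx = rw.AddAtom(Chem.Atom(0))
    rw.AddBond(atom_idx, dummy_idx, Chem.BondType.SINGLE)
    Chem.SanitizeMol(rw)
    return rw.GetMol()


def activate_fragment(smiles, strategy="smart", seed=None):
    """Return a molecule with exactly one attachment point (sketch only)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    if strategy == "smart":
        for smarts in HETEROATOM_SMARTS:
            matches = mol.GetSubstructMatches(Chem.MolFromSmarts(smarts))
            if matches:
                return _attach_dummy(mol, matches[0][0])
        raise ValueError("no -NH2, -OH or -SH group found")

    if strategy == "random":
        carbons = [a.GetIdx() for a in mol.GetAtoms()
                   if a.GetAtomicNum() == 6 and a.GetTotalNumHs() > 0]
        if not carbons:
            raise ValueError("no carbon with a replaceable hydrogen")
        return _attach_dummy(mol, random.Random(seed).choice(carbons))

    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    print(Chem.MolToSmiles(activate_fragment("CCO", strategy="smart")))
    print(Chem.MolToSmiles(activate_fragment("CCCCC", strategy="random", seed=0)))
```

Passing a `seed` keeps the random strategy reproducible in tests; whether the real pipeline wants that, or wants every eligible carbon enumerated instead, is a design decision the plan leaves open.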
+
+**Step 3: Run tests**
+`pixi run pytest tests/test_fragment_prep.py`
+
+---
+
+### Task 4: Splicing Engine (The Assembly)
+
+**Files:**
+- Create: `src/splicing/engine.py`
+- Test: `tests/test_splicing_engine.py`
+
+**Step 1: Write failing test**
+Test connecting an activated fragment to the scaffold.
+
+```python
+def test_splice_molecules():
+    scaffold = ... # prepared scaffold
+    fragment = ... # activated fragment
+    product = splice_molecule(scaffold, fragment, position=7)
+    assert product is not None
+    assert Chem.MolToSmiles(product) != Chem.MolToSmiles(scaffold)
+```
+
+**Step 2: Implement `splice_molecule`**
+Use `Chem.ReplaceSubstructs` or `Chem.rdChemReactions`.
+Ensure the connection is chemically valid.
+
+**Step 3: Run tests**
+`pixi run pytest tests/test_splicing_engine.py`
+
+---
+
+### Task 5: Prediction Pipeline Integration
+
+**Files:**
+- Create: `src/splicing/pipeline.py`
+- Test: `tests/test_pipeline.py`
+
+**Step 1: Write failing test (Mocked)**
+Mock the SIME predictor to avoid loading heavy models during unit tests.
+
+```python
+def test_pipeline_flow(mocker):
+    # Mock predictor
+    mocker.patch('utils.mole_predictor.ParallelBroadSpectrumPredictor')
+
+    frags = ["CCO", "CCN"]
+    results = run_splicing_pipeline(TYLOSIN_SMILES, frags, positions=[7])
+    assert len(results) > 0
+```
+
+**Step 2: Implement `run_splicing_pipeline`**
+1. Prep scaffold.
+2. Loop fragments -> activate -> splice.
+3. Batch generate SMILES.
+4. Call `ParallelBroadSpectrumPredictor`.
+5. Return results.
+
+**Step 3: Run tests**
+
+---
+
+### Task 6: CLI and Final Execution
+
+**Files:**
+- Create: `scripts/run_tylosin_optimization.py`
+
+**Step 1: Implement CLI**
+Arguments: `--input-scaffold`, `--fragment-csv`, `--positions`, `--output`.
+
+**Step 2: Integration Test**
+Run with a small subset of the fragment CSV (head -n 10).
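
The CLI described in Task 6 could be sketched with `argparse` as below. Only the four flag names come from the plan; the `build_parser` name, the types, the defaults, and the help texts are assumptions made for illustration. In particular, the plan does not say whether `--input-scaffold` takes a SMILES string or a file path, so the sketch treats it as an opaque string.

```python
import argparse


def build_parser():
    """Hypothetical parser for scripts/run_tylosin_optimization.py."""
    parser = argparse.ArgumentParser(
        description="Splice fragments onto a scaffold and rank the products."
    )
    # Assumption: passed through as a plain string (SMILES or path).
    parser.add_argument("--input-scaffold", required=True,
                        help="scaffold to modify")
    parser.add_argument("--fragment-csv", required=True,
                        help="CSV file with candidate fragments")
    # Assumption: positions are space-separated integers, default 7 15 16.
    parser.add_argument("--positions", type=int, nargs="+", default=[7, 15, 16],
                        help="ring positions to modify")
    # Assumption: the default output file name is illustrative only.
    parser.add_argument("--output", default="ranked_results.csv",
                        help="where to write the ranked results")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input_scaffold, args.fragment_csv, args.positions, args.output)
```

Using `nargs="+"` lets a run restrict itself to a single position (`--positions 7`) or a combination (`--positions 7 15`) without a separate flag per mode.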
diff --git a/AGENTS.md b/docs/project-docs/AGENTS.md
similarity index 87%
rename from AGENTS.md
rename to docs/project-docs/AGENTS.md
index 773b058..2432209 100644
--- a/AGENTS.md
+++ b/docs/project-docs/AGENTS.md
@@ -37,7 +37,11 @@ macro_split/
 │   ├── ring_visualization.py  # Visualization tools
 │   ├── fragment_cleaver.py    # Side-chain cleavage logic
 │   ├── fragment_dataclass.py  # Fragment data class
-│   └── visualizer.py          # Statistical visualization
+│   ├── visualizer.py          # Statistical visualization
+│   └── splicing/              # Molecular splicing module
+│       ├── engine.py          # Splicing engine
+│       ├── scaffold_prep.py   # Scaffold preparation
+│       └── fragment_prep.py   # Fragment activation
 ├── notebooks/                 # Jupyter Notebook examples
 ├── scripts/                   # Batch-processing scripts
 ├── tests/                     # Unit tests
@@ -65,6 +69,22 @@ analyzer = MacroLactoneAnalyzer()
 info = analyzer.get_single_ring_info(smiles)
 ```
 
+### Splicing Module (Molecular Splicing)
+```python
+from src.splicing.scaffold_prep import prepare_tylosin_scaffold
+from src.splicing.fragment_prep import activate_fragment
+from src.splicing.engine import splice_molecule
+
+# Prepare the scaffold (remove side chains, mark the sites with dummy atoms)
+scaffold, dummy_map = prepare_tylosin_scaffold(smiles, positions=[3, 5, 9])
+
+# Activate the fragment (add an attachment point)
+fragment = activate_fragment(fragment_smiles, strategy="smart")
+
+# Splice the molecules
+new_mol = splice_molecule(scaffold, fragment, position=3)
+```
+
 ### Data Class Structure
 ```python
 @dataclass
diff --git a/CLEANUP_SUMMARY.md b/docs/project-docs/CLEANUP_SUMMARY.md
similarity index 100%
rename from CLEANUP_SUMMARY.md
rename to docs/project-docs/CLEANUP_SUMMARY.md
diff --git a/DOCUMENTATION_GUIDE.md b/docs/project-docs/DOCUMENTATION_GUIDE.md
similarity index 100%
rename from DOCUMENTATION_GUIDE.md
rename to docs/project-docs/DOCUMENTATION_GUIDE.md
diff --git a/IMPLEMENTATION_SUMMARY.md b/docs/project-docs/IMPLEMENTATION_SUMMARY.md
similarity index 100%
rename from IMPLEMENTATION_SUMMARY.md
rename to docs/project-docs/IMPLEMENTATION_SUMMARY.md
diff --git a/QUICK_COMMANDS.md b/docs/project-docs/QUICK_COMMANDS.md
similarity index 100%
rename from QUICK_COMMANDS.md
rename to docs/project-docs/QUICK_COMMANDS.md
diff --git a/mkdocs.yml b/mkdocs.yml
index 0848b95..c81b50a 100644
--- 
a/mkdocs.yml
+++ b/mkdocs.yml
@@ -131,7 +131,7 @@ nav:
     - index.md
     - Quick Start: getting-started.md
     - Installation Guide: installation.md
-  
+
   - User Guide:
     - user-guide/index.md
     - MacrolactoneFragmenter Usage: user-guide/fragmenter-usage.md
@@ -139,14 +139,14 @@ nav:
     - Visualization: user-guide/visualization.md
     - Batch Processing: user-guide/batch-processing.md
     - Data Export: user-guide/data-export.md
-  
+
   - Tutorials and Examples:
     - tutorials/index.md
     - Basic Tutorial: tutorials/basic-tutorial.md
     - Ring-Size Detection Tutorial: tutorials/using-macro-lactone-analyzer.md
     - Advanced Usage: tutorials/advanced-usage.md
     - Use Cases: tutorials/use-cases.md
-  
+
   - API Reference:
     - api/index.md
     - MacroLactoneAnalyzer: api/macro-lactone-analyzer.md
@@ -155,13 +155,20 @@ nav:
     - Ring Numbering Module: api/ring-numbering.md
     - Visualization Module: api/ring-visualization.md
     - Utility Functions: api/utilities.md
-  
+
   - Developer Guide:
     - development/index.md
     - Contributing Guide: development/contributing.md
     - Project Structure: development/project-structure.md
     - Testing: development/testing.md
-  
+
+  - Project Docs:
+    - project-docs/AGENTS.md
+    - Implementation Summary: project-docs/IMPLEMENTATION_SUMMARY.md
+    - Cleanup Summary: project-docs/CLEANUP_SUMMARY.md
+    - Documentation Guide: project-docs/DOCUMENTATION_GUIDE.md
+    - Quick Commands: project-docs/QUICK_COMMANDS.md
+
   - About:
     - about/index.md
     - Changelog: about/changelog.md