Move project docs to docs/project-docs and update references

- Move AGENTS.md, CLEANUP_SUMMARY.md, DOCUMENTATION_GUIDE.md, IMPLEMENTATION_SUMMARY.md, QUICK_COMMANDS.md to docs/project-docs/
- Update AGENTS.md to include splicing module documentation
- Update mkdocs.yml navigation to include project-docs section
- Update .gitignore to track docs/ directory
- Add docs/plans/ splicing design documents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 17:56:03 +08:00
parent 68f171ad1d
commit a768d26e47
10 changed files with 555 additions and 7 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -66,4 +66,4 @@ data/
 *.png
 output/
 site/
-docs/
+# docs/ source files should be tracked, only ignore generated site/
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -0,0 +1,243 @@
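+# Macro Split Project Documentation Summary
+
+This document summarizes the content of every Markdown file in the repository.
+
+---
+
+## 1. README.md (Main Project Document)
+
+**Location**: `/README.md`
+
+### Project Overview
+Macrolactone Fragmenter is a specialized tool for cleaving and analyzing the side chains of macrolactones (12- to 20-membered rings).
+
+### Key Features
+- **Smart ring-atom numbering** - Supports 12- to 20-membered rings with a fixed numbering scheme based on the lactone structure
+- **Automatic side-chain cleavage** - Intelligently identifies and cleaves all side chains
+- **Powerful visualization** - SVG + PNG output
+- **Multiple export formats** - JSON, CSV, DataFrame
+- **Batch processing** - Supports large-scale analysis of 2000+ molecules
+
+### Installation
+```bash
+# Using Pixi (recommended)
+pixi install && pixi shell
+
+# Using Pip
+conda install -c conda-forge rdkit
+pip install -e .
+```
+
+### Basic Usage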
+```python
+from src.macrolactone_fragmenter import MacrolactoneFragmenter
+fragmenter = MacrolactoneFragmenter(ring_size=16)
+result = fragmenter.process_molecule(smiles, parent_id="mol_001")
+```
+
+---
+
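+## 2. CLEANUP_SUMMARY.md (Cleanup Summary)
+
+**Location**: `/CLEANUP_SUMMARY.md`
+
+### Overview
+Records the cleanup of the project root directory:
+- **Files kept**: README.md, DOCUMENTATION_GUIDE.md, QUICK_COMMANDS.md
+- **Files archived**: 14 historical documents moved to the `archive/` directory
+- **Before cleanup**: 17 MD files, about 120KB
+- **After cleanup**: 3 core MD files + 30+ documentation-system files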
+
+---
+
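+## 3. DOCUMENTATION_GUIDE.md (Documentation System Guide)
+
+**Location**: `/DOCUMENTATION_GUIDE.md`
+
+### Documentation System Features
+- Built with **MkDocs + Material theme + mkdocstrings**
+- Supports Chinese and dark/light modes
+- Generates API documentation automatically from the code
+- Supports math formulas (MathJax)
+
+### Common Commands
+```bash
+# Local preview
+pixi run mkdocs serve
+
+# Build the static site
+pixi run mkdocs build
+
+# Deploy to GitHub Pages
+pixi run mkdocs gh-deploy
+```
+
+### Steps to Add a New Document
+1. Create a `.md` file in `docs/`
+2. Edit the content
+3. Add a link in the `nav` section of `mkdocs.yml`
+4. Run the preview to verify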
+
+---
+
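+## 4. IMPLEMENTATION_SUMMARY.md (Implementation Summary)
+
+**Location**: `/IMPLEMENTATION_SUMMARY.md`
+
+### MacroLactoneAnalyzer Wrapper
+Adds the `src/macro_lactone_analyzer.py` module, which provides:
+
+#### Static Methods
+- `detect_ring_sizes(mol)` - Detect ring sizes
+- `is_valid_macrolactone(mol, size)` - Validate a macrolactone
+- `analyze_smiles(smiles)` - Analyze a single molecule
+- `dynamic_smarts_match(smiles, ring_size)` - Dynamic SMARTS matching
+
+#### Instance Methods
+- `get_single_ring_info(smiles)` - Detailed information for a single molecule
+- `analyze_list(smiles_list)` - Batch analysis
+- `classify_molecules(df)` - DataFrame classification
+
+### Features
+- Highly reusable, type-safe, detailed error handling
+- Supports analysis of 12- to 20-membered rings
+- Version bumped to 2.0.0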
+
+---
+
+## 5. QUICK_COMMANDS.md (Quick Command Reference)
+
+**Location**: `/QUICK_COMMANDS.md`
+
+### Documentation Commands
+```bash
+pixi run mkdocs serve      # Start the docs server
+pixi run mkdocs build      # Build static docs
+pixi run mkdocs gh-deploy  # Deploy to GitHub Pages
+```
+
+### Installation Commands
+```bash
+pixi install && pixi shell  # Pixi
+pip install -e .            # Development mode
+```
+
+### Development Tools
+```bash
+pixi run black src/         # Format code
+pixi run flake8 src/        # Check code quality
+pixi run pytest             # Run tests
+```
+
+---
+
+## 6. notebooks/README_analyze_ring16.md (Notebook Guide)
+
+**Location**: `/notebooks/README_analyze_ring16.md`
+
+### Files
+- **Notebook**: `analyze_ring16_molecules.ipynb`
+- **Input**: `../output/ring16_match_smarts.csv` (307 molecules)
+
+### Analysis Contents
+1. **Basic molecular properties**: molecular weight, LogP, QED, TPSA, etc.
+2. **Side-chain cleavage analysis**: uses the MacrolactoneFragmenter class
+3. **Distribution plots**: 4x4 subplot layout, distributions for positions 3-16
+
+### Output Files
+- `ring16_molecular_properties_distribution.png`
+- `atom_count_distribution_ring16.png`
+- `molecular_weight_distribution_ring16.png`
+- `ring16_fragments_analysis.csv`
+
+### Suggested Follow-up Analyses
+- LogP/QED/TPSA analysis
+- SAR analysis (if activity data is available)
+- Fragment diversity analysis
+- Clustering analysis
+
+---
+
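+## 7. scripts/README.md (Script Usage)
+
+**Location**: `/scripts/README.md`
+
+### Script List
+
+#### batch_process_ring16.py
+- Processes 16-membered-ring molecules (1241 molecules)
+- Input: `ring16/temp_filtered_complete.csv`
+- Output: `output/ring16_fragments/`
+
+#### batch_process_multi_rings.py
+- Processes all molecules with 12- to 20-membered rings
+- Classifies them automatically by ring size
+- Detects and excludes molecules with multiple lactone bonds
+
+### Output File Format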
+```json
+{
+  "parent_id": "ring16_mol_0",
+  "parent_smiles": "...",
+  "fragments": [
+    {
+      "fragment_smiles": "CC(C)C",
+      "cleavage_position": 5,
+      "atom_count": 4,
+      "molecular_weight": 58.12
+    }
+  ]
+}
+```
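A minimal sketch of consuming a record in the format above with the standard library. Only the JSON keys come from the example; the helper name `summarize_fragments` and the inline sample are assumptions for illustration, and `parent_smiles` is left as the same placeholder.

```python
import json
from collections import Counter

# Sample record in the documented format; "..." is a placeholder, not a real SMILES.
RECORD = """
{
  "parent_id": "ring16_mol_0",
  "parent_smiles": "...",
  "fragments": [
    {"fragment_smiles": "CC(C)C", "cleavage_position": 5,
     "atom_count": 4, "molecular_weight": 58.12}
  ]
}
"""


def summarize_fragments(record: dict) -> dict:
    """Count fragments per cleavage position and sum their molecular weights."""
    fragments = record.get("fragments", [])
    per_position = Counter(f["cleavage_position"] for f in fragments)
    total_mw = sum(f["molecular_weight"] for f in fragments)
    return {
        "parent_id": record["parent_id"],
        "n_fragments": len(fragments),
        "per_position": dict(per_position),
        "total_mw": total_mw,
    }


summary = summarize_fragments(json.loads(RECORD))
print(summary)
```

The same function could be applied to each file under `output/ring16_fragments/` after loading it with `json.load`.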
+
+### Log Files
+- `processing_log_*.txt` - Processing log
+- `error_log_*.txt` - Error records
+- `multiple_lactone_log_*.txt` - Molecules with multiple lactone bonds
+
+---
+
+## Project Structure Overview
+
+```
+macro_split/
+├── src/                           # Core source code
+│   ├── macrolactone_fragmenter.py # High-level wrapper class
+│   ├── macro_lactone_analyzer.py  # Ring-size analyzer
+│   ├── ring_numbering.py          # Ring numbering system
+│   ├── ring_visualization.py      # Visualization tools
+│   └── fragment_dataclass.py      # Fragment data classes
+├── notebooks/                     # Jupyter Notebook examples
+├── scripts/                       # Batch-processing scripts
+├── docs/                          # Documentation
+├── tests/                         # Unit tests
+├── pyproject.toml                 # Project configuration
+├── setup.py                       # Packaging script
+├── pixi.toml                      # Pixi environment configuration
+└── mkdocs.yml                     # Documentation configuration
+```
+
+---
+
+## Quick Start
+
+1. **Set up the environment**
+   ```bash
+   pixi install && pixi shell
+   ```
+
+2. **Test the import**
+   ```python
+   from src.macrolactone_fragmenter import MacrolactoneFragmenter
+   fragmenter = MacrolactoneFragmenter(ring_size=16)
+   ```
+
+3. **View the docs**
+   ```bash
+   pixi run mkdocs serve
+   # Open http://localhost:8000
+   ```
+
+---
+
+*Document generated: 2025-01-23*
--- a/docs/plans/2026-01-23-tylosin-splicing-design.md
+++ b/docs/plans/2026-01-23-tylosin-splicing-design.md
@@ -0,0 +1,95 @@
+# Tylosin High-Throughput Splicing & Screening System Design
+
+## 1. System Overview
+
+The **Tylosin Splicer** is a combinatorial chemistry engine designed to optimize the Tylosin scaffold. It systematically modifies positions 7, 15, and 16 of the macrolactone ring by splicing high-potential fragments identified by the SIME platform, then immediately evaluating their predicted antibacterial activity.
+
+## 2. Component Architecture
+
+```mermaid
+componentDiagram
+    package "Inputs" {
+        [Tylosin SMILES] as InputCore
+        [Fragment CSVs] as InputFrags
+        note right of InputFrags: SIME predicted\nhigh-activity fragments
+    }
+
+    package "Core Preparation" {
+        [Scaffold Preparer] as CorePrep
+        [Ring Numbering] as RingNum
+        note right of CorePrep: Identifies 7, 15, 16\nReplaces groups with anchors
+    }
+
+    package "Fragment Processing" {
+        [Fragment Loader] as FragLoad
+        [Attachment Point Selector] as AttachSel
+        note right of AttachSel: Heuristic rules to\nfind connection points
+    }
+
+    package "Splicing Engine" {
+        [Combinatorial Splicer] as Splicer
+        [Conformer Validator] as Validator
+        note right of Splicer: RDKit ChemicalReaction\nor ReplaceSubstructs
+    }
+
+    package "Evaluation (SIME)" {
+        [Activity Predictor] as Predictor
+        [Broad Spectrum Model] as Model
+    }
+
+    package "Outputs" {
+        [Ranked Results CSV] as Output
+    }
+
+    InputCore --> CorePrep
+    RingNum -.-> CorePrep : "Locate positions"
+
+    InputFrags --> FragLoad
+    FragLoad --> AttachSel
+
+    CorePrep --> Splicer : "Scaffold with Anchors (*)"
+    AttachSel --> Splicer : "Activated Fragments (R-Groups)"
+
+    Splicer --> Validator : "Raw Candidates"
+    Validator --> Predictor : "Valid 3D Structures"
+
+    Predictor --> Model : "Inference"
+    Model --> Output : "Scores & Rankings"
+```
+
+## 3. Data Flow Strategy
+
+### Step 1: Scaffold Preparation (`CorePrep`)
+- **Input**: Tylosin SMILES.
+- **Action**:
+    1. Parse SMILES using `macro_split` utils.
+    2. Use `RingNumbering` to identify the atoms at ring positions 7, 15, and 16.
+    3. Perform "surgical removal": break the bonds to existing side chains at these positions.
+    4. Attach "Anchor Atoms" (Isotopes or Dummy Atoms `[*:1]`, `[*:2]`, `[*:3]`) to the ring carbons.
+
+### Step 2: Fragment Activation (`AttachSel`)
+- **Input**: Fragment SMILES from SIME CSVs.
+- **Action**: Convert a standalone molecule into a substituent (R-Group).
+    - **Strategy A (Smart)**: Identify heteroatoms (-NH2, -OH) as attachment points.
+    - **Strategy B (Random)**: Randomly replace a Hydrogen with an attachment point.
+    - **Strategy C (Linker)**: Add a small linker (e.g., -CH2-) if needed.
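Strategy A could look roughly like the sketch below. The function name `activate_smart` and the SMARTS pattern are assumptions for illustration, not the project's implementation. It takes the first matching -NH2/-OH/-SH atom and bonds a dummy atom to it; sanitization then recomputes the implicit hydrogens, so the heteroatom loses one H.

```python
from rdkit import Chem

# Assumed pattern: primary amine, hydroxyl, or thiol that still carries hydrogens.
HETERO_H = Chem.MolFromSmarts("[NX3H2,OX2H1,SX2H1]")


def activate_smart(smiles: str) -> Chem.Mol:
    """Return a copy of the fragment with one dummy atom (*) on a heteroatom."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles!r}")
    matches = mol.GetSubstructMatches(HETERO_H)
    if not matches:
        raise ValueError("no -NH2/-OH/-SH attachment site found")
    site = matches[0][0]  # index of the first matching heteroatom
    rw = Chem.RWMol(mol)
    dummy = rw.AddAtom(Chem.Atom(0))  # atomic number 0 is the dummy atom
    rw.AddBond(site, dummy, Chem.BondType.SINGLE)
    Chem.SanitizeMol(rw)  # recomputes implicit Hs on the heteroatom
    return rw.GetMol()


activated = activate_smart("CCO")
print(Chem.MolToSmiles(activated))
```

Taking the first match keeps the sketch deterministic; a fragment with several candidate sites would need an explicit selection rule.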
+
+### Step 3: Combinatorial Splicing (`Splicer`)
+- **Input**: 1 Scaffold + N Fragments.
+- **Action**:
+    - **Single Point**: Modify only pos 7, or 15, or 16.
+    - **Multi Point**: Combinatorial modification (e.g., 7+15).
+    - **Reaction**: use `rdkit.Chem.rdChemReactions` or `ReplaceSubstructs`.
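The single-point and multi-point modes above amount to enumerating position subsets and fragment assignments. A minimal sketch, assuming fragments are plain SMILES strings; the helper `enumerate_substitutions` is hypothetical and only plans the substitutions, it does not perform the RDKit bond-forming step.

```python
from itertools import combinations, product


def enumerate_substitutions(positions, fragments, max_points=2):
    """Yield {position: fragment} plans for 1..max_points modified positions."""
    for k in range(1, max_points + 1):
        for chosen in combinations(positions, k):
            # Every assignment of a fragment to each chosen position.
            for frags in product(fragments, repeat=k):
                yield dict(zip(chosen, frags))


plans = list(enumerate_substitutions([7, 15, 16], ["CCO", "CCN"], max_points=2))
print(len(plans))
```

With P positions and N fragments, the number of plans for exactly k points is C(P, k) * N^k, so capping `max_points` keeps the library size manageable.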
+
+### Step 4: High-Throughput Prediction (`Predictor`)
+- **Integration**: Import `SIME.utils.mole_predictor`.
+- **Batching**: Collect valid spliced molecules into batches of 128/256.
+- **Scoring**: Run `ParallelBroadSpectrumPredictor`.
+- **Filtering**: Keep only molecules with `broad_spectrum == True` or high inhibition scores.
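The batching and filtering steps above can be sketched without the SIME models. The interface of `ParallelBroadSpectrumPredictor` is not shown in this document, so `predict_batch` below is a hypothetical callable that maps a list of SMILES to a list of scores, and the threshold is an arbitrary illustration.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def score_in_batches(
    smiles: Iterable[str],
    predict_batch: Callable[[List[str]], List[float]],
    batch_size: int = 128,
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Score batch by batch, keep scores >= threshold, rank best first."""
    kept: List[Tuple[str, float]] = []
    for batch in batched(smiles, batch_size):
        scores = predict_batch(batch)
        kept.extend((s, sc) for s, sc in zip(batch, scores) if sc >= threshold)
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def fake_predict(batch: List[str]) -> List[float]:
    """Stand-in scorer for the sketch: longer SMILES get higher scores."""
    return [len(s) / 10 for s in batch]


ranked = score_in_batches(["CCO", "CCCCCC", "CCCCCCCC"], fake_predict, batch_size=2)
print(ranked)
```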
+
+## 4. Technology Stack
+- **Core Logic**: Python 3.9+
+- **Chemistry Engine**: RDKit
+- **Data Handling**: Pandas, NumPy
+- **ML Inference**: PyTorch (via SIME models)
+- **Parallelization**: Python `multiprocessing` (via SIME batch predictor)
--- a/docs/plans/2026-01-23-tylosin-splicing-implementation.md
+++ b/docs/plans/2026-01-23-tylosin-splicing-implementation.md
@@ -0,0 +1,183 @@
+# Tylosin Splicing System Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Build a pipeline to splice SIME-identified fragments onto the Tylosin scaffold at positions 7, 15, and 16, and predict their antibacterial activity.
+
+**Architecture:** A Python-based ETL pipeline using RDKit for structural manipulation (`macro_split`) and PyTorch for activity prediction (`SIME`).
+
+**Tech Stack:** Python, RDKit, Pandas, PyTorch (SIME), Pytest.
+
+---
+
+### Task 1: Environment & Project Structure Setup
+
+**Files:**
+- Create: `scripts/tylosin_splicer.py` (Main entry point stub)
+- Create: `src/splicing/__init__.py`
+- Create: `src/splicing/scaffold_prep.py`
+- Create: `tests/test_splicing.py`
+
+**Step 1: Create directory structure**
+```bash
+mkdir -p src/splicing
+touch src/splicing/__init__.py
+```
+
+**Step 2: Create a basic test to verify environment**
+Write a test that imports both `macro_split` and `SIME` modules to ensure the workspace handles imports correctly.
+
+```python
+# tests/test_env_integration.py
+import sys
+import os
+sys.path.append("/home/zly/project/SIME")  # Hack for now, will clean up later
+sys.path.append("/home/zly/project/merge/macro_split")
+
+def test_imports():
+    from src.ring_numbering import get_macrolactone_numbering
+    from utils.mole_predictor import ParallelBroadSpectrumPredictor
+    assert True
+```
+
+**Step 3: Run test**
+`pixi run pytest tests/test_env_integration.py`
+
+---
+
+### Task 2: Scaffold Preparation (The "Socket")
+
+**Files:**
+- Modify: `src/splicing/scaffold_prep.py`
+- Test: `tests/test_scaffold_prep.py`
+
+**Step 1: Write failing test**
+Test that `prepare_tylosin_scaffold` returns a molecule with dummy atoms at positions 7, 15, and 16.
+
+```python
+# tests/test_scaffold_prep.py
+from rdkit import Chem
+from src.splicing.scaffold_prep import prepare_tylosin_scaffold
+
+TYLOSIN_SMILES = "CCC1OC(=O)C(C)C(O)C(C)C(O)C(C)C(OC2CC(C)(O)C(O)C(C)O2)CC(C)C(=O)C=CC=C1COC3OS(C)C(O)C(N(C)C)C3O" # Simplified/Example
+
+def test_scaffold_prep():
+    scaffold, mapping = prepare_tylosin_scaffold(TYLOSIN_SMILES, positions=[7, 15, 16])
+    # Check if we have mapped atoms
+    assert 7 in mapping
+    assert 15 in mapping
+    assert 16 in mapping
+    # Check if they are dummy atoms or have specific isotopes
+```
+
+**Step 2: Implement `prepare_tylosin_scaffold`**
+Use `get_macrolactone_numbering` to find the atom indices.
+Use `RWMol` to replace side chains at those indices with a dummy atom (e.g., atomic number 0, or an isotope label).
+
+**Step 3: Run tests**
+`pixi run pytest tests/test_scaffold_prep.py`
+
+---
+
+### Task 3: Fragment Activation (The "Plug")
+
+**Files:**
+- Create: `src/splicing/fragment_prep.py`
+- Test: `tests/test_fragment_prep.py`
+
+**Step 1: Write failing test**
+Test that `activate_fragment` takes a SMILES and returns a molecule with *one* attachment point.
+
+```python
+# tests/test_fragment_prep.py
+from rdkit import Chem
+from src.splicing.fragment_prep import activate_fragment
+
+def test_activate_fragment_smart():
+    # Fragment with -OH
+    frag_smiles = "CCO"
+    activated = activate_fragment(frag_smiles, strategy="smart")
+    # Should find the O and replace H with attachment point
+    assert "*" in Chem.MolToSmiles(activated)
+
+def test_activate_fragment_random():
+    frag_smiles = "CCCCC"
+    activated = activate_fragment(frag_smiles, strategy="random")
+    assert "*" in Chem.MolToSmiles(activated)
+```
+
+**Step 2: Implement `activate_fragment`**
+- **Smart**: Look for -NH2, -OH, -SH. Use SMARTS to find them, replace a H with `*`.
+- **Random**: Pick a random Carbon, replace a H with `*`.
+
+**Step 3: Run tests**
+`pixi run pytest tests/test_fragment_prep.py`
+
+---
+
+### Task 4: Splicing Engine (The Assembly)
+
+**Files:**
+- Create: `src/splicing/engine.py`
+- Test: `tests/test_splicing_engine.py`
+
+**Step 1: Write failing test**
+Test connecting an activated fragment to the scaffold.
+
+```python
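+# tests/test_splicing_engine.py
+from rdkit import Chem
+from src.splicing.engine import splice_molecule
+
+def test_splice_molecules():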
+    scaffold = ... # prepared scaffold
+    fragment = ... # activated fragment
+    product = splice_molecule(scaffold, fragment, position=7)
+    assert product is not None
+    assert Chem.MolToSmiles(product) != Chem.MolToSmiles(scaffold)
+```
+
+**Step 2: Implement `splice_molecule`**
+Use `Chem.ReplaceSubstructs` or `Chem.rdChemReactions`.
+Ensure the connection is chemically valid.
+
+**Step 3: Run tests**
+`pixi run pytest tests/test_splicing_engine.py`
+
+---
+
+### Task 5: Prediction Pipeline Integration
+
+**Files:**
+- Create: `src/splicing/pipeline.py`
+- Test: `tests/test_pipeline.py`
+
+**Step 1: Write failing test (Mocked)**
+Mock the SIME predictor to avoid loading heavy models during unit tests.
+
+```python
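+# tests/test_pipeline.py
+# TYLOSIN_SMILES is the same constant defined in tests/test_scaffold_prep.py.
+from src.splicing.pipeline import run_splicing_pipeline
+
+def test_pipeline_flow(mocker):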
+    # Mock predictor
+    mocker.patch('utils.mole_predictor.ParallelBroadSpectrumPredictor')
+
+    frags = ["CCO", "CCN"]
+    results = run_splicing_pipeline(TYLOSIN_SMILES, frags, positions=[7])
+    assert len(results) > 0
+```
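The same idea can be shown with the standard library's `unittest.mock` instead of the `mocker` fixture. This is a sketch under assumptions: `run_pipeline` is a hypothetical stand-in for `run_splicing_pipeline`, and the `predict` method name is invented for illustration because the real predictor interface is not shown here.

```python
from unittest.mock import MagicMock


def run_pipeline(smiles_list, predictor):
    """Hypothetical stand-in: score each SMILES with the injected predictor."""
    scores = predictor.predict(smiles_list)
    return list(zip(smiles_list, scores))


# The mock replaces the heavy model; no weights are loaded.
fake_predictor = MagicMock()
fake_predictor.predict.return_value = [0.9, 0.1]

results = run_pipeline(["CCO", "CCN"], fake_predictor)
fake_predictor.predict.assert_called_once_with(["CCO", "CCN"])
print(results)
```

Passing the predictor in as an argument makes the pipeline testable without patching import paths.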
+
+**Step 2: Implement `run_splicing_pipeline`**
+1. Prep scaffold.
+2. Loop fragments -> activate -> splice.
+3. Batch generate SMILES.
+4. Call `ParallelBroadSpectrumPredictor`.
+5. Return results.
+
+**Step 3: Run tests**
+`pixi run pytest tests/test_pipeline.py`
+
+---
+
+### Task 6: CLI and Final Execution
+
+**Files:**
+- Create: `scripts/run_tylosin_optimization.py`
+
+**Step 1: Implement CLI**
+Arguments: `--input-scaffold`, `--fragment-csv`, `--positions`, `--output`.
+
+**Step 2: Integration Test**
+Run with a small subset of the fragment CSV (head -n 10).
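A minimal `argparse` sketch for the four arguments listed in Step 1. The argument types, defaults, help text, and whether `--input-scaffold` takes a SMILES string or a file path are assumptions; only the flag names come from the plan.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the CLI parser; types and defaults here are illustrative guesses."""
    parser = argparse.ArgumentParser(
        description="Splice fragments onto a scaffold and rank the products."
    )
    parser.add_argument("--input-scaffold", required=True, help="Scaffold SMILES")
    parser.add_argument("--fragment-csv", required=True, help="CSV of fragment SMILES")
    parser.add_argument(
        "--positions", type=int, nargs="+", default=[7, 15, 16],
        help="Ring positions to modify",
    )
    parser.add_argument("--output", default="ranked_results.csv", help="Output CSV path")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--input-scaffold", "CCO", "--fragment-csv", "frags.csv", "--positions", "7", "15"]
)
print(args.positions)
```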
--- a/docs/project-docs/AGENTS.md
+++ b/docs/project-docs/AGENTS.md
@@ -37,7 +37,11 @@ macro_split/
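 │   ├── ring_visualization.py      # Visualization tools
 │   ├── fragment_cleaver.py        # Side-chain cleavage logic
 │   ├── fragment_dataclass.py      # Fragment data classes
-│   └── visualizer.py              # Statistical visualization
+│   ├── visualizer.py              # Statistical visualization
+│   └── splicing/                  # Molecular splicing module
+│       ├── engine.py              # Splicing engine
+│       ├── scaffold_prep.py       # Scaffold preparation
+│       └── fragment_prep.py       # Fragment activation
 ├── notebooks/                     # Jupyter Notebook examples
 ├── scripts/                       # Batch-processing scripts
 ├── tests/                         # Unit tests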
@@ -65,6 +69,22 @@ analyzer = MacroLactoneAnalyzer()
 info = analyzer.get_single_ring_info(smiles)
 ```

+### Splicing Module (Molecular Splicing)
+```python
+from src.splicing.scaffold_prep import prepare_tylosin_scaffold
+from src.splicing.fragment_prep import activate_fragment
+from src.splicing.engine import splice_molecule
+
+# Prepare the scaffold (remove side chains, mark with dummy atoms)
+scaffold, dummy_map = prepare_tylosin_scaffold(smiles, positions=[3, 5, 9])
+
+# Activate the fragment (add an attachment point)
+fragment = activate_fragment(fragment_smiles, strategy="smart")
+
+# Splice the molecules
+new_mol = splice_molecule(scaffold, fragment, position=3)
+```
+
 ### Data Class Structure
 ```python
@dataclass
--- a/docs/project-docs/CLEANUP_SUMMARY.md
+++ b/docs/project-docs/CLEANUP_SUMMARY.md
--- a/docs/project-docs/DOCUMENTATION_GUIDE.md
+++ b/docs/project-docs/DOCUMENTATION_GUIDE.md
--- a/docs/project-docs/IMPLEMENTATION_SUMMARY.md
+++ b/docs/project-docs/IMPLEMENTATION_SUMMARY.md
--- a/docs/project-docs/QUICK_COMMANDS.md
+++ b/docs/project-docs/QUICK_COMMANDS.md
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -131,7 +131,7 @@ nav:
    - index.md
    - Quick Start: getting-started.md
    - Installation Guide: installation.md
-  
+
  - User Guide:
    - user-guide/index.md
    - Using MacrolactoneFragmenter: user-guide/fragmenter-usage.md
@@ -139,14 +139,14 @@ nav:
    - Visualization: user-guide/visualization.md
    - Batch Processing: user-guide/batch-processing.md
    - Data Export: user-guide/data-export.md
-  
+
  - Tutorials and Examples:
    - tutorials/index.md
    - Basic Tutorial: tutorials/basic-tutorial.md
    - Ring Size Detection Tutorial: tutorials/using-macro-lactone-analyzer.md
    - Advanced Usage: tutorials/advanced-usage.md
    - Use Cases: tutorials/use-cases.md
-  
+
  - API Reference:
    - api/index.md
    - MacroLactoneAnalyzer: api/macro-lactone-analyzer.md
@@ -155,13 +155,20 @@ nav:
    - Ring Numbering Module: api/ring-numbering.md
    - Visualization Module: api/ring-visualization.md
    - Utility Functions: api/utilities.md
-  
+
  - Developer Guide:
    - development/index.md
    - Contributing Guide: development/contributing.md
    - Project Structure: development/project-structure.md
    - Testing: development/testing.md
-  
+
+  - Project Docs:
+    - project-docs/AGENTS.md
+    - Implementation Summary: project-docs/IMPLEMENTATION_SUMMARY.md
+    - Cleanup Summary: project-docs/CLEANUP_SUMMARY.md
+    - Documentation Guide: project-docs/DOCUMENTATION_GUIDE.md
+    - Quick Commands: project-docs/QUICK_COMMANDS.md
+
  - About:
    - about/index.md
    - Changelog: about/changelog.md