- Move AGENTS.md, CLEANUP_SUMMARY.md, DOCUMENTATION_GUIDE.md, IMPLEMENTATION_SUMMARY.md, QUICK_COMMANDS.md to docs/project-docs/
- Update AGENTS.md to include splicing module documentation
- Update mkdocs.yml navigation to include project-docs section
- Update .gitignore to track docs/ directory
- Add docs/plans/ splicing design documents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

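# AGENTS.md

This file gives AI coding assistants (such as Claude, Copilot, and Cursor) project context and development guidelines.

## Project Overview

**Macrolactone Fragmenter** is a specialized tool for cleaving and analyzing the side chains of macrolactones (12- to 20-membered rings), built for cheminformatics research.

### Core Features

- Intelligent ring-atom numbering (based on the lactone structure)
- Automatic side-chain cleavage analysis
- Molecule visualization (SVG/PNG)
- Batch processing and data export
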
## Tech Stack

| Component | Technology |
|-----------|------------|
| Language | Python 3.8+ |
| Chemistry library | RDKit |
| Data processing | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| Environment management | Pixi (recommended) / Conda |
| Documentation | MkDocs + Material |
| Testing | Pytest |
| Code style | Black, Flake8 |

## Project Structure

```
macro_split/
├── src/                           # Core source code
│   ├── __init__.py                # Package initialization
│   ├── macrolactone_fragmenter.py # ⭐ Main entry-point class
│   ├── macro_lactone_analyzer.py  # Ring-size analyzer
│   ├── ring_numbering.py          # Ring numbering system
│   ├── ring_visualization.py      # Visualization tools
│   ├── fragment_cleaver.py        # Side-chain cleavage logic
│   ├── fragment_dataclass.py      # Fragment data class
│   ├── visualizer.py              # Statistical visualization
│   └── splicing/                  # Molecule splicing module
│       ├── engine.py              # Splicing engine
│       ├── scaffold_prep.py       # Scaffold preparation
│       └── fragment_prep.py       # Fragment activation
├── notebooks/                     # Jupyter Notebook examples
├── scripts/                       # Batch-processing scripts
├── tests/                         # Unit tests
├── docs/                          # Documentation
├── pyproject.toml                 # Project configuration
├── pixi.toml                      # Pixi environment configuration
└── mkdocs.yml                     # Documentation configuration
```

## Core Modules

### MacrolactoneFragmenter (main entry point)

```python
from src.macrolactone_fragmenter import MacrolactoneFragmenter

# `smiles` is the SMILES string of the macrolactone to process
fragmenter = MacrolactoneFragmenter(ring_size=16)
result = fragmenter.process_molecule(smiles, parent_id="mol_001")
```

### MacroLactoneAnalyzer (ring-size analysis)

```python
from src.macro_lactone_analyzer import MacroLactoneAnalyzer

analyzer = MacroLactoneAnalyzer()
info = analyzer.get_single_ring_info(smiles)
```

### Splicing module (molecule splicing)

```python
from src.splicing.scaffold_prep import prepare_tylosin_scaffold
from src.splicing.fragment_prep import activate_fragment
from src.splicing.engine import splice_molecule

# Prepare the scaffold (remove side chains, mark dummy atoms)
scaffold, dummy_map = prepare_tylosin_scaffold(smiles, positions=[3, 5, 9])

# Activate the fragment (add an attachment point)
fragment = activate_fragment(fragment_smiles, strategy="smart")

# Splice the molecule
new_mol = splice_molecule(scaffold, fragment, position=3)
```

### Data class structure

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    fragment_smiles: str      # Fragment SMILES
    parent_smiles: str        # Parent molecule SMILES
    cleavage_position: int    # Cleavage position (1-N)
    fragment_id: str          # Fragment ID
    parent_id: str            # Parent molecule ID
    atom_count: int           # Number of atoms
    molecular_weight: float   # Molecular weight
```

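As a minimal sketch of how such a record might be built and exported, the block below redeclares the same fields locally so it runs without the project installed. The field values, the `[*]C` fragment notation, and the ID format are illustrative assumptions, not output from the tool.

```python
from dataclasses import asdict, dataclass


@dataclass
class Fragment:
    """Local copy of the fields listed above, so the sketch is self-contained."""

    fragment_smiles: str
    parent_smiles: str
    cleavage_position: int
    fragment_id: str
    parent_id: str
    atom_count: int
    molecular_weight: float


# Hypothetical parent: a 16-membered lactone (1 + 13 + 1 ring carbons plus the
# ring oxygen) with a methyl group on the carbon next to the ring oxygen.
parent = "CC1" + "C" * 13 + "C(=O)O1"

# Illustrative values only; the real tool may encode fragments differently.
frag = Fragment(
    fragment_smiles="[*]C",
    parent_smiles=parent,
    cleavage_position=3,
    fragment_id="mol_001_pos3",
    parent_id="mol_001",
    atom_count=1,
    molecular_weight=15.03,
)

# asdict() yields a plain dict, convenient as a DataFrame row or JSON record.
row = asdict(frag)
```

Keeping the record as a flat dataclass means a list of fragments converts directly with `pd.DataFrame([asdict(f) for f in fragments])`.
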
## Development Commands

### Environment setup

```bash
# Install dependencies
pixi install

# Activate the environment
pixi shell
```

### Code quality

```bash
# Format code
pixi run black src/

# Lint
pixi run flake8 src/

# Run tests
pixi run pytest

# Test coverage
pixi run pytest --cov=src
```

### Documentation

```bash
# Preview the docs locally
pixi run mkdocs serve

# Build the docs
pixi run mkdocs build
```

## Coding Conventions

### Python style

- Format with Black, line width 100 characters
- Use Google-style docstrings
- Type annotations: every public function must have type hints
- Naming: PascalCase for classes, snake_case for functions and variables

### Docstring example

```python
def process_molecule(self, smiles: str, parent_id: Optional[str] = None) -> FragmentResult:
    """
    Process a single molecule and run side-chain cleavage analysis.

    Args:
        smiles: SMILES string of the molecule.
        parent_id: Optional molecule identifier.

    Returns:
        A FragmentResult object containing all fragment information.

    Raises:
        ValueError: If the SMILES is invalid or the ring is not the target size.

    Example:
        >>> fragmenter = MacrolactoneFragmenter(ring_size=16)
        >>> result = fragmenter.process_molecule("C1CC...")
    """
```

### Import order

```python
# 1. Standard library
import json
from pathlib import Path
from typing import List, Dict, Optional

# 2. Third-party libraries
import pandas as pd
import numpy as np
from rdkit import Chem

# 3. Local modules
from src.fragment_dataclass import Fragment
from src.ring_numbering import RingNumbering
```

## Key Concepts

### Ring numbering system

- **Position 1**: the carbonyl carbon (the C in C=O)
- **Position 2**: the ester oxygen (the O in the ring)
- **Positions 3-N**: the remaining ring atoms, numbered in order

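The numbering rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `ring_numbering.py`: the function name `number_ring` is hypothetical, and it assumes the ring atoms are already available as a list in bonded order, with the carbonyl carbon and ester oxygen identified elsewhere.

```python
from typing import Dict, List


def number_ring(ring: List[int], carbonyl_c: int, ester_o: int) -> Dict[int, int]:
    """Map atom index -> 1-based ring position.

    `ring` lists the ring atoms in bonded order: each atom is bonded to the
    next one, and the last atom is bonded to the first.
    """
    n = len(ring)
    start = ring.index(carbonyl_c)
    # Choose the walking direction that reaches the ester oxygen immediately
    # after the carbonyl carbon, so the oxygen lands on position 2.
    if ring[(start + 1) % n] == ester_o:
        step = 1
    elif ring[(start - 1) % n] == ester_o:
        step = -1
    else:
        raise ValueError("ester oxygen is not adjacent to the carbonyl carbon")
    return {ring[(start + step * i) % n]: i + 1 for i in range(n)}


# Toy 6-atom ring with atom indices 10..15; atom 13 is the carbonyl carbon
# and atom 12 the ester oxygen, so numbering runs "backwards" through the list.
positions = number_ring([10, 11, 12, 13, 14, 15], carbonyl_c=13, ester_o=12)
```

Fixing positions 1 and 2 on the lactone bond removes the ambiguity of where a ring starts and in which direction it is walked, which keeps cleavage positions comparable across molecules.
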
### Supported ring sizes

- 12-membered to 20-membered rings
- 16-membered rings are processed by default

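A caller might guard against unsupported sizes before doing any work. The helper below is a hypothetical sketch (the name `validate_ring_size` and the constants are assumptions, not part of the project's API); it only encodes the 12 to 20 range and the default of 16 stated above.

```python
MIN_RING_SIZE = 12
MAX_RING_SIZE = 20
DEFAULT_RING_SIZE = 16


def validate_ring_size(ring_size: int = DEFAULT_RING_SIZE) -> int:
    """Return ring_size unchanged if it is supported, else raise ValueError."""
    if not MIN_RING_SIZE <= ring_size <= MAX_RING_SIZE:
        raise ValueError(
            f"ring_size must be between {MIN_RING_SIZE} and {MAX_RING_SIZE}, "
            f"got {ring_size}"
        )
    return ring_size
```
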
### SMARTS patterns

```python
# Lactone bond SMARTS (16-membered ring example).
# Lowercase r16 constrains ring size; uppercase R16 would mean "in 16 rings".
LACTONE_SMARTS_16 = "[C;r16](=O)[O;r16]"
```

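The same pattern can be generated for every supported ring size. The sketch below is an assumption about how one might do this, not the project's code; `lactone_smarts` and `LACTONE_SMARTS` are hypothetical names. It uses the SMARTS ring-size primitive `r<n>` (size of the smallest ring containing the atom), whereas the uppercase `R<n>` primitive counts ring memberships. One caveat: because `r<n>` refers to the smallest ring, a lactone atom that also sits in a smaller fused ring would not match.

```python
from typing import Dict


def lactone_smarts(ring_size: int) -> str:
    """Build a lactone-bond SMARTS: ring carbonyl carbon bonded to a ring oxygen."""
    return f"[C;r{ring_size}](=O)[O;r{ring_size}]"


# One pattern per supported ring size (12- to 20-membered rings).
LACTONE_SMARTS: Dict[int, str] = {n: lactone_smarts(n) for n in range(12, 21)}
```
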
## Testing Guide

### Running tests

```bash
# All tests
pixi run pytest

# A specific module
pixi run pytest tests/test_fragmenter.py

# Verbose output
pixi run pytest -v

# A single test
pixi run pytest tests/test_fragmenter.py::test_process_molecule
```

### Test data

Example SMILES for testing (16-membered macrolactones):

```python
TEST_SMILES = [
    "O=C1CCCCCCCC(=O)OCC/C=C/C=C/1",  # Simple 16-membered ring
    "CCC1OC(=O)C[C@H](O)C(C)[C@@H](O)...",  # Complex structure
]
```

## Common Tasks

### Adding a new feature

1. Create or modify a module under `src/`
2. Update `src/__init__.py` to export the new class or function
3. Write unit tests
4. Update the documentation

### Handling a different ring size

```python
# Specify the ring size when constructing MacrolactoneFragmenter
fragmenter = MacrolactoneFragmenter(ring_size=14)  # 14-membered ring
```

### Batch processing

```python
results = fragmenter.process_csv(
    "data/molecules.csv",
    smiles_column="smiles",
    id_column="unique_id",
    max_rows=1000,
)
df = fragmenter.batch_to_dataframe(results)
```

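The call above expects a CSV whose columns match `smiles_column` and `id_column`. As a rough sketch of preparing such a file with only the standard library, assuming nothing about the input beyond those two column names (`write_input_csv` is a hypothetical helper, and the sample row reuses a SMILES from the test data):

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterable, Tuple


def write_input_csv(path: Path, rows: Iterable[Tuple[str, str]]) -> None:
    """Write (unique_id, smiles) pairs using the column names shown above."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["unique_id", "smiles"])
        writer.writeheader()
        for unique_id, smiles in rows:
            writer.writerow({"unique_id": unique_id, "smiles": smiles})


# Write one sample row to a temporary file and read the header back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "molecules.csv"
    write_input_csv(csv_path, [("mol_001", "O=C1CCCCCCCC(=O)OCC/C=C/C=C/1")])
    lines = csv_path.read_text(encoding="utf-8").splitlines()
    header = lines[0]
```
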
## Notes

### RDKit dependency

- Install RDKit through conda or pixi; this project does not support a pip-based install
- Confirm RDKit is present in the environment: `python -c "from rdkit import Chem; print('OK')"`

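The same check can be made from Python without triggering an `ImportError`. This is a small sketch using only the standard library; `rdkit_available` is a hypothetical helper name.

```python
import importlib.util


def rdkit_available() -> bool:
    """Return True if the top-level `rdkit` package can be found for import."""
    return importlib.util.find_spec("rdkit") is not None
```

A script could call `rdkit_available()` at startup and exit with a message pointing to `pixi install` when it returns `False`, rather than failing later with a traceback.
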
### Performance considerations

- For large datasets, use the `process_csv` method
- Throughput is roughly 100 molecules per minute
- For large-scale runs, consider `scripts/batch_process_*.py`

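To plan a run, the quoted throughput translates into a back-of-the-envelope estimate. The helper below is a sketch that assumes the rough figure of 100 molecules per minute holds; actual speed will vary with molecule complexity and hardware.

```python
import math

MOLECULES_PER_MINUTE = 100  # rough figure quoted above, not a measured constant


def estimate_minutes(n_molecules: int, rate: float = MOLECULES_PER_MINUTE) -> int:
    """Estimate the run time in whole minutes, rounded up."""
    return math.ceil(n_molecules / rate)
```

At that rate, the `max_rows=1000` batch shown earlier would take about 10 minutes, and 50,000 molecules about 500 minutes.
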
### Error handling

- Invalid SMILES raise `ValueError`
- Molecules whose ring is not the target size are skipped
- Batch processing logs the molecules that fail

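The behavior described above (raise on bad input, keep going in batch mode, log what failed) follows a common pattern that can be sketched as below. This is not the project's batch code: `process_batch`, `fake_process`, and the logger name are hypothetical, and the per-molecule call is passed in as a plain function so the sketch runs without RDKit.

```python
import logging
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
logger = logging.getLogger("macro_split.batch")  # hypothetical logger name


def process_batch(
    molecules: Iterable[Tuple[str, str]],
    process_one: Callable[[str, str], T],
) -> Tuple[List[T], Dict[str, str]]:
    """Process (mol_id, smiles) pairs, collecting results and logged failures."""
    results: List[T] = []
    failures: Dict[str, str] = {}
    for mol_id, smiles in molecules:
        try:
            results.append(process_one(smiles, mol_id))
        except ValueError as exc:
            # One bad molecule should not abort the whole batch.
            logger.warning("Skipping %s: %s", mol_id, exc)
            failures[mol_id] = str(exc)
    return results, failures


def fake_process(smiles: str, mol_id: str) -> str:
    """Stand-in for the real per-molecule call; rejects empty SMILES."""
    if not smiles:
        raise ValueError("invalid SMILES")
    return f"{mol_id}:{smiles}"


results, failures = process_batch(
    [("mol_001", "CCO"), ("mol_002", "")], fake_process
)
```

Returning the failures alongside the results lets a caller write them to a separate report in addition to the log.
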
## Related Resources

- **Documentation**: the `docs/` directory, or run `pixi run mkdocs serve`
- **Examples**: `notebooks/filter_molecules.ipynb`
- **Scripts**: `scripts/README.md`

---

*Last updated: 2025-01-23*