feat(numbering): publish canonical numbering API
Add a public numbering module and route fragmenting, validation, and scaffold preparation through the canonical numbering entry. Rewrite the repository entry docs around the fixed numbering contract, add MkDocs landing pages, and document the mirror mapping used for medicinal-chemistry comparisons. Also refresh the validation analysis reports to explain the canonical-versus-mirrored numbering relationship.
@@ -1,275 +1,23 @@
# AGENTS.md

# Project Docs AGENTS

This file gives AI coding assistants (such as Claude, Copilot, and Cursor) project context and development guidelines.

This page is a project-docs landing note only.

The authoritative agent entry is the repository root `AGENTS.md`.

## Project overview

## What belongs here

**Macrolactone Fragmenter** is a specialized tool for side-chain cleavage and analysis of macrolactones (12- to 20-membered rings) in cheminformatics research.

- Docs-system notes
- Project documentation summaries
- Short commands and maintenance reminders

### Core features

- Ring-atom numbering derived from the lactone structure
- Automatic side-chain cleavage analysis
- Molecule visualization (SVG/PNG)
- Batch processing and data export

## What does not belong here

## Tech stack

- Canonical policy overrides
- Alternate numbering rules
- `clockwise` / `anticlockwise` controls

| Component | Technology |
|------|------|
| Language | Python 3.8+ |
| Cheminformatics library | RDKit |
| Data processing | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| Environment management | Pixi (recommended) / Conda |
| Documentation | MkDocs + Material |
| Testing | Pytest |
| Code style | Black (formatting), Flake8 (linting) |

## Stable rule reminder

## Project structure

```
macro_split/
├── src/                           # Core source code
│   ├── __init__.py                # Package initialization
│   ├── macrolactone_fragmenter.py # ⭐ Main entry class
│   ├── macro_lactone_analyzer.py  # Ring-count analyzer
│   ├── ring_numbering.py          # Ring numbering system
│   ├── ring_visualization.py      # Visualization tools
│   ├── fragment_cleaver.py        # Side-chain cleavage logic
│   ├── fragment_dataclass.py      # Fragment data classes
│   ├── visualizer.py              # Statistics visualization
│   └── splicing/                  # Molecule splicing module
│       ├── engine.py              # Splicing engine
│       ├── scaffold_prep.py       # Scaffold preparation
│       └── fragment_prep.py       # Fragment activation
├── notebooks/                     # Jupyter Notebook examples
├── scripts/                       # Batch processing scripts
├── tests/                         # Unit tests
├── docs/                          # Documentation
├── pyproject.toml                 # Project configuration
├── pixi.toml                      # Pixi environment configuration
└── mkdocs.yml                     # Docs configuration
```

## Core modules

### MacrolactoneFragmenter (main entry)

```python
from src.macrolactone_fragmenter import MacrolactoneFragmenter

smiles = "O=C1CCCCCCCC(=O)OCC/C=C/C=C/1"  # simple 16-membered ring from the test data
fragmenter = MacrolactoneFragmenter(ring_size=16)
result = fragmenter.process_molecule(smiles, parent_id="mol_001")
```

### MacroLactoneAnalyzer (ring-count analysis)

```python
from src.macro_lactone_analyzer import MacroLactoneAnalyzer

analyzer = MacroLactoneAnalyzer()
info = analyzer.get_single_ring_info(smiles)  # smiles: a macrolactone SMILES string
```

### Splicing module (molecule splicing)

```python
from src.splicing.scaffold_prep import prepare_tylosin_scaffold
from src.splicing.fragment_prep import activate_fragment
from src.splicing.engine import splice_molecule

# smiles: input macrolactone SMILES; fragment_smiles: side-chain fragment SMILES
# (both supplied by the caller)

# Prepare the scaffold (remove side chains, mark dummy atoms)
scaffold, dummy_map = prepare_tylosin_scaffold(smiles, positions=[3, 5, 9])

# Activate the fragment (add an attachment point)
fragment = activate_fragment(fragment_smiles, strategy="smart")

# Splice the molecule
new_mol = splice_molecule(scaffold, fragment, position=3)
```

### Data class structure

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    fragment_smiles: str     # Fragment SMILES
    parent_smiles: str       # Parent-molecule SMILES
    cleavage_position: int   # Cleavage position (1-N)
    fragment_id: str         # Fragment ID
    parent_id: str           # Parent-molecule ID
    atom_count: int          # Atom count
    molecular_weight: float  # Molecular weight
```

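Because `Fragment` is a plain dataclass, its records can be flattened into table rows with the standard library alone. This is an illustrative sketch: `fragments_to_csv` is a hypothetical helper (the project's own export path is `batch_to_dataframe`), and the field values in `example` are placeholders, not real cleavage output.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class Fragment:
    # Same fields as the project's Fragment data class.
    fragment_smiles: str
    parent_smiles: str
    cleavage_position: int
    fragment_id: str
    parent_id: str
    atom_count: int
    molecular_weight: float


def fragments_to_csv(fragments: List[Fragment]) -> str:
    """Hypothetical helper: serialize Fragment records to CSV text."""
    buffer = io.StringIO()
    # Column order follows the dataclass field order.
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Fragment)])
    writer.writeheader()
    for fragment in fragments:
        writer.writerow(asdict(fragment))  # one row per fragment
    return buffer.getvalue()


example = Fragment(
    fragment_smiles="*C",  # placeholder fragment
    parent_smiles="O=C1CCCCCCCC(=O)OCC/C=C/C=C/1",
    cleavage_position=3,
    fragment_id="mol_001_3",  # hypothetical ID scheme
    parent_id="mol_001",
    atom_count=1,  # placeholder value
    molecular_weight=15.0,  # placeholder value
)
csv_text = fragments_to_csv([example])
```

Going through `fields()` keeps the CSV header in sync with the class definition if fields are added later.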

## Development commands

### Environment setup

```bash
# Install dependencies
pixi install

# Activate the environment
pixi shell
```

### Code quality

```bash
# Format code
pixi run black src/

# Lint
pixi run flake8 src/

# Run tests
pixi run pytest

# Test coverage
pixi run pytest --cov=src
```

### Documentation

```bash
# Preview the docs locally
pixi run mkdocs serve

# Build the docs
pixi run mkdocs build
```

## Coding standards

### Python style

- Format with Black, 100-character line length
- Use Google-style docstrings
- Type hints: every public function must have type annotations
- Naming: PascalCase for classes, snake_case for functions and variables

### Docstring example

```python
def process_molecule(self, smiles: str, parent_id: Optional[str] = None) -> FragmentResult:
    """
    Process a single molecule and run side-chain cleavage analysis.

    Args:
        smiles: SMILES string of the molecule
        parent_id: Optional molecule identifier

    Returns:
        FragmentResult object containing all fragment information

    Raises:
        ValueError: If the SMILES is invalid or the ring is not the target size

    Example:
        >>> fragmenter = MacrolactoneFragmenter(ring_size=16)
        >>> result = fragmenter.process_molecule("C1CC...")
    """
```

### Import order

```python
# 1. Standard library
import json
from pathlib import Path
from typing import List, Dict, Optional

# 2. Third-party libraries
import pandas as pd
import numpy as np
from rdkit import Chem

# 3. Local modules
from src.fragment_dataclass import Fragment
from src.ring_numbering import RingNumbering
```

## Key concepts

### Ring numbering system

- **Position 1**: carbonyl carbon (the C in C=O)
- **Position 2**: ester oxygen (the ring O)
- **Positions 3-N**: the remaining ring atoms, numbered in sequence

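Positions 3..N follow from positions 1 and 2: once the carbonyl carbon and the ester oxygen are fixed, walking the ring from the oxygen away from the carbonyl carbon visits every remaining atom exactly once, so the order is unambiguous. A minimal sketch, assuming the ring is given as a plain adjacency map of atom indices (no RDKit); `number_ring` is a hypothetical helper, not the project's `RingNumbering` class.

```python
from typing import Dict, List


def number_ring(adjacency: Dict[int, List[int]], carbonyl_c: int, ester_o: int) -> Dict[int, int]:
    """Return {atom_index: ring_position} for one lactone ring.

    Hypothetical helper: `adjacency` must list only ring bonds, so every atom
    has exactly two neighbours (the ring is a simple cycle).
    """
    if ester_o not in adjacency[carbonyl_c]:
        raise ValueError("ester oxygen must be bonded to the carbonyl carbon")
    positions = {carbonyl_c: 1, ester_o: 2}
    previous, current = carbonyl_c, ester_o
    while True:
        # Step to the neighbour we did not just come from; the first step from
        # position 2 therefore moves away from the carbonyl carbon.
        following = next(a for a in adjacency[current] if a != previous)
        if following == carbonyl_c:  # walked all the way around the ring
            break
        if following in positions:
            raise ValueError("walk revisited an atom; not a simple ring")
        positions[following] = len(positions) + 1
        previous, current = current, following
    if len(positions) != len(adjacency):
        raise ValueError("adjacency does not describe a single simple ring")
    return positions


# Usage: a 16-membered ring with atoms 0..15, carbonyl carbon 5, ester oxygen 6.
ring = {i: [(i - 1) % 16, (i + 1) % 16] for i in range(16)}
positions = number_ring(ring, carbonyl_c=5, ester_o=6)
# Position 3 is atom 7; position 16 is atom 4, the carbonyl carbon's other neighbour.
```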
|
||||
|
||||
### 支持的环大小
|
||||
- 12元环 到 20元环
|
||||
- 默认处理 16元环
|
||||
|
||||
### SMARTS 模式
|
||||
```python
|
||||
# 内酯键 SMARTS(16元环示例)
|
||||
LACTONE_SMARTS_16 = "[C;R16](=O)[O;R16]"
|
||||
```
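Since the supported range is 12- to 20-membered rings, the lactone pattern can be generated per ring size instead of hard-coding one constant per size. A minimal sketch; `lactone_smarts` and `SUPPORTED_RING_SIZES` are hypothetical names, and the pattern uses the SMARTS `r<n>` primitive (atom in a smallest SSSR ring of size n).

```python
SUPPORTED_RING_SIZES = range(12, 21)  # 12- to 20-membered rings


def lactone_smarts(ring_size: int = 16) -> str:
    """Build the lactone-bond SMARTS for one macrocycle size (hypothetical helper)."""
    if ring_size not in SUPPORTED_RING_SIZES:
        raise ValueError(f"ring size {ring_size} outside supported range 12-20")
    # Carbonyl carbon and ester oxygen must both sit in a ring of this size.
    return f"[C;r{ring_size}](=O)[O;r{ring_size}]"
```

With RDKit installed, the string would be compiled with `Chem.MolFromSmarts(lactone_smarts(14))` and matched via `mol.GetSubstructMatches(...)`.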

## Testing guide

### Running tests

```bash
# All tests
pixi run pytest

# A specific module
pixi run pytest tests/test_fragmenter.py

# Verbose output
pixi run pytest -v

# A single test
pixi run pytest tests/test_fragmenter.py::test_process_molecule
```

### Test data

Example SMILES used in tests (16-membered macrolactones):

```python
TEST_SMILES = [
    "O=C1CCCCCCCC(=O)OCC/C=C/C=C/1",  # simple 16-membered ring
    "CCC1OC(=O)C[C@H](O)C(C)[C@@H](O)...",  # complex structure (truncated)
]
```

## Common tasks

### Adding a new feature

1. Create or modify a module under `src/`
2. Export the new class/function from `src/__init__.py`
3. Write unit tests
4. Update the documentation

### Handling a new ring size

```python
# Specify the ring size on MacrolactoneFragmenter
fragmenter = MacrolactoneFragmenter(ring_size=14)  # 14-membered ring
```

### Batch processing

```python
results = fragmenter.process_csv(
    "data/molecules.csv",
    smiles_column="smiles",
    id_column="unique_id",
    max_rows=1000,
)
df = fragmenter.batch_to_dataframe(results)
```

## Notes

### RDKit dependency

- Install RDKit through conda/pixi; this project does not support installing it with pip
- Check that RDKit is available in the environment: `python -c "from rdkit import Chem; print('OK')"`

### Performance

- Use the `process_csv` method when batch-processing large datasets
- Throughput is roughly 100 molecules per minute
- For large-scale runs, consider the `scripts/batch_process_*.py` scripts

### Error handling

- Invalid SMILES raise `ValueError`
- Molecules that are not the target ring size are skipped
- Batch processing logs failed molecules

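The three error-handling rules (invalid SMILES raise `ValueError`, other ring sizes are skipped, batch failures are logged) combine into a simple loop. A minimal sketch, assuming a per-molecule callable that raises `ValueError` on bad input and returns `None` for a non-target ring size; `run_batch` and `fake_process` are hypothetical stand-ins, not the project's `process_csv`.

```python
import logging
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger("batch")


def run_batch(
    molecules: List[Tuple[str, str]],
    process: Callable[[str, str], Optional[dict]],
) -> Tuple[List[dict], List[str]]:
    """Process (molecule_id, smiles) pairs; return (results, failed_ids)."""
    results: List[dict] = []
    failed: List[str] = []
    for mol_id, smiles in molecules:
        try:
            result = process(smiles, mol_id)
        except ValueError as exc:
            logger.warning("failed %s: %s", mol_id, exc)  # log, then keep going
            failed.append(mol_id)
            continue
        if result is None:  # not the target ring size: skip without logging
            continue
        results.append(result)
    return results, failed


def fake_process(smiles: str, mol_id: str) -> Optional[dict]:
    # Stand-in for a real per-molecule processor (illustration only).
    if smiles == "invalid":
        raise ValueError("could not parse SMILES")
    if smiles == "C1CCCCC1":  # six-membered ring: not a target macrolactone
        return None
    return {"parent_id": mol_id, "smiles": smiles}


ok, failed = run_batch(
    [("m1", "O=C1CCCCCCCC(=O)OCC/C=C/C=C/1"), ("m2", "invalid"), ("m3", "C1CCCCC1")],
    fake_process,
)
# ok holds one result (m1); failed == ["m2"]; m3 is skipped silently.
```

Returning the failed IDs alongside the log lets a caller retry or report them without parsing log output.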

## Related resources

- **Docs**: the `docs/` directory, or run `pixi run mkdocs serve`
- **Examples**: `notebooks/filter_molecules.ipynb`
- **Scripts**: `scripts/README.md`

---

*Last updated: 2025-01-23*

- `1 = lactone carbonyl carbon`
- `2 = adjacent ester oxygen`
- `3..N = numbered onward from position 2 along the ring's unique graph-traversal order`
- For 16-membered rings the mirror mapping is fixed as `p_mirror = ring_size - p + 3`

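For positions 3..N the mirror formula reverses the traversal: p and p_mirror always sum to ring_size + 3, so in a 16-membered ring 3 pairs with 16, 4 with 15, and so on, and applying the map twice returns the original position. Positions 1 and 2 would map to ring_size + 2 and ring_size + 1, outside the ring, so this sketch assumes the mapping is only defined on 3..N and rejects other inputs. `mirror_position` is a hypothetical helper, not necessarily how the public numbering module names or implements it.

```python
from typing import Dict


def mirror_position(p: int, ring_size: int = 16) -> int:
    """Map a canonical ring position to its mirrored position (hypothetical helper).

    Assumes p_mirror = ring_size - p + 3 applies only to positions 3..ring_size.
    """
    if not 3 <= p <= ring_size:
        raise ValueError(f"position {p} outside 3..{ring_size}")
    return ring_size - p + 3


# Canonical -> mirrored lookup table for a 16-membered ring.
mirror_table: Dict[int, int] = {p: mirror_position(p) for p in range(3, 17)}
```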