diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..4d330a7 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,38 @@ +# AGENTS.md + +This is the only authoritative agent entry for this repository. +If another `AGENTS.md` file says something different, follow this file. + +## Canonical numbering + +- `1 = 内酯羰基碳` +- `2 = 相邻酯氧` +- `3..N = 从 2 位出发沿环唯一图遍历顺序继续编号` + +For 16-membered rings, the mirror mapping is fixed: + +- `3 → 16` +- `4 → 15` +- `5 → 14` +- `6 → 13` +- `7 → 12` +- `8 → 11` +- `9 → 10` + +This numbering is deterministic and is not a visual clockwise / anticlockwise toggle. +它不是视觉顺时针,也不是视觉逆时针切换。 +The public API does not expose `clockwise` or `anticlockwise` parameters. + +## Practical rule + +- Use canonical numbering in code, reports, and validation outputs. +- Convert to literature-style mirrored labels only when you are comparing against a source that numbers the ring from the opposite direction. +- Keep bridge / fused multi-anchor cases explicit; do not silently reinterpret them as a direction choice. + +## Entry points + +- `README.md` is the progressive disclosure landing page. +- `docs/index.md` is the documentation landing page. +- `docs/user-guide/ring-numbering.md` is the canonical numbering reference. +- `docs/development/project-structure.md` is the repository layout reference. +- `docs/project-docs/AGENTS.md` points back here and should never override this file. 
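Review note on the fixed 16-membered mirror mapping in `AGENTS.md`: the listed pairs (`3 → 16` through `9 → 10`) all follow the rule `p_mirror = ring_size - p + 3` that the rest of this diff documents. The sketch below is a minimal, standalone illustration of that rule. The name `mirror_position` is a local stand-in chosen for this note (the diff itself adds `mirror_macrolactone_position` in `numbering.py`), so treat it as an assumption rather than the package API.

```python
def mirror_position(position: int, ring_size: int) -> int:
    """Mirror a canonical ring position (illustrative helper, not the package API).

    Positions 1 and 2 (carbonyl carbon and ester oxygen) stay fixed;
    positions >= 3 are reflected with ring_size - position + 3.
    """
    if position <= 2:
        return position
    return ring_size - position + 3


# Rebuild the table listed in AGENTS.md for a 16-membered ring.
mapping = {p: mirror_position(p, 16) for p in range(3, 10)}
for canonical, mirrored in mapping.items():
    print(f"{canonical} → {mirrored}")

# Mirroring twice returns the original position, which is why the
# README can list the same table in both directions.
round_trip_ok = all(
    mirror_position(mirror_position(p, 16), 16) == p for p in range(1, 17)
)
```

Because the mapping is its own inverse, the same helper converts canonical labels to mirrored labels and back, so a separate direction parameter is not needed.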
diff --git a/README.md b/README.md index 64ca723..01b5d62 100644 --- a/README.md +++ b/README.md @@ -1,157 +1,71 @@ # macro_lactone_toolkit -`macro_lactone_toolkit` 是一个正式可安装的 Python 包,用于 12-20 元有效大环内酯的识别、环编号、侧链裂解和简单拼接回组装。 +`macro_lactone_toolkit` 是一个用于 12-20 元大环内酯识别、canonical numbering、侧链裂解和拼接准备的 Python 工具包。 -## 核心能力 +## 先看哪里 -- 默认自动识别 12-20 元有效大环内酯,也允许显式指定 `ring_size` -- 环编号规则固定为: - - 位置 1 = 内酯羰基碳 - - 位置 2 = 环上的酯键氧 - - 位置 3-N = 沿统一方向连续编号 -- 侧链裂解同时输出两套 SMILES: - - `fragment_smiles_labeled`,例如 `[5*]` - - `fragment_smiles_plain`,例如 `*` -- dummy 原子与连接原子的原始键型保持一致 -- 提供正式 CLI: - - `macro-lactone-toolkit analyze` - - `macro-lactone-toolkit number` - - `macro-lactone-toolkit fragment` +- 只想快速上手: [docs/index.md](docs/index.md) +- 只想看编号规则: [docs/user-guide/ring-numbering.md](docs/user-guide/ring-numbering.md) +- 只想看项目结构: [docs/development/project-structure.md](docs/development/project-structure.md) +- 只想看 agent 入口: [AGENTS.md](AGENTS.md) -## 环境 +## 渐进式入口 -推荐使用 `pixi`,项目已固定到 Python 3.12,并支持 `osx-arm64` 与 `linux-64`。 +### 1. 先确认编号契约 -```bash -pixi install -pixi run pytest -pixi run python -c "import macro_lactone_toolkit" -``` +这个仓库只有一套 canonical numbering: -## Python API +- `1 = 内酯羰基碳` +- `2 = 相邻酯氧` +- `3..N = 从 2 位出发沿环唯一图遍历顺序继续编号` + +对 16 元环,镜像映射是固定的: + +- `3 → 16` +- `4 → 15` +- `5 → 14` +- `6 → 13` +- `7 → 12` +- `8 → 11` +- `9 → 10` +- `10 → 9` +- `11 → 8` +- `12 → 7` +- `13 → 6` +- `14 → 5` +- `15 → 4` +- `16 → 3` + +这不是视觉顺时针/逆时针切换,公开 API 也不提供 `clockwise` / `anticlockwise` 参数。 + +### 2. 
再看最小用法 ```python from macro_lactone_toolkit import MacroLactoneAnalyzer, MacrolactoneFragmenter analyzer = MacroLactoneAnalyzer() -valid_ring_sizes = analyzer.get_valid_ring_sizes("O=C1CCCCCCCCCCCCCCO1") - fragmenter = MacrolactoneFragmenter() -numbering = fragmenter.number_molecule("O=C1CCCCCCCCCCCCCCO1") -result = fragmenter.fragment_molecule("O=C1CCCC(C)CCCCCCCCCCO1", parent_id="mol_001") + +print(analyzer.get_valid_ring_sizes("O=C1CCCCCCCCCCCCCCO1")) +print(fragmenter.number_molecule("O=C1CCCCCCCCCCCCCCO1").position_to_atom) ``` -## CLI - -单分子分析: - ```bash +pixi install +pixi run pytest pixi run macro-lactone-toolkit analyze --smiles "O=C1CCCCCCCCCCCCCCO1" pixi run macro-lactone-toolkit number --smiles "O=C1CCCCCCCCCCCCCCO1" pixi run macro-lactone-toolkit fragment --smiles "O=C1CCCC(C)CCCCCCCCCCO1" --parent-id mol_001 ``` -CSV 批处理: +### 3. 再深入到页面 -```bash -pixi run macro-lactone-toolkit fragment \ - --input molecules.csv \ - --output fragments.csv \ - --errors-output fragment_errors.csv -``` +- [docs/getting-started.md](docs/getting-started.md) +- [docs/user-guide/index.md](docs/user-guide/index.md) +- [docs/development/index.md](docs/development/index.md) -默认读取 `smiles` 列;若存在 `id` 列则将其作为 `parent_id`,否则自动生成 `row_`。 +## 维护约束 -## MacrolactoneDB 验证模块 - -用于对 MacrolactoneDB 数据库进行抽样验证、分类、侧链断裂和数据库存储。 - -### 验证脚本使用 - -```bash -# 基本使用(10% 分层抽样) -pixi run python scripts/validate_macrolactone_db.py \ - --input data/MacrolactoneDB/ring12_20/temp.csv \ - --output validation_output \ - --sample-ratio 0.1 - -# 处理全量数据 -pixi run python scripts/validate_macrolactone_db.py \ - --input data/MacrolactoneDB/ring12_20/temp.csv \ - --output validation_output \ - --sample-ratio 1.0 - -# 指定列名(如果 CSV 列名不同) -pixi run python scripts/validate_macrolactone_db.py \ - --input data.csv \ - --output validation_output \ - --id-col ml_id \ - --chembl-id-col IDs \ - --smiles-col smiles -``` - -### 输出结构 - -``` -validation_output/ -├── README.md # 目录说明 -├── fragments.db # SQLite 数据库 -├── 
fragment_library.csv # 最终片段库导出(含 has_dummy_atom / splice_ready) -├── summary.csv # 汇总表(含 ml_id, chembl_id) -├── summary_statistics.json # 统计信息 -├── ring_size_12/ # 按环大小组织 -├── ring_size_13/ -... -└── ring_size_20/ - ├── standard/ - │ ├── numbered/ # 带编号的环图(文件名使用 ml_id) - │ │ └── {ml_id}_numbered.png - │ └── sidechains/ # 片段图 - │ └── {ml_id}/ - │ └── {ml_id}_frag_{n}_pos{pos}.png - ├── non_standard/original/ - └── rejected/original/ -``` - -### 数据库查询示例 - -```bash -# 查看表结构 -sqlite3 validation_output/fragments.db ".tables" - -# 查询标准大环内酯 -sqlite3 validation_output/fragments.db \ - "SELECT ml_id, chembl_id, ring_size, num_sidechains \ - FROM parent_molecules \ - WHERE classification='standard_macrolactone' LIMIT 5;" - -# 查询最终片段库 -sqlite3 validation_output/fragments.db \ - "SELECT source_type, source_parent_ml_id, cleavage_position, has_dummy_atom, splice_ready \ - FROM fragment_library_entries LIMIT 10;" - -# 查询片段 -sqlite3 validation_output/fragments.db \ - "SELECT fragment_id, cleavage_position, dummy_isotope, has_dummy_atom, dummy_atom_count \ - FROM side_chain_fragments LIMIT 10;" - -# 按环大小统计 -sqlite3 validation_output/fragments.db \ - "SELECT ring_size, COUNT(*) FROM parent_molecules GROUP BY ring_size;" -``` - -### 关键字段说明 - -| 字段 | 说明 | -|------|------| -| `ml_id` | MacrolactoneDB 唯一 ID(如 ML00000001),用于文件命名 | -| `chembl_id` | 原始 CHEMBL ID(如 CHEMBL94657),可能为空 | -| `classification` | standard_macrolactone / non_standard_macrocycle / not_macrolactone | -| `dummy_isotope` | 裂解位置编号,用于片段重建 | -| `cleavage_position` | 环上的断裂位置 | -| `has_dummy_atom` | 该片段是否带 dummy 原子,可用于区分可直接拼接片段 | -| `splice_ready` | 是否与当前单锚点拼接流程直接兼容 | - -## Legacy Scripts - -`scripts/` 目录保留为薄封装或迁移提示,不再承载核心实现。正式接口以 `macro_lactone_toolkit.*` 与 `macro-lactone-toolkit` CLI 为准。 +- 根目录 `AGENTS.md` 是唯一权威 agent 入口。 +- 入口文档只保留当前真实存在、持续维护的页面。 +- 如果你要把药化文献里的镜像位置拿来对照,先按 canonical numbering 记账,再做镜像转换。 diff --git a/docs/development/index.md b/docs/development/index.md new file mode 100644 index 0000000..031059f --- 
/dev/null +++ b/docs/development/index.md @@ -0,0 +1,19 @@ +# 开发者指南 + +这里给维护者看:项目结构、入口和约束。 + +## 先看什么 + +- [项目结构](project-structure.md) +- 仓库根目录的 `AGENTS.md` 是唯一权威 agent 入口 + +## 维护原则 + +- 入口文档只保留真实存在、持续维护的页面。 +- 编号规则只使用 canonical numbering。 +- 不引入 `clockwise` / `anticlockwise` 参数。 + +## 适合继续往下看的内容 + +- 如果你在找包和脚本分别负责什么,去 [project-structure.md](project-structure.md) +- 如果你在找 agent 约束,直接查看仓库根目录 `AGENTS.md` diff --git a/docs/development/project-structure.md b/docs/development/project-structure.md new file mode 100644 index 0000000..bb33945 --- /dev/null +++ b/docs/development/project-structure.md @@ -0,0 +1,29 @@ +# 项目结构 + +这是当前仓库里真正承担职责的目录划分。 + +## 顶层目录 + +- `src/macro_lactone_toolkit/`: 正式 Python 包,包含分析、编号、裂解、可视化、工作流和验证模块。 +- `scripts/`: 薄封装和批处理脚本,基于正式包接口运行。 +- `tests/`: pytest 测试,覆盖入口、脚本和核心行为。 +- `docs/`: 面向使用者和维护者的入口文档。 +- `notebooks/`: 探索性或归档性的 notebook,不作为权威接口说明。 +- `validation_output/`: 生成的验证产物和报告,属于输出,不是核心源码。 + +## 关键入口 + +- `macro_lactone_toolkit.analyzer.MacroLactoneAnalyzer` +- `macro_lactone_toolkit.fragmenter.MacrolactoneFragmenter` +- `macro-lactone-toolkit` CLI + +## 结构约束 + +- 代码和文档都只认 canonical numbering。 +- 16 元环镜像映射按 `p_mirror = ring_size - p + 3` 处理。 +- 不用 `clockwise` / `anticlockwise` 参数来表达编号方向。 + +## 维护提示 + +- `scripts/README.md` 解释脚本层的现状。 +- `docs/project-docs/AGENTS.md` 只是项目文档入口,不是权威 agent 入口。 diff --git a/docs/getting-started.md b/docs/getting-started.md new file mode 100644 index 0000000..979a02f --- /dev/null +++ b/docs/getting-started.md @@ -0,0 +1,39 @@ +# 快速开始 + +这页只放最短路径:安装、验证、第一次调用。 + +## 安装与验证 + +```bash +pixi install +pixi run pytest +pixi run python -c "import macro_lactone_toolkit" +``` + +## 第一次分析 + +```python +from macro_lactone_toolkit import MacroLactoneAnalyzer, MacrolactoneFragmenter + +analyzer = MacroLactoneAnalyzer() +fragmenter = MacrolactoneFragmenter() + +print(analyzer.get_valid_ring_sizes("O=C1CCCCCCCCCCCCCCO1")) +print(fragmenter.number_molecule("O=C1CCCCCCCCCCCCCCO1").position_to_atom) +``` + +```bash +pixi run 
macro-lactone-toolkit analyze --smiles "O=C1CCCCCCCCCCCCCCO1" +pixi run macro-lactone-toolkit number --smiles "O=C1CCCCCCCCCCCCCCO1" +pixi run macro-lactone-toolkit fragment --smiles "O=C1CCCC(C)CCCCCCCCCCO1" --parent-id mol_001 +``` + +## 你需要记住的规则 + +- `1 = 内酯羰基碳` +- `2 = 相邻酯氧` +- `3..N = 从 2 位出发沿环唯一图遍历顺序继续编号` +- 16 元环镜像映射固定为 `p_mirror = ring_size - p + 3` +- 不支持 `clockwise` / `anticlockwise` 参数 + +如果你要继续往下看,去 [user-guide/index.md](user-guide/index.md)。 diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..697dac3 --- /dev/null +++ b/docs/index.md @@ -0,0 +1,29 @@ +# 文档首页 + +这里是文档入口。先按你要解决的问题选路,不要一开始就翻完整套材料。 + +## 快速路径 + +- 想先跑起来: [getting-started.md](getting-started.md) +- 想先确认编号规则: [user-guide/ring-numbering.md](user-guide/ring-numbering.md) +- 想先看仓库结构: [development/project-structure.md](development/project-structure.md) + +## 入口约定 + +这个项目只使用一套 canonical numbering: + +- `1 = 内酯羰基碳` +- `2 = 相邻酯氧` +- `3..N = 从 2 位出发沿环唯一图遍历顺序继续编号` + +对 16 元环,镜像映射是固定的 `p_mirror = ring_size - p + 3`,因此 `6 → 13`、`7 → 12`、`15 → 4`、`16 → 3`。 +公开 API 不支持 `clockwise` / `anticlockwise` 参数。 + +## 这套文档保留什么 + +- `README.md`: 渐进式入口 +- `AGENTS.md`: 唯一权威 agent 入口 +- `user-guide/`: 面向使用者的稳定规则 +- `development/`: 面向维护者的结构说明 + +如果你只想开始干活,先看 [getting-started.md](getting-started.md)。 diff --git a/docs/project-docs/AGENTS.md b/docs/project-docs/AGENTS.md index 2432209..336eba8 100644 --- a/docs/project-docs/AGENTS.md +++ b/docs/project-docs/AGENTS.md @@ -1,275 +1,23 @@ -# AGENTS.md +# Project Docs AGENTS -本文件为 AI 编程助手(如 Claude、Copilot、Cursor 等)提供项目上下文和开发指南。 +This page is a project-docs landing note only. +The authoritative agent entry is the repository root `AGENTS.md`. 
-## 项目概述 +## What belongs here -**Macrolactone Fragmenter** 是一个专业的大环内酯(12-20元环)侧链断裂和分析工具,用于化学信息学研究。 +- Docs-system notes +- Project documentation summaries +- Short commands and maintenance reminders -### 核心功能 -- 智能环原子编号(基于内酯结构) -- 自动侧链断裂分析 -- 分子可视化(SVG/PNG) -- 批量处理和数据导出 +## What does not belong here -## 技术栈 +- Canonical policy overrides +- Alternate numbering rules +- `clockwise` / `anticlockwise` controls -| 组件 | 技术 | -|------|------| -| 语言 | Python 3.8+ | -| 化学库 | RDKit | -| 数据处理 | Pandas, NumPy | -| 可视化 | Matplotlib, Seaborn | -| 环境管理 | Pixi (推荐) / Conda | -| 文档 | MkDocs + Material | -| 测试 | Pytest | -| 代码格式 | Black, Flake8 | +## Stable rule reminder -## 项目结构 - -``` -macro_split/ -├── src/ # 核心源代码 -│ ├── __init__.py # 包初始化 -│ ├── macrolactone_fragmenter.py # ⭐ 主入口类 -│ ├── macro_lactone_analyzer.py # 环数分析器 -│ ├── ring_numbering.py # 环编号系统 -│ ├── ring_visualization.py # 可视化工具 -│ ├── fragment_cleaver.py # 侧链断裂逻辑 -│ ├── fragment_dataclass.py # 碎片数据类 -│ ├── visualizer.py # 统计可视化 -│ └── splicing/ # 分子拼接模块 -│ ├── engine.py # 拼接引擎 -│ ├── scaffold_prep.py # 骨架准备 -│ └── fragment_prep.py # 片段激活 -├── notebooks/ # Jupyter Notebook 示例 -├── scripts/ # 批量处理脚本 -├── tests/ # 单元测试 -├── docs/ # 文档目录 -├── pyproject.toml # 项目配置 -├── pixi.toml # Pixi 环境配置 -└── mkdocs.yml # 文档配置 -``` - -## 核心模块说明 - -### MacrolactoneFragmenter (主入口) -```python -from src.macrolactone_fragmenter import MacrolactoneFragmenter - -fragmenter = MacrolactoneFragmenter(ring_size=16) -result = fragmenter.process_molecule(smiles, parent_id="mol_001") -``` - -### MacroLactoneAnalyzer (环数分析) -```python -from src.macro_lactone_analyzer import MacroLactoneAnalyzer - -analyzer = MacroLactoneAnalyzer() -info = analyzer.get_single_ring_info(smiles) -``` - -### Splicing 模块 (分子拼接) -```python -from src.splicing.scaffold_prep import prepare_tylosin_scaffold -from src.splicing.fragment_prep import activate_fragment -from src.splicing.engine import splice_molecule - -# 准备骨架(移除侧链,标记dummy原子) -scaffold, dummy_map = 
prepare_tylosin_scaffold(smiles, positions=[3, 5, 9]) - -# 激活片段(添加连接点) -fragment = activate_fragment(fragment_smiles, strategy="smart") - -# 拼接分子 -new_mol = splice_molecule(scaffold, fragment, position=3) -``` - -### 数据类结构 -```python -@dataclass -class Fragment: - fragment_smiles: str # 碎片 SMILES - parent_smiles: str # 母分子 SMILES - cleavage_position: int # 断裂位置 (1-N) - fragment_id: str # 碎片 ID - parent_id: str # 母分子 ID - atom_count: int # 原子数 - molecular_weight: float # 分子量 -``` - -## 开发命令 - -### 环境设置 -```bash -# 安装依赖 -pixi install - -# 激活环境 -pixi shell -``` - -### 代码质量 -```bash -# 格式化代码 -pixi run black src/ - -# 代码检查 -pixi run flake8 src/ - -# 运行测试 -pixi run pytest - -# 测试覆盖率 -pixi run pytest --cov=src -``` - -### 文档 -```bash -# 本地预览文档 -pixi run mkdocs serve - -# 构建文档 -pixi run mkdocs build -``` - -## 编码规范 - -### Python 风格 -- 使用 Black 格式化,行宽 100 字符 -- 使用 Google 风格的 docstring -- 类型注解:所有公共函数必须有类型提示 -- 命名:类用 PascalCase,函数/变量用 snake_case - -### Docstring 示例 -```python -def process_molecule(self, smiles: str, parent_id: str = None) -> FragmentResult: - """ - 处理单个分子,进行侧链断裂分析。 - - Args: - smiles: 分子的 SMILES 字符串 - parent_id: 可选的分子标识符 - - Returns: - FragmentResult 对象,包含所有碎片信息 - - Raises: - ValueError: 如果 SMILES 无效或不是目标环大小 - - Example: - >>> fragmenter = MacrolactoneFragmenter(ring_size=16) - >>> result = fragmenter.process_molecule("C1CC...") - """ -``` - -### 导入顺序 -```python -# 1. 标准库 -import json -from pathlib import Path -from typing import List, Dict, Optional - -# 2. 第三方库 -import pandas as pd -import numpy as np -from rdkit import Chem - -# 3. 
本地模块 -from src.fragment_dataclass import Fragment -from src.ring_numbering import RingNumbering -``` - -## 关键概念 - -### 环编号系统 -- **位置 1**: 羰基碳(C=O 中的 C) -- **位置 2**: 酯键氧(环上的 O) -- **位置 3-N**: 按顺序编号环上剩余原子 - -### 支持的环大小 -- 12元环 到 20元环 -- 默认处理 16元环 - -### SMARTS 模式 -```python -# 内酯键 SMARTS(16元环示例) -LACTONE_SMARTS_16 = "[C;R16](=O)[O;R16]" -``` - -## 测试指南 - -### 运行测试 -```bash -# 全部测试 -pixi run pytest - -# 特定模块 -pixi run pytest tests/test_fragmenter.py - -# 详细输出 -pixi run pytest -v - -# 单个测试 -pixi run pytest tests/test_fragmenter.py::test_process_molecule -``` - -### 测试数据 -测试用的 SMILES 示例(16元环大环内酯): -```python -TEST_SMILES = [ - "O=C1CCCCCCCC(=O)OCC/C=C/C=C/1", # 简单 16 元环 - "CCC1OC(=O)C[C@H](O)C(C)[C@@H](O)...", # 复杂结构 -] -``` - -## 常见任务 - -### 添加新功能 -1. 在 `src/` 目录创建或修改模块 -2. 更新 `src/__init__.py` 导出新类/函数 -3. 编写单元测试 -4. 更新文档 - -### 处理新的环大小 -```python -# 在 MacrolactoneFragmenter 中指定环大小 -fragmenter = MacrolactoneFragmenter(ring_size=14) # 14元环 -``` - -### 批量处理 -```python -results = fragmenter.process_csv( - "data/molecules.csv", - smiles_column="smiles", - id_column="unique_id", - max_rows=1000 -) -df = fragmenter.batch_to_dataframe(results) -``` - -## 注意事项 - -### RDKit 依赖 -- RDKit 必须通过 conda/pixi 安装,不支持 pip -- 确保环境中有 RDKit:`python -c "from rdkit import Chem; print('OK')"` - -### 性能考虑 -- 批量处理大数据集时,使用 `process_csv` 方法 -- 处理速度约 ~100 分子/分钟 -- 大规模处理考虑使用 `scripts/batch_process_*.py` - -### 错误处理 -- 无效 SMILES 会抛出 `ValueError` -- 非目标环大小会被跳过 -- 批量处理会记录失败的分子到日志 - -## 相关资源 - -- **文档**: `docs/` 目录或运行 `pixi run mkdocs serve` -- **示例**: `notebooks/filter_molecules.ipynb` -- **脚本**: `scripts/README.md` - ---- - -*最后更新: 2025-01-23* +- `1 = 内酯羰基碳` +- `2 = 相邻酯氧` +- `3..N = 从 2 位出发沿环唯一图遍历顺序继续编号` +- 16 元环镜像映射固定为 `p_mirror = ring_size - p + 3` diff --git a/docs/user-guide/index.md b/docs/user-guide/index.md new file mode 100644 index 0000000..c2900d8 --- /dev/null +++ b/docs/user-guide/index.md @@ -0,0 +1,17 @@ +# 用户指南 + +这里收的是稳定规则,不放临时笔记。 + +## 你最可能要看的内容 + +- [环编号系统](ring-numbering.md) + +## 
这个目录的边界 + +- 这里只描述公开、可复用、会长期维持的行为。 +- 所有编号说明都以 canonical numbering 为准。 +- 如果文献图里用的是反向标注,先把它转换成镜像映射,再和代码结果对照。 + +## 一句话约定 + +这套工具没有 `clockwise` / `anticlockwise` 开关,编号方向不靠参数切换。 diff --git a/docs/user-guide/ring-numbering.md b/docs/user-guide/ring-numbering.md new file mode 100644 index 0000000..ffed9e2 --- /dev/null +++ b/docs/user-guide/ring-numbering.md @@ -0,0 +1,55 @@ +# 环编号系统 + +本项目只采用一套 canonical numbering。它是确定性的,不依赖 `clockwise` / `anticlockwise` 参数,也不是视觉上的方向切换。 + +## 规则 + +- `1 = 内酯羰基碳` +- `2 = 相邻酯氧` +- `3..N = 从 2 位出发沿环唯一图遍历顺序继续编号` + +这条规则在代码、文档、测试和验证输出中都保持一致。 + +## 16 元环镜像映射 + +当你需要把代码里的 canonical numbering 转成文献里常见的反向标注时,使用: + +```text +p_mirror = ring_size - p + 3 +``` + +对 16 元环,这会得到: + +- `3 → 16` +- `4 → 15` +- `5 → 14` +- `6 → 13` +- `7 → 12` +- `8 → 11` +- `9 → 10` +- `10 → 9` +- `11 → 8` +- `12 → 7` +- `13 → 6` +- `14 → 5` +- `15 → 4` +- `16 → 3` + +常见对照点: + +- `6 → 13` +- `7 → 12` +- `15 → 4` +- `16 → 3` + +## 使用建议 + +- 先在代码和数据库里保留 canonical numbering。 +- 只在图注、论文、汇报里按需要加镜像标签。 +- 不要把方向差异实现成 API 参数。 +- 如果分子是 bridge / fused multi-anchor,先把结构语义说明清楚,再讨论编号可视化。 + +## 最短结论 + +如果你在看代码,记 canonical numbering。 +如果你在对照文献,记 `p_mirror = ring_size - p + 3`。 diff --git a/mkdocs.yml b/mkdocs.yml index c81b50a..7463ff2 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -130,49 +130,25 @@ nav: - 首页: - index.md - 快速开始: getting-started.md - - 安装指南: installation.md - 用户指南: - user-guide/index.md - - MacrolactoneFragmenter 使用: user-guide/fragmenter-usage.md - 环编号系统: user-guide/ring-numbering.md - - 可视化功能: user-guide/visualization.md - - 批量处理: user-guide/batch-processing.md - - 数据导出: user-guide/data-export.md - - - 教程与示例: - - tutorials/index.md - - 基础教程: tutorials/basic-tutorial.md - - 环数识别教程: tutorials/using-macro-lactone-analyzer.md - - 高级用法: tutorials/advanced-usage.md - - 使用案例: tutorials/use-cases.md - - - API 参考: - - api/index.md - - MacroLactoneAnalyzer: api/macro-lactone-analyzer.md - - MacrolactoneFragmenter: api/macrolactone-fragmenter.md - - Fragment 数据类: 
api/fragment-dataclass.md - - 环编号模块: api/ring-numbering.md - - 可视化模块: api/ring-visualization.md - - 工具函数: api/utilities.md - 开发者指南: - development/index.md - - 贡献指南: development/contributing.md - 项目结构: development/project-structure.md - - 测试: development/testing.md - 项目文档: - project-docs/AGENTS.md - - 实现总结: project-docs/IMPLEMENTATION_SUMMARY.md - - 清理总结: project-docs/CLEANUP_SUMMARY.md - - 文档指南: project-docs/DOCUMENTATION_GUIDE.md - - 快速命令: project-docs/QUICK_COMMANDS.md - - 关于: - - about/index.md - - 更新日志: about/changelog.md - - 许可证: about/license.md +not_in_nav: | + /SUMMARY.md + /plans/ + /project-docs/CLEANUP_SUMMARY.md + /project-docs/DOCUMENTATION_GUIDE.md + /project-docs/IMPLEMENTATION_SUMMARY.md + /project-docs/QUICK_COMMANDS.md # Extra extra: @@ -201,4 +177,3 @@ extra: note: >- Thanks for your feedback! Help us improve this page by creating an issue. - diff --git a/pixi.toml b/pixi.toml index 3fcc455..9f2edf8 100644 --- a/pixi.toml +++ b/pixi.toml @@ -24,3 +24,7 @@ sqlmodel = ">=0.0.37,<0.0.38" [pypi-dependencies] macro_lactone_toolkit = { path = ".", editable = true } +mkdocs = ">=1.6,<2" +mkdocs-material = ">=9.6,<10" +mkdocstrings = ">=0.28,<0.29" +mkdocstrings-python = ">=1.16,<2" diff --git a/scripts/analyze_validation_fragment_library.py b/scripts/analyze_validation_fragment_library.py index 4930613..98de199 100644 --- a/scripts/analyze_validation_fragment_library.py +++ b/scripts/analyze_validation_fragment_library.py @@ -438,6 +438,16 @@ def format_top_positions(table: pd.DataFrame, sort_column: str, limit: int = 5) return subset.to_string(index=False) +def mirror_ring_position(position: int, ring_size: int) -> int: + if position <= 2: + return position + return ring_size - position + 3 + + +def format_position_mapping(positions: list[int], ring_size: int) -> str: + return ", ".join(f"{position} → {mirror_ring_position(position, ring_size)}" for position in positions) + + def build_markdown_report( output_dir: Path, analysis_df: pd.DataFrame, @@ 
-559,6 +569,15 @@ def build_markdown_report( "- Position 15: supported as a **frequent modification site**, but the retained chemotypes are concentrated into a small number of acyl substituents.", "- Position 16: not prevalent in the current database, but the few retained fragments are structurally distinct singletons; this makes it a **low-evidence exploratory site**, not a high-confidence natural hotspot.", "", + "## Numbering Alignment With Medicinal-Chemistry Labels", + "", + "- The codebase uses one canonical numbering rule: position 1 is the lactone carbonyl carbon, position 2 is the ester oxygen, and positions 3..N follow the unique ring traversal that starts from position 2 in `build_numbering_result()`.", + "- If a medicinal-chemistry scheme keeps positions 1 and 2 fixed but numbers the rest of the ring in the mirrored direction, then the conversion for positions >=3 is `p_mirror = ring_size - p + 3`.", + f"- For a {ring_size}-membered ring, literature labels `{','.join(str(position) for position in hotspot_table['cleavage_position'].tolist())}` map to current-code labels `{format_position_mapping(hotspot_table['cleavage_position'].tolist(), ring_size)}`.", + f"- Conversely, the current-code natural-diversity hotspots `13, 3, 4, 12` correspond to mirrored medicinal-chemistry labels `6, 16, 15, 7` in a {ring_size}-membered ring.", + "- This means the apparent disagreement was a numbering-direction mismatch, not a chemical contradiction between the database analysis and the literature-guided hotspot list.", + "- Practical rule: keep the database and cleavage-position statistics in the current canonical code numbering, but add mirrored medicinal-chemistry labels in figures, tables and manuscripts whenever you compare against literature.", + "", "## Figure 6. 
Are the Top Positions Driven by Ring-Bearing Side Chains?", "", f"![Ring {ring_size} ring sensitivity](ring{ring_size}_position_ring_sensitivity.png)", @@ -616,6 +635,7 @@ def build_markdown_report( "### Recommended paper-safe wording", "", f"> In the validated MacrolactoneDB fragment library, natural side-chain diversity of {ring_size}-membered macrolactones is concentrated primarily at positions 13, 3/4 and 12. After excluding fragments with <=3 heavy atoms to focus on design-relevant substituents, position 6 remains strongly diversity-enriched and position 15 remains frequency-enriched, whereas positions 7 and 16 are sparse and should be interpreted as literature-guided derivatization sites rather than statistically dominant natural hotspots.", + f"> If medicinal-chemistry labels are reported in the mirrored direction, those natural-diversity hotspots correspond to literature labels 6, 16, 15 and 7, while literature hotspot labels 6, 7, 15 and 16 correspond to current-code positions 13, 12, 4 and 3.", "", "### Practical interpretation for fragment-library design", "", @@ -624,6 +644,7 @@ def build_markdown_report( f"- For 16-membered macrolide design, prioritize positions **13, 3, 4, 12 and 6** for natural-diversity-driven fragment mining.", "- Keep positions **15** as a targeted acyl-modification site even though its chemotype diversity is narrower.", "- Treat positions **7 and 16** as hypothesis-driven medicinal chemistry positions that need literature or synthesis justification beyond database prevalence.", + "- When comparing to literature numbering, either rerun the hotspot panel with mirrored positions or label every reported position as `code_position (medchem_position)` to avoid directional ambiguity.", "", ] return "\n".join(lines) @@ -698,6 +719,15 @@ def build_markdown_report_zh( "- 其中 6 位最能支持你的药化判断:它不是最高频位点,但在设计相关大侧链中显示出很高的结构多样性。", "- 15 位则更偏向高频但低多样性的酰基修饰位点。", "", + "## 编号校准说明(代码编号 vs 药化编号)", + "", + "- 当前代码和数据库采用统一编号:`1 = 内酯羰基碳`,`2 = 相邻酯氧`,`3..N` 则从 
2 位出发沿环的唯一遍历顺序继续编号。", + "- 如果药化文献同样固定 1 和 2 位,但把 `3..N` 按相反方向编号,则对于 `p >= 3` 有镜像换算公式:`p_镜像 = ring_size - p + 3`。", + f"- 对于 {ring_size} 元环,你关心的药化位点 `{','.join(str(position) for position in hotspot_table['cleavage_position'].tolist())}`,在当前代码编号下对应为:`{format_position_mapping(hotspot_table['cleavage_position'].tolist(), ring_size)}`。", + f"- 反过来,当前代码编号下的天然多样性热点 `13、3、4、12`,在药化镜像编号下分别对应 `6、16、15、7`。", + "- 因此,之前看起来对不上的 `13、3、4、12` 与 `6、7、15、16`,本质上是同一组位点的方向镜像,不是化学结论冲突。", + "- 建议后续统一规则:数据库、断裂结果、拼接和模型训练一律使用当前代码编号;论文、图表和药化讨论中若需对照文献,再同时标注镜像药化编号。", + "", "## 桥环 / 稠环干扰的敏感性分析", "", "桥连或双锚点侧链不会进入当前片段库,因为断裂逻辑只保留与主环存在 **1 个连接点** 的侧链组件。也就是说,真正的 bridge / fused multi-anchor components 已被代码层面排除。", @@ -756,9 +786,10 @@ def build_markdown_report_zh( "", "## 建议的论文表述方式", "", - "- 若讨论天然产物中的侧链多样性,可写为:`16 元大环内酯的天然侧链多样性主要集中在 13、3/4 和 12 位,并在 6 位保留较强的设计相关多样性。`", - "- 若讨论药化半合成改造热点,可写为:`6、7、15、16 位代表文献和先导化合物研究中优先使用的衍生化位点,其中 6 和 15 位在数据库统计中分别对应高多样性和高频率信号,而 7 和 16 位更多体现为文献指导的探索性位点。`", + "- 若讨论天然产物中的侧链多样性,可写为:`按当前代码编号,16 元大环内酯的天然侧链多样性主要集中在 13、3/4 和 12 位,并在 6 位保留较强的设计相关多样性;若换成药化镜像编号,则对应为 6、16/15 和 7 位。`", + "- 若讨论药化半合成改造热点,可写为:`按药化镜像编号,6、7、15、16 位代表文献和先导化合物研究中优先使用的衍生化位点;在当前代码编号下,它们对应 13、12、4、3 位。`", "- 若专门讨论非环状侧链设计,则应强调:`在排除 <=3 重原子小片段并进一步排除带环侧链后,15 位是最主要的非环状侧链修饰位点。`", + "- 若在图表中同时展示两套体系,建议统一写成:`代码编号 13(药化 6)` 这类双标签格式,而不要在同一表中混用单独编号。", "", "## 相关图表", "", diff --git a/src/macro_lactone_toolkit/__init__.py b/src/macro_lactone_toolkit/__init__.py index 734c93c..2d9765f 100644 --- a/src/macro_lactone_toolkit/__init__.py +++ b/src/macro_lactone_toolkit/__init__.py @@ -13,6 +13,11 @@ from .models import ( RingNumberingResult, SideChainFragment, ) +from .numbering import ( + mirror_macrolactone_position, + mirror_macrolactone_positions, + number_macrolactone, +) from .visualization import ( fragment_svg, numbered_molecule_svg, @@ -38,6 +43,9 @@ __all__ = [ "MacrolactoneFragmenter", "MacrolactoneValidator", "MacrocycleClassificationResult", + "mirror_macrolactone_position", + 
"mirror_macrolactone_positions", + "number_macrolactone", "numbered_molecule_svg", "ParentMolecule", "RingNumberingError", diff --git a/src/macro_lactone_toolkit/fragmenter.py b/src/macro_lactone_toolkit/fragmenter.py index de08d58..1da3a2c 100644 --- a/src/macro_lactone_toolkit/fragmenter.py +++ b/src/macro_lactone_toolkit/fragmenter.py @@ -14,6 +14,7 @@ from ._core import ( from .analyzer import MacroLactoneAnalyzer from .errors import AmbiguousMacrolactoneError, FragmentationError, MacrolactoneDetectionError from .models import FragmentationResult, RingNumberingResult, SideChainFragment +from .numbering import number_macrolactone class MacrolactoneFragmenter: @@ -26,9 +27,7 @@ class MacrolactoneFragmenter: self.analyzer = MacroLactoneAnalyzer() def number_molecule(self, mol_input: str | Chem.Mol) -> RingNumberingResult: - mol, _ = ensure_mol(mol_input) - candidate = self._select_candidate(mol) - return build_numbering_result(mol, candidate) + return number_macrolactone(mol_input, ring_size=self.ring_size) def fragment_molecule( self, diff --git a/src/macro_lactone_toolkit/numbering.py b/src/macro_lactone_toolkit/numbering.py new file mode 100644 index 0000000..254034e --- /dev/null +++ b/src/macro_lactone_toolkit/numbering.py @@ -0,0 +1,70 @@ +from __future__ import annotations + +from collections.abc import Iterable + +from rdkit import Chem + +from ._core import build_numbering_result, classify_macrolactone, ensure_mol, find_macrolactone_candidates +from .errors import AmbiguousMacrolactoneError, MacrolactoneDetectionError +from .models import RingNumberingResult + + +def number_macrolactone( + mol_input: str | Chem.Mol, + ring_size: int | None = None, +) -> RingNumberingResult: + """ + Return the canonical ring numbering for a supported macrolactone. + + Canonical numbering is fixed across the project: + 1 = lactone carbonyl carbon, 2 = ester oxygen, and 3..N continue by the + unique graph traversal that starts from position 2. 
This API does not + expose clockwise/anticlockwise options; mirrored medicinal-chemistry labels + should be handled through the mirror helpers in this module. + """ + + mol, smiles = ensure_mol(mol_input) + classification = classify_macrolactone(mol, smiles=smiles, ring_size=ring_size) + if classification.classification != "standard_macrolactone": + raise MacrolactoneDetectionError( + "Macrolactone rejected: " + f"classification={classification.classification} " + f"primary_reason_code={classification.primary_reason_code}" + ) + + candidates = find_macrolactone_candidates(mol, ring_size=ring_size) + valid_ring_sizes = sorted({candidate.ring_size for candidate in candidates}) + if len(candidates) > 1 or len(valid_ring_sizes) > 1: + raise AmbiguousMacrolactoneError( + "Multiple valid macrolactone candidates were detected. Pass ring_size explicitly." + ) + return build_numbering_result(mol, candidates[0]) + + +def mirror_macrolactone_position(position: int, ring_size: int) -> int: + """ + Convert a canonical ring position to its mirrored medicinal-chemistry label. + + Positions 1 and 2 are invariant. For positions >= 3, the mirrored label is + computed as `ring_size - position + 3`. + """ + + if position <= 2: + return position + return ring_size - position + 3 + + +def mirror_macrolactone_positions( + positions: Iterable[int], + ring_size: int, +) -> dict[int, int]: + """ + Convert multiple canonical positions to mirrored medicinal-chemistry labels. + + The input order is preserved in the returned mapping. 
+ """ + + return { + int(position): mirror_macrolactone_position(int(position), ring_size) + for position in positions + } diff --git a/src/macro_lactone_toolkit/splicing/scaffold_prep.py b/src/macro_lactone_toolkit/splicing/scaffold_prep.py index 2367fcd..3d5bb55 100644 --- a/src/macro_lactone_toolkit/splicing/scaffold_prep.py +++ b/src/macro_lactone_toolkit/splicing/scaffold_prep.py @@ -5,7 +5,7 @@ from typing import Iterable from rdkit import Chem from .._core import collect_fragmentable_side_chain_atoms, ensure_mol, find_macrolactone_candidates, is_intrinsic_lactone_neighbor -from ..fragmenter import MacrolactoneFragmenter +from ..numbering import number_macrolactone def prepare_macrolactone_scaffold( @@ -15,8 +15,7 @@ def prepare_macrolactone_scaffold( ) -> tuple[Chem.Mol, dict[int, int]]: positions = list(positions) mol, _ = ensure_mol(mol_input) - fragmenter = MacrolactoneFragmenter(ring_size=ring_size) - numbering = fragmenter.number_molecule(mol) + numbering = number_macrolactone(mol, ring_size=ring_size) candidate = find_macrolactone_candidates(mol, ring_size=numbering.ring_size)[0] ring_atom_set = set(numbering.ordered_atoms) diff --git a/src/macro_lactone_toolkit/validation/validator.py b/src/macro_lactone_toolkit/validation/validator.py index be956ed..da595cd 100644 --- a/src/macro_lactone_toolkit/validation/validator.py +++ b/src/macro_lactone_toolkit/validation/validator.py @@ -11,11 +11,11 @@ from sqlmodel import select from macro_lactone_toolkit import MacroLactoneAnalyzer from macro_lactone_toolkit._core import ( - build_numbering_result, collect_fragmentable_side_chain_atoms, find_macrolactone_candidates, is_intrinsic_lactone_neighbor, ) +from macro_lactone_toolkit.numbering import number_macrolactone from macro_lactone_toolkit.validation.database import get_engine, get_session, init_database from macro_lactone_toolkit.validation.isotope_utils import build_fragment_with_isotope from macro_lactone_toolkit.validation.models import ( @@ -163,7 
+163,7 @@ class MacrolactoneValidator: candidate = candidates[0] # Get numbering - numbering = build_numbering_result(mol, candidate) + numbering = number_macrolactone(mol, ring_size=parent.ring_size) # Save numbering to database numbering_record = RingNumbering( diff --git a/tests/test_documentation_entrypoints.py b/tests/test_documentation_entrypoints.py new file mode 100644 index 0000000..d0794eb --- /dev/null +++ b/tests/test_documentation_entrypoints.py @@ -0,0 +1,34 @@ +from __future__ import annotations + +from pathlib import Path + + +PROJECT_ROOT = Path(__file__).resolve().parents[1] + + +def test_root_readme_documents_canonical_numbering() -> None: + readme = (PROJECT_ROOT / "README.md").read_text(encoding="utf-8") + + assert "1 = 内酯羰基碳" in readme + assert "2 = 相邻酯氧" in readme + assert "3..N = 从 2 位出发沿环唯一图遍历顺序继续编号" in readme + assert "6 → 13" in readme + assert "7 → 12" in readme + + +def test_root_agents_exists_and_documents_numbering_invariants() -> None: + agents_path = PROJECT_ROOT / "AGENTS.md" + + assert agents_path.exists() + agents = agents_path.read_text(encoding="utf-8") + assert "canonical numbering" in agents + assert "不是视觉顺时针" in agents + assert "bridge / fused multi-anchor" in agents + + +def test_mkdocs_ring_numbering_page_documents_mirror_mapping() -> None: + ring_doc = (PROJECT_ROOT / "docs" / "user-guide" / "ring-numbering.md").read_text(encoding="utf-8") + + assert "p_mirror = ring_size - p + 3" in ring_doc + assert "6 → 13" in ring_doc + assert "15 → 4" in ring_doc diff --git a/tests/test_numbering_api.py b/tests/test_numbering_api.py new file mode 100644 index 0000000..a697558 --- /dev/null +++ b/tests/test_numbering_api.py @@ -0,0 +1,52 @@ +from __future__ import annotations + +from macro_lactone_toolkit import ( + MacrolactoneFragmenter, + mirror_macrolactone_position, + mirror_macrolactone_positions, + number_macrolactone, +) +from macro_lactone_toolkit.splicing.scaffold_prep import prepare_macrolactone_scaffold + +from .helpers 
import build_macrolactone + + +def test_number_macrolactone_matches_fragmenter_numbering() -> None: + built = build_macrolactone(16, {5: "methyl"}) + + api_result = number_macrolactone(built.smiles, ring_size=16) + fragmenter_result = MacrolactoneFragmenter(ring_size=16).number_molecule(built.smiles) + + assert api_result.position_to_atom == fragmenter_result.position_to_atom + assert api_result.atom_to_position == fragmenter_result.atom_to_position + + +def test_mirror_macrolactone_position_for_ring16() -> None: + assert mirror_macrolactone_position(6, 16) == 13 + assert mirror_macrolactone_position(7, 16) == 12 + assert mirror_macrolactone_position(15, 16) == 4 + assert mirror_macrolactone_position(16, 16) == 3 + + +def test_mirror_macrolactone_positions_returns_stable_mapping() -> None: + assert mirror_macrolactone_positions([6, 7, 15, 16], 16) == { + 6: 13, + 7: 12, + 15: 4, + 16: 3, + } + + +def test_prepare_scaffold_keeps_requested_position_label() -> None: + built = build_macrolactone(16, {5: "ethyl"}) + + scaffold, dummy_map = prepare_macrolactone_scaffold( + built.mol, + positions=[5], + ring_size=16, + ) + + numbering = number_macrolactone(built.mol, ring_size=16) + assert 5 in dummy_map + assert numbering.position_to_atom[5] == built.position_to_atom[5] + assert numbering.position_to_atom[5] == scaffold.GetAtomWithIdx(dummy_map[5]).GetNeighbors()[0].GetIdx() diff --git a/tests/test_scripts_and_docs.py b/tests/test_scripts_and_docs.py index 59e57f7..0336882 100644 --- a/tests/test_scripts_and_docs.py +++ b/tests/test_scripts_and_docs.py @@ -285,6 +285,10 @@ def test_analyze_validation_fragment_library_script_generates_reports(tmp_path): report_zh = (output_dir / "fragment_library_analysis_report_zh.md").read_text(encoding="utf-8") assert "桥连或双锚点侧链不会进入当前片段库" in report_zh assert "cyclic single-anchor side chains" in report_zh + assert "6 → 13" in report_zh + assert "7 → 12" in report_zh + assert "15 → 4" in report_zh + assert "16 → 3" in report_zh def 
test_active_text_assets_do_not_reference_legacy_api(): diff --git a/validation_output/fragment_library_analysis/fragment_library_analysis_report.md b/validation_output/fragment_library_analysis/fragment_library_analysis_report.md index 07dbf79..23b2ca5 100644 --- a/validation_output/fragment_library_analysis/fragment_library_analysis_report.md +++ b/validation_output/fragment_library_analysis/fragment_library_analysis_report.md @@ -84,6 +84,15 @@ This panel focuses on positions 6, 7, 15 and 16 because these are the literature - Position 15: supported as a **frequent modification site**, but the retained chemotypes are concentrated into a small number of acyl substituents. - Position 16: not prevalent in the current database, but the few retained fragments are structurally distinct singletons; this makes it a **low-evidence exploratory site**, not a high-confidence natural hotspot. +## Numbering Alignment With Medicinal-Chemistry Labels + +- The codebase uses one canonical numbering rule: position 1 is the lactone carbonyl carbon, position 2 is the ester oxygen, and positions 3..N follow the unique ring traversal that starts from position 2 in `build_numbering_result()`. +- If a medicinal-chemistry scheme keeps positions 1 and 2 fixed but numbers the rest of the ring in the mirrored direction, then the conversion for positions >=3 is `p_mirror = ring_size - p + 3`. +- For a 16-membered ring, literature labels `6,7,15,16` map to current-code labels `6 → 13, 7 → 12, 15 → 4, 16 → 3`. +- Conversely, the current-code natural-diversity hotspots `13, 3, 4, 12` correspond to mirrored medicinal-chemistry labels `6, 16, 15, 7` in a 16-membered ring. +- This means the apparent disagreement was a numbering-direction mismatch, not a chemical contradiction between the database analysis and the literature-guided hotspot list. 
+- Practical rule: keep the database and cleavage-position statistics in the current canonical code numbering, but add mirrored medicinal-chemistry labels in figures, tables and manuscripts whenever you compare against literature. + ## Figure 6. Are the Top Positions Driven by Ring-Bearing Side Chains? ![Ring 16 ring sensitivity](ring16_position_ring_sensitivity.png) @@ -140,6 +149,7 @@ The earlier statement that `6,7,15,16` are important 16-membered macrolide modif ### Recommended paper-safe wording > In the validated MacrolactoneDB fragment library, natural side-chain diversity of 16-membered macrolactones is concentrated primarily at positions 13, 3/4 and 12. After excluding fragments with <=3 heavy atoms to focus on design-relevant substituents, position 6 remains strongly diversity-enriched and position 15 remains frequency-enriched, whereas positions 7 and 16 are sparse and should be interpreted as literature-guided derivatization sites rather than statistically dominant natural hotspots. +> If medicinal-chemistry labels are reported in the mirrored direction, those natural-diversity hotspots correspond to literature labels 6, 16, 15 and 7, while literature hotspot labels 6, 7, 15 and 16 correspond to current-code positions 13, 12, 4 and 3. ### Practical interpretation for fragment-library design @@ -148,4 +158,5 @@ The earlier statement that `6,7,15,16` are important 16-membered macrolide modif - For 16-membered macrolide design, prioritize positions **13, 3, 4, 12 and 6** for natural-diversity-driven fragment mining. - Keep positions **15** as a targeted acyl-modification site even though its chemotype diversity is narrower. - Treat positions **7 and 16** as hypothesis-driven medicinal chemistry positions that need literature or synthesis justification beyond database prevalence. 
+- When comparing to literature numbering, either rerun the hotspot panel with mirrored positions or label every reported position as `code_position (medchem_position)` to avoid directional ambiguity. diff --git a/validation_output/fragment_library_analysis/fragment_library_analysis_report_zh.md b/validation_output/fragment_library_analysis/fragment_library_analysis_report_zh.md index 9775fdb..9c4e383 100644 --- a/validation_output/fragment_library_analysis/fragment_library_analysis_report_zh.md +++ b/validation_output/fragment_library_analysis/fragment_library_analysis_report_zh.md @@ -30,6 +30,15 @@ - 其中 6 位最能支持你的药化判断:它不是最高频位点,但在设计相关大侧链中显示出很高的结构多样性。 - 15 位则更偏向高频但低多样性的酰基修饰位点。 +## 编号校准说明(代码编号 vs 药化编号) + +- 当前代码和数据库采用统一编号:`1 = 内酯羰基碳`,`2 = 相邻酯氧`,`3..N` 则从 2 位出发沿环的唯一遍历顺序继续编号。 +- 如果药化文献同样固定 1 和 2 位,但把 `3..N` 按相反方向编号,则对于 `p >= 3` 有镜像换算公式:`p_镜像 = ring_size - p + 3`。 +- 对于 16 元环,你关心的药化位点 `6,7,15,16`,在当前代码编号下对应为:`6 → 13, 7 → 12, 15 → 4, 16 → 3`。 +- 反过来,当前代码编号下的天然多样性热点 `13、3、4、12`,在药化镜像编号下分别对应 `6、16、15、7`。 +- 因此,之前看起来对不上的 `13、3、4、12` 与 `6、7、15、16`,本质上是同一组位点的方向镜像,不是化学结论冲突。 +- 建议后续统一规则:数据库、断裂结果、拼接和模型训练一律使用当前代码编号;论文、图表和药化讨论中若需对照文献,再同时标注镜像药化编号。 + ## 桥环 / 稠环干扰的敏感性分析 桥连或双锚点侧链不会进入当前片段库,因为断裂逻辑只保留与主环存在 **1 个连接点** 的侧链组件。也就是说,真正的 bridge / fused multi-anchor components 已被代码层面排除。 @@ -88,9 +97,10 @@ ## 建议的论文表述方式 -- 若讨论天然产物中的侧链多样性,可写为:`16 元大环内酯的天然侧链多样性主要集中在 13、3/4 和 12 位,并在 6 位保留较强的设计相关多样性。` -- 若讨论药化半合成改造热点,可写为:`6、7、15、16 位代表文献和先导化合物研究中优先使用的衍生化位点,其中 6 和 15 位在数据库统计中分别对应高多样性和高频率信号,而 7 和 16 位更多体现为文献指导的探索性位点。` +- 若讨论天然产物中的侧链多样性,可写为:`按当前代码编号,16 元大环内酯的天然侧链多样性主要集中在 13、3/4 和 12 位,并在 6 位保留较强的设计相关多样性;若换成药化镜像编号,则对应为 6、16/15 和 7 位。` +- 若讨论药化半合成改造热点,可写为:`按药化镜像编号,6、7、15、16 位代表文献和先导化合物研究中优先使用的衍生化位点;在当前代码编号下,它们对应 13、12、4、3 位。` - 若专门讨论非环状侧链设计,则应强调:`在排除 <=3 重原子小片段并进一步排除带环侧链后,15 位是最主要的非环状侧链修饰位点。` +- 若在图表中同时展示两套体系,建议统一写成:`代码编号 13(药化 6)` 这类双标签格式,而不要在同一表中混用单独编号。 ## 相关图表
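The mirror conversion stated in the report hunks above (`p_mirror = ring_size - p + 3` for positions `>= 3`, with positions 1 and 2 fixed) can be sketched in a few lines. This is a minimal standalone illustration, not the toolkit's implementation. The names `mirror_position` and `dual_label` are hypothetical, and the handling of positions 1, 2 and out-of-range values is an assumption, since the diff only shows the behaviour of `mirror_macrolactone_position` for positions 6, 7, 15 and 16 in a 16-membered ring.

```python
def mirror_position(position: int, ring_size: int) -> int:
    """Hypothetical stand-in for the mirror conversion described in the report.

    Assumption: positions 1 and 2 are shared by both numbering schemes and
    are returned unchanged; out-of-range positions raise ValueError.
    """
    if not 1 <= position <= ring_size:
        raise ValueError(f"position {position} is outside ring of size {ring_size}")
    if position <= 2:
        return position
    # Formula from the report: p_mirror = ring_size - p + 3 for p >= 3.
    return ring_size - position + 3


def dual_label(code_position: int, ring_size: int) -> str:
    """Format a position as 'code_position (medchem_position)'."""
    return f"{code_position} ({mirror_position(code_position, ring_size)})"


if __name__ == "__main__":
    # Current-code natural-diversity hotspots for a 16-membered ring.
    for code_position in (13, 3, 4, 12):
        print(dual_label(code_position, 16))
```

Because the formula is its own inverse for positions `>= 3`, the same helper converts literature labels to code labels and back, which is why a single `code_position (medchem_position)` label is enough to remove the directional ambiguity.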