{
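"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Screening Aromatic Amine Drug Candidates - Analysis of Sandmeyer Reaction Starting Materials\n",
"\n",
"## Background\n",
"\n",
"### The Sandmeyer reaction in brief\n",
"The Sandmeyer reaction is a classic method for transforming aromatic amines:\n",
"**Ar-NH₂ → [Ar-N₂]⁺ → Ar-X**\n",
"where X = Cl, Br, I, CN, OH, SCN, and related groups\n",
"\n",
"### Screening goal\n",
"Identify drug molecules that contain an aromatic amine (Ar-NH₂)\n",
"and flag them as candidates that could serve as Sandmeyer starting materials.\n",
"Such molecules may originally have relied on a Sandmeyer reaction to install an aryl halide,\n",
"and could now be transformed more efficiently with the Zhang Xiaheng reaction (张夏恒反应).\n",
"\n",
"### SMARTS pattern\n",
"The SMARTS pattern `[c,n][NH2]` matches:\n",
"- `[c,n]`: an aromatic carbon or aromatic nitrogen atom (so N-amino heteroarenes also match)\n",
"- `[NH2]`: a primary amino group (-NH₂)\n",
"\n",
"**Important notes:**\n",
"- This screen is based on structural features only\n",
"- The actual synthetic route must be confirmed against the literature\n",
"- Not every drug that contains an aromatic amine is made via a Sandmeyer reaction"
]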
},
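{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Quick check of the SMARTS pattern\n",
"\n",
"The cell below is a minimal sketch, not part of the original screening pipeline.\n",
"It runs `[c,n][NH2]` against a few hand-picked SMILES strings (illustrative assumptions, not taken from the screening library)\n",
"to show which kinds of amines the pattern can hit. The names `examples` and `demo_pattern` are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from rdkit import Chem\n",
"\n",
"# Illustrative SMILES chosen for this sketch (assumptions, not from the dataset)\n",
"examples = {\n",
"    \"aniline\": \"Nc1ccccc1\",\n",
"    \"2-aminopyridine\": \"Nc1ccccn1\",\n",
"    \"1-aminopyrrole\": \"Nn1cccc1\",\n",
"    \"benzylamine\": \"NCc1ccccc1\",\n",
"    \"N-methylaniline\": \"CNc1ccccc1\",\n",
"}\n",
"\n",
"demo_pattern = Chem.MolFromSmarts(\"[c,n][NH2]\")\n",
"\n",
"# Print the number of pattern hits for each example molecule\n",
"for name, smiles in examples.items():\n",
"    mol = Chem.MolFromSmiles(smiles)\n",
"    hits = mol.GetSubstructMatches(demo_pattern)\n",
"    print(f\"{name:16s} {smiles:12s} matches: {len(hits)}\")"
]
},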
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import required libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from pathlib import Path\n",
"from rdkit import Chem\n",
"from rdkit.Chem import PandasTools, Draw\n",
"from rdkit.Chem.Draw import rdMolDraw2D\n",
"from IPython.display import SVG, display\n",
"from rdkit.Chem import AllChem\n",
"import pandas as pd\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"# Configure pandas display options\n",
"pd.set_option('display.max_columns', None)\n",
"pd.set_option('display.max_colwidth', 100)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define the screening pattern and visualization function\n",
"\n",
"### SMARTS pattern setup\n",
"- **Target pattern**: `[c,n][NH2]`, an amino group attached to an aromatic carbon or nitrogen atom\n",
"- **Matching logic**: find every molecule that contains this substructure"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using SMARTS pattern: [c,n][NH2]\n",
"Pattern valid: ✓\n",
"\n",
"Created directory: ../data/drug_targetmol/aniline_candidates\n",
"Created visualization directory: ../data/drug_targetmol/aniline_candidates/visualizations\n"
]
}
],
"source": [
"# Define the screening pattern\n",
"TARGET_SMARTS = '[c,n][NH2]'\n",
"pattern = Chem.MolFromSmarts(TARGET_SMARTS)\n",
"\n",
"if pattern is None:\n",
"    raise ValueError(f\"Invalid SMARTS pattern: {TARGET_SMARTS}\")\n",
"\n",
"print(f\"Using SMARTS pattern: {TARGET_SMARTS}\")\n",
"print(f\"Pattern valid: {'✓' if pattern else '✗'}\")\n",
"\n",
"# Create the output directories (parents=True also creates missing parent folders)\n",
"output_base = Path(\"../data/drug_targetmol\")\n",
"output_dir = output_base / \"aniline_candidates\"\n",
"visualization_dir = output_dir / \"visualizations\"\n",
"\n",
"output_dir.mkdir(parents=True, exist_ok=True)\n",
"visualization_dir.mkdir(parents=True, exist_ok=True)\n",
"\n",
"print(f\"\\nCreated directory: {output_dir}\")\n",
"print(f\"Created visualization directory: {visualization_dir}\")\n",
"\n",
"def generate_highlighted_svg(mol, highlight_atoms, filename, title=\"\"):\n",
"    \"\"\"Render a high-resolution SVG with the matched substructure highlighted.\"\"\"\n",
"    # Compute 2D coordinates\n",
"    AllChem.Compute2DCoords(mol)\n",
"\n",
"    # Create the SVG drawer\n",
"    drawer = rdMolDraw2D.MolDraw2DSVG(1200, 900)  # large canvas for a sharper image\n",
"    drawer.SetFontSize(12)\n",
"\n",
"    # Drawing options\n",
"    draw_options = drawer.drawOptions()\n",
"    draw_options.addAtomIndices = False  # hide atom indices to keep the drawing clean\n",
"    draw_options.addBondIndices = False\n",
"    draw_options.addStereoAnnotation = True\n",
"    draw_options.fixedFontSize = 12\n",
"\n",
"    # Highlight the matched atoms in blue\n",
"    atom_colors = {}\n",
"    for atom_idx in highlight_atoms:\n",
"        atom_colors[atom_idx] = (0.3, 0.3, 1.0)\n",
"\n",
"    # Draw the molecule\n",
"    drawer.DrawMolecule(mol,\n",
"                        highlightAtoms=highlight_atoms,\n",
"                        highlightAtomColors=atom_colors)\n",
"\n",
"    drawer.FinishDrawing()\n",
"    svg_content = drawer.GetDrawingText()\n",
"\n",
"    # Add a title\n",
"    if title:\n",
"        svg_lines = svg_content.split(\"\\n\")\n",
"        # Insert the title just before the closing </svg> tag\n",
"        for i, line in enumerate(svg_lines):\n",
"            if line.strip() == \"</svg>\":\n",
"                svg_lines.insert(i, f\"<text x=\\\"50%\\\" y=\\\"30\\\" text-anchor=\\\"middle\\\" font-size=\\\"16\\\" font-weight=\\\"bold\\\">{title}</text>\")\n",
"                break\n",
"        svg_with_title = \"\\n\".join(svg_lines)\n",
"    else:\n",
"        svg_with_title = svg_content\n",
"\n",
"    # Save the file\n",
"    with open(filename, \"w\", encoding=\"utf-8\") as f:\n",
"        f.write(svg_with_title)\n",
"\n",
"    return svg_content"
]
},
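{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sketch: adding a title to an SVG string\n",
"\n",
"The title step in `generate_highlighted_svg` splices a `<text>` element into the SVG markup.\n",
"The cell below is a minimal standalone sketch of that idea using only the standard library.\n",
"`add_svg_title` and `demo_svg` are hypothetical names introduced for illustration, and XML-escaping the title\n",
"is an added assumption (drug names may contain characters such as `&`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from xml.sax.saxutils import escape\n",
"\n",
"\n",
"def add_svg_title(svg_text, title, y=30, font_size=16):\n",
"    \"\"\"Return svg_text with a centered <text> title placed just before </svg>.\n",
"\n",
"    Hypothetical helper for illustration; not part of the original notebook.\n",
"    \"\"\"\n",
"    if not title:\n",
"        return svg_text\n",
"    title_tag = (\n",
"        f'<text x=\"50%\" y=\"{y}\" text-anchor=\"middle\" '\n",
"        f'font-size=\"{font_size}\" font-weight=\"bold\">{escape(title)}</text>'\n",
"    )\n",
"    # rpartition splits on the last \"</svg>\", so earlier markup is left untouched\n",
"    head, closing, tail = svg_text.rpartition(\"</svg>\")\n",
"    if not closing:\n",
"        return svg_text  # no closing tag found: return the input unchanged\n",
"    return f\"{head}{title_tag}\\n{closing}{tail}\"\n",
"\n",
"\n",
"# Tiny hand-written SVG used only for this demonstration\n",
"demo_svg = (\n",
"    '<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"100\" height=\"50\">\\n'\n",
"    '<rect width=\"100\" height=\"50\" fill=\"white\"/>\\n'\n",
"    '</svg>\\n'\n",
")\n",
"print(add_svg_title(demo_svg, \"Compound A & B\"))"
]
},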
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data loading and molecule screening\n",
"\n",
"### Data source\n",
"- File location: `data/drug_targetmol/0c04ffc9fe8c2ec916412fbdc2a49bf4.sdf`\n",
"- Contains drug structures together with a rich set of property fields\n",
"\n",
"### Screening logic\n",
"1. Read the SDF file\n",
"2. Run the SMARTS match on every molecule\n",
"3. Record the matched atoms and the number of matches\n",
"4. Save the matches to a CSV file\n",
"5. Generate highlighted structure images"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reading SDF file...\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[21:24:23] Both bonds on one end of an atropisomer are on the same side - atoms is : 3\n",
"[21:24:23] Explicit valence for atom # 2 N greater than permitted\n",
"[21:24:23] ERROR: Could not sanitize molecule ending on line 217340\n",
"[21:24:23] ERROR: Explicit valence for atom # 2 N greater than permitted\n",
"[21:24:24] Explicit valence for atom # 4 N greater than permitted\n",
"[21:24:24] ERROR: Could not sanitize molecule ending on line 317283\n",
"[21:24:24] ERROR: Explicit valence for atom # 4 N greater than permitted\n",
"[21:24:24] Explicit valence for atom # 4 N greater than permitted\n",
"[21:24:24] ERROR: Could not sanitize molecule ending on line 324666\n",
"[21:24:24] ERROR: Explicit valence for atom # 4 N greater than permitted\n",
"[21:24:24] Explicit valence for atom # 5 N greater than permitted\n",
"[21:24:24] ERROR: Could not sanitize molecule ending on line 365883\n",
"[21:24:24] ERROR: Explicit valence for atom # 5 N greater than permitted\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Successfully loaded 3276 molecules\n",
"\n",
"Data overview:\n",
" Index Plate Row Col ID Name \\\n",
"0 1 L1010-1 a 2 Dexamethasone \n",
"1 2 L1010-1 a 3 Danicopan \n",
"2 3 L1010-1 a 4 Cyclosporin A \n",
"3 4 L1010-1 a 5 L-Carnitine \n",
"4 5 L1010-1 a 6 Trimetazidine dihydrochloride \n",
"\n",
" Synonyms CAS \\\n",
"0 MK 125;Prednisolone F;NSC 34521;Hexadecadrol 50-02-2 \n",
"1 ACH-4471 1903768-17-1 \n",
"2 Cyclosporine A;Ciclosporin;Cyclosporine 59865-13-3 \n",
"3 L(-)-Carnitine;Levocarnitine 541-15-1 \n",
"4 Yoshimilon;Kyurinett;Vastarel F 13171-25-0 \n",
"\n",
" SMILES \\\n",
"0 C[C@@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C@@]3(F)[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CO \n",
"1 CC(=O)c1nn(CC(=O)N2C[C@H](F)C[C@H]2C(=O)Nc2cccc(Br)n2)c2ccc(cc12)-c1cnc(C)nc1 \n",
"2 [C@H]([C@@H](C/C=C/C)C)(O)[C@@]1(N(C)C(=O)[C@H]([C@@H](C)C)N(C)C(=O)[C@H](CC(C)C)N(C)C(=O)[C@H](... \n",
"3 C[N+](C)(C)C[C@@H](O)CC([O-])=O \n",
"4 Cl.Cl.COC1=C(OC)C(OC)=C(CN2CCNCC2)C=C1 \n",
"\n",
" Formula MolWt Approved status \\\n",
"0 C22H29FO5 392.46 NMPA;EMA;FDA \n",
"1 C26H23BrFN7O3 580.41 FDA \n",
"2 C62H111N11O12 1202.61 FDA \n",
"3 C7H15NO3 161.2 FDA \n",
"4 C14H24Cl2N2O3 339.258 NMPA;EMA \n",
"\n",
" Pharmacopoeia \\\n",
"0 USP39-NF34;BP2015;JP16;IP2010 \n",
"1 NaN \n",
"2 Martindale the Extra Pharmacopoei, EP10.2, USP43-NF38, Ph.Int_6th, JP17 \n",
"3 NaN \n",
"4 BP2019;KP Ⅹ;EP9.2;IP2010;JP17;Martindale:The Extra Pharmacopoeia \n",
"\n",
" Disease \\\n",
"0 Metabolism \n",
"1 Others \n",
"2 Immune system \n",
"3 Cardiovascular system \n",
"4 Cardiovascular system \n",
"\n",
" Pathways \\\n",
"0 Antibody-drug Conjugate/ADC Related;Autophagy;Endocrinology/Hormones;Immunology/Inflammation;Mic... \n",
"1 Immunology/Inflammation \n",
"2 Immunology/Inflammation;Metabolism;Microbiology/Virology \n",
"3 Metabolism \n",
"4 Autophagy;Metabolism \n",
"\n",
" Target \\\n",
"0 Antibacterial;Antibiotic;Autophagy;Complement System;Glucocorticoid Receptor;IL Receptor;Mitopha... \n",
"1 Complement System \n",
"2 Phosphatase;Antibiotic;Complement System \n",
"3 Endogenous Metabolite;Fatty Acid Synthase \n",
"4 Autophagy;Fatty Acid Synthase \n",
"\n",
" Receptor \\\n",
"0 Antibiotic; Autophagy; Bacterial; Complement System; Glucocorticoid Receptor; IL receptor; Mitop... \n",
"1 Complement System; factor D \n",
"2 Antibiotic; calcineurin phosphatase; Complement System; Phosphatase \n",
"3 Endogenous Metabolite; FAS \n",
"4 Autophagy; mitochondrial long-chain 3-ketoacyl thiolase \n",
"\n",
" Bioactivity \\\n",
"0 Dexamethasone is a glucocorticoid receptor agonist and IL receptor modulator with anti-inflammat... \n",
"1 Danicopan (ACH-4471) (ACH-4471) is a selective, orally active small molecule factor D inhibitor ... \n",
"2 Cyclosporin A is a natural product and an active fungal metabolite, classified as a cyclic polyp... \n",
"3 L-Carnitine (L(-)-Carnitine) is an amino acid derivative. L-Carnitine facilitates long-chain fat... \n",
"4 Trimetazidine dihydrochloride (Vastarel F) can improve myocardial glucose utilization by inhibit... \n",
"\n",
" Reference \\\n",
"0 Li M, Yu H. Identification of WP1066, an inhibitor of JAK2 and STAT3, as a Kv1. 3 potassium chan... \n",
"1 Yuan X, et al. Small-molecule factor D inhibitors selectively block the alternative pathway of c... \n",
"2 D'Angelo G, et al. Cyclosporin A prevents the hypoxic adaptation by activating hypoxia-inducible... \n",
"3 Jogl G, Tong L. Cell. 2003 Jan 10; 112(1):113-22. \n",
"4 Yang Q, et al. Int J Clin Exp Pathol. 2015, 8(4):3735-3741.;Liu Z, et al. Metabolism. 2016, 65(3... \n",
"\n",
" ROMol \n",
"0 <rdkit.Chem.rdchem.Mol object at 0x77530d73c820> \n",
"1 <rdkit.Chem.rdchem.Mol object at 0x77530d73c890> \n",
"2 <rdkit.Chem.rdchem.Mol object at 0x77530a3f6f10> \n",
"3 <rdkit.Chem.rdchem.Mol object at 0x77530a3f70d0> \n",
"4 <rdkit.Chem.rdchem.Mol object at 0x77530a3f7140> \n",
"\n",
"Columns: ['Index', 'Plate', 'Row', 'Col', 'ID', 'Name', 'Synonyms', 'CAS', 'SMILES', 'Formula', 'MolWt', 'Approved status', 'Pharmacopoeia', 'Disease', 'Pathways', 'Target', 'Receptor', 'Bioactivity', 'Reference', 'ROMol']\n"
]
}
],
"source": [
"# Read the SDF file\n",
"sdf_path = '../data/drug_targetmol/0c04ffc9fe8c2ec916412fbdc2a49bf4.sdf'\n",
"\n",
"print(\"Reading SDF file...\")\n",
"df = PandasTools.LoadSDF(sdf_path)\n",
"print(f\"Successfully loaded {len(df)} molecules\")\n",
"\n",
"# Show basic information about the data\n",
"print(\"\\nData overview:\")\n",
"print(df.head())\n",
"print(f\"\\nColumns: {list(df.columns)}\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Starting aromatic amine screening...\n",
"SMARTS pattern: [c,n][N&H2]\n",
"Found 262 matching molecules (3276 molecules processed)\n",
"\n",
"Screening summary:\n",
"                  Name          CAS      Formula  total_matches\n",
"17           Guanosine     118-00-3   C10H13N5O5              1\n",
"20         Ganciclovir   82410-32-0    C9H13N5O4              1\n",
"22   Imiquimod maleate  896106-16-4   C18H20N4O4              1\n",
"27       Brincidofovir  444805-28-1  C27H52N3O7P              1\n",
"28           Imiquimod   99011-02-6     C14H16N4              1\n",
"32  Ganciclovir sodium  107910-75-8  C9H13N5NaO4              1\n",
"33          Cytarabine     147-94-4    C9H13N3O5              1\n",
"35          Vidarabine    5536-17-4   C10H13N5O4              1\n",
"38         Penciclovir   39809-25-1   C10H15N5O3              1\n",
"41         Famciclovir  104227-87-4   C14H19N5O4              1\n",
"... and 252 more molecules\n"
]
}
],
"source": [
"def screen_molecules_for_aniline(df, smarts_pattern, max_molecules=100):\n",
"    \"\"\"\n",
"    Screen for molecules that contain an aromatic amine substructure.\n",
"\n",
"    Args:\n",
"        df: DataFrame holding the molecules (RDKit Mol objects in the 'ROMol' column)\n",
"        smarts_pattern: RDKit query molecule built from a SMARTS string\n",
"        max_molecules: maximum number of molecules to process\n",
"\n",
"    Returns:\n",
"        DataFrame with one row per matching molecule\n",
"    \"\"\"\n",
"    print(\"Starting aromatic amine screening...\")\n",
"    print(f\"SMARTS pattern: {Chem.MolToSmarts(smarts_pattern)}\")\n",
"\n",
"    matched_molecules = []\n",
"    processed_count = 0\n",
"\n",
"    for idx, row in df.iterrows():\n",
"        if processed_count >= max_molecules:\n",
"            break\n",
"\n",
"        mol = row['ROMol']\n",
"        if mol is None:\n",
"            continue\n",
"\n",
"        processed_count += 1\n",
"\n",
"        # Check whether the molecule matches the SMARTS pattern\n",
"        if mol.HasSubstructMatch(smarts_pattern):\n",
"            matches = mol.GetSubstructMatches(smarts_pattern)\n",
"\n",
"            # Collect every matched atom index\n",
"            matched_atoms = set()\n",
"            for match in matches:\n",
"                matched_atoms.update(match)\n",
"\n",
"            # Build the match record\n",
"            match_record = row.copy()\n",
"            match_record['matched_atoms'] = list(matched_atoms)\n",
"            match_record['total_matches'] = len(matches)\n",
"            match_record['smarts_pattern'] = Chem.MolToSmarts(smarts_pattern)\n",
"            matched_molecules.append(match_record)\n",
"\n",
"    result_df = pd.DataFrame(matched_molecules)\n",
"    print(f\"Found {len(result_df)} matching molecules ({processed_count} molecules processed)\")\n",
"\n",
"    return result_df\n",
"\n",
"# Run the screen\n",
"matched_df = screen_molecules_for_aniline(df, pattern, max_molecules=1000000)\n",
"\n",
"# Show a summary of the results\n",
"if len(matched_df) > 0:\n",
"    print(\"\\nScreening summary:\")\n",
"    summary_cols = ['Name', 'CAS', 'Formula', 'total_matches']\n",
"    if len(matched_df) <= 10:\n",
"        print(matched_df[summary_cols])\n",
"    else:\n",
"        print(matched_df[summary_cols].head(10))\n",
"        print(f\"... and {len(matched_df) - 10} more molecules\")\n",
"else:\n",
"    print(\"\\nNo matching molecules found\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save the screening results\n",
"\n",
"### Output files\n",
"1. **CSV file**: property data and match details for every matched molecule\n",
"2. **SVG images**: a structure drawing for each matched molecule, with the aromatic amine highlighted"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CSV results saved to: ../data/drug_targetmol/aniline_candidates/aniline_candidates.csv\n",
"Contains 262 molecules and 23 property columns\n",
"\n",
"Generating visualizations (up to 500)...\n",
"Generated 10 molecule images\n",
"Generated 20 molecule images\n",
"Generated 30 molecule images\n",
"Generated 40 molecule images\n",
"Generated 50 molecule images\n",
"Generated 60 molecule images\n",
"Generated 70 molecule images\n",
"Generated 80 molecule images\n",
"Generated 90 molecule images\n",
"Generated 100 molecule images\n",
"Generated 110 molecule images\n",
"Generated 120 molecule images\n",
"Generated 130 molecule images\n",
"Generated 140 molecule images\n",
"Generated 150 molecule images\n",
"Generated 160 molecule images\n",
"Generated 170 molecule images\n",
"Generated 180 molecule images\n",
"Generated 190 molecule images\n",
"Generated 200 molecule images\n",
"Generated 210 molecule images\n",
"Generated 220 molecule images\n",
"Generated 230 molecule images\n",
"Generated 240 molecule images\n",
"Generated 250 molecule images\n",
"Generated 260 molecule images\n",
"Done! Generated 262 visualizations in total\n",
"\n",
"Example image: 118-00-3_Guanosine.svg\n"
]
},
{
"data": {
"image/svg+xml": [
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:rdkit=\"http://www.rdkit.org/xml\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" version=\"1.1\" baseProfile=\"full\" xml:space=\"preserve\" width=\"1200px\" height=\"900px\" viewBox=\"0 0 1200 900\">\n",
"<!-- END OF HEADER -->\n",
"<rect style=\"opacity:1.0;fill:#FFFFFF;stroke:none\" width=\"1200.0\" height=\"900.0\" x=\"0.0\" y=\"0.0\"> </rect>\n",
"<path class=\"bond-0 atom-0 atom-1\" d=\"M 912.0,197.7 L 940.1,201.0 L 924.8,332.9 L 896.6,329.6 Z\" style=\"fill:#4C4CFF;fill-rule:evenodd;fill-opacity:1;stroke:#4C4CFF;stroke-width:0.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:10;stroke-opacity:1;\"/>\n",
"<ellipse cx=\"932.9\" cy=\"201.5\" rx=\"26.6\" ry=\"26.6\" class=\"atom-0\" style=\"fill:#4C4CFF;fill-rule:evenodd;stroke:#4C4CFF;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<ellipse cx=\"910.7\" cy=\"331.2\" rx=\"26.6\" ry=\"26.6\" class=\"atom-1\" style=\"fill:#4C4CFF;fill-rule:evenodd;stroke:#4C4CFF;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-0 atom-0 atom-1\" d=\"M 925.1,208.0 L 910.7,331.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-1 atom-1 atom-2\" d=\"M 910.7,331.2 L 853.5,355.9\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-1 atom-1 atom-2\" d=\"M 853.5,355.9 L 796.4,380.6\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-1 atom-1 atom-2\" d=\"M 908.0,354.1 L 856.2,376.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-1 atom-1 atom-2\" d=\"M 856.2,376.5 L 804.3,398.9\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-2 atom-2 atom-3\" d=\"M 787.8,392.5 L 780.6,454.1\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-2 atom-2 atom-3\" d=\"M 780.6,454.1 L 773.4,515.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-3 atom-3 atom-4\" d=\"M 773.4,515.8 L 879.9,595.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-3 atom-3 atom-4\" d=\"M 794.5,506.6 L 882.6,572.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-4 atom-4 atom-5\" d=\"M 879.9,595.0 L 860.1,653.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-4 atom-4 atom-5\" d=\"M 860.1,653.6 L 840.4,712.2\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-5 atom-5 atom-6\" d=\"M 829.8,720.7 L 767.3,720.0\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-5 atom-5 atom-6\" d=\"M 767.3,720.0 L 704.7,719.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-5 atom-5 atom-6\" d=\"M 830.1,700.8 L 774.7,700.2\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-5 atom-5 atom-6\" d=\"M 774.7,700.2 L 719.4,699.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-6 atom-6 atom-7\" d=\"M 704.7,719.3 L 686.2,660.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-6 atom-6 atom-7\" d=\"M 686.2,660.3 L 667.8,601.2\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-7 atom-3 atom-7\" d=\"M 773.4,515.8 L 723.0,551.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-7 atom-3 atom-7\" d=\"M 723.0,551.5 L 672.7,587.2\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-8 atom-7 atom-8\" d=\"M 657.5,590.0 L 598.4,570.1\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-8 atom-7 atom-8\" d=\"M 598.4,570.1 L 539.3,550.1\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-9 atom-8 atom-9\" d=\"M 539.3,550.1 L 489.3,585.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-9 atom-8 atom-9\" d=\"M 489.3,585.6 L 439.2,621.0\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-10 atom-9 atom-10\" d=\"M 422.7,620.8 L 373.6,584.2\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-10 atom-9 atom-10\" d=\"M 373.6,584.2 L 324.4,547.7\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-11 atom-10 atom-11\" d=\"M 324.4,547.7 L 197.7,587.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-12 atom-11 atom-12\" d=\"M 197.7,587.2 L 153.0,546.1\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-12 atom-11 atom-12\" d=\"M 153.0,546.1 L 108.3,504.9\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-13 atom-10 atom-13\" d=\"M 324.4,547.7 L 366.9,421.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-14 atom-13 atom-14\" d=\"M 366.9,421.8 L 331.6,372.1\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-14 atom-13 atom-14\" d=\"M 331.6,372.1 L 296.3,322.3\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-15 atom-13 atom-15\" d=\"M 366.9,421.8 L 499.7,423.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-16 atom-8 atom-15\" d=\"M 539.3,550.1 L 499.7,423.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-17 atom-15 atom-16\" d=\"M 499.7,423.4 L 536.0,374.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-17 atom-15 atom-16\" d=\"M 536.0,374.5 L 572.4,325.6\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-18 atom-4 atom-17\" d=\"M 879.9,595.0 L 1001.8,542.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-19 atom-17 atom-18\" d=\"M 991.3,547.0 L 1042.7,585.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-19 atom-17 atom-18\" d=\"M 1042.7,585.2 L 1094.1,623.5\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-19 atom-17 atom-18\" d=\"M 1003.2,531.0 L 1054.6,569.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-19 atom-17 atom-18\" d=\"M 1054.6,569.3 L 1106.0,607.5\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-20 atom-17 atom-19\" d=\"M 1001.8,542.4 L 1009.0,480.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-20 atom-17 atom-19\" d=\"M 1009.0,480.8 L 1016.2,419.1\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-21 atom-1 atom-19\" d=\"M 910.7,331.2 L 960.2,368.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-21 atom-1 atom-19\" d=\"M 960.2,368.0 L 1009.6,404.9\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path d=\"M 707.8,719.4 L 704.7,719.3 L 703.7,716.4\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:10;stroke-opacity:1;\"/>\n",
"<path d=\"M 204.0,585.3 L 197.7,587.2 L 195.5,585.2\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:10;stroke-opacity:1;\"/>\n",
"<path d=\"M 995.7,545.0 L 1001.8,542.4 L 1002.2,539.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:10;stroke-opacity:1;\"/>\n",
"<path class=\"atom-0\" d=\"M 924.2 195.1 L 927.0 199.6 Q 927.3 200.0, 927.7 200.8 Q 928.1 201.7, 928.2 201.7 L 928.2 195.1 L 929.3 195.1 L 929.3 203.6 L 928.1 203.6 L 925.1 198.7 Q 924.8 198.1, 924.4 197.4 Q 924.1 196.8, 924.0 196.6 L 924.0 203.6 L 922.9 203.6 L 922.9 195.1 L 924.2 195.1 \" fill=\"#000000\"/>\n",
"<path class=\"atom-0\" d=\"M 930.9 195.1 L 932.1 195.1 L 932.1 198.7 L 936.4 198.7 L 936.4 195.1 L 937.6 195.1 L 937.6 203.6 L 936.4 203.6 L 936.4 199.7 L 932.1 199.7 L 932.1 203.6 L 930.9 203.6 L 930.9 195.1 \" fill=\"#000000\"/>\n",
"<path class=\"atom-0\" d=\"M 939.2 203.3 Q 939.4 202.8, 939.9 202.5 Q 940.4 202.2, 941.1 202.2 Q 942.0 202.2, 942.4 202.6 Q 942.9 203.1, 942.9 203.9 Q 942.9 204.7, 942.3 205.5 Q 941.7 206.3, 940.4 207.2 L 943.0 207.2 L 943.0 207.8 L 939.2 207.8 L 939.2 207.3 Q 940.3 206.6, 940.9 206.0 Q 941.5 205.5, 941.8 205.0 Q 942.1 204.5, 942.1 203.9 Q 942.1 203.4, 941.8 203.1 Q 941.6 202.8, 941.1 202.8 Q 940.7 202.8, 940.4 203.0 Q 940.0 203.2, 939.8 203.6 L 939.2 203.3 \" fill=\"#000000\"/>\n",
"<path class=\"atom-2\" d=\"M 786.9 379.6 L 789.7 384.1 Q 790.0 384.6, 790.4 385.4 Q 790.8 386.2, 790.9 386.2 L 790.9 379.6 L 792.0 379.6 L 792.0 388.1 L 790.8 388.1 L 787.8 383.2 Q 787.5 382.6, 787.1 382.0 Q 786.8 381.3, 786.7 381.1 L 786.7 388.1 L 785.6 388.1 L 785.6 379.6 L 786.9 379.6 \" fill=\"#0000FF\"/>\n",
"<path class=\"atom-5\" d=\"M 835.6 716.6 L 838.4 721.1 Q 838.6 721.5, 839.1 722.3 Q 839.5 723.1, 839.5 723.2 L 839.5 716.6 L 840.7 716.6 L 840.7 725.1 L 839.5 725.1 L 836.5 720.2 Q 836.2 719.6, 835.8 718.9 Q 835.4 718.3, 835.3 718.1 L 835.3 725.1 L 834.2 725.1 L 834.2 716.6 L 835.6 716.6 \" fill=\"#0000FF\"/>\n",
"<path class=\"atom-7\" d=\"M 663.2 588.3 L 666.0 592.8 Q 666.3 593.3, 666.7 594.1 Q 667.2 594.9, 667.2 594.9 L 667.2 588.3 L 668.3 588.3 L 668.3 596.8 L 667.1 596.8 L 664.2 591.9 Q 663.8 591.3, 663.4 590.7 Q 663.1 590.0, 663.0 589.8 L 663.0 596.8 L 661.9 596.8 L 661.9 588.3 L 663.2 588.3 \" fill=\"#0000FF\"/>\n",
"<path class=\"atom-9\" d=\"M 427.1 626.9 Q 427.1 624.9, 428.1 623.8 Q 429.1 622.6, 431.0 622.6 Q 432.8 622.6, 433.9 623.8 Q 434.9 624.9, 434.9 626.9 Q 434.9 629.0, 433.8 630.2 Q 432.8 631.3, 431.0 631.3 Q 429.1 631.3, 428.1 630.2 Q 427.1 629.0, 427.1 626.9 M 431.0 630.4 Q 432.3 630.4, 433.0 629.5 Q 433.7 628.6, 433.7 626.9 Q 433.7 625.3, 433.0 624.4 Q 432.3 623.6, 431.0 623.6 Q 429.7 623.6, 429.0 624.4 Q 428.3 625.3, 428.3 626.9 Q 428.3 628.7, 429.0 629.5 Q 429.7 630.4, 431.0 630.4 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-12\" d=\"M 87.7 493.1 L 88.9 493.1 L 88.9 496.7 L 93.2 496.7 L 93.2 493.1 L 94.4 493.1 L 94.4 501.6 L 93.2 501.6 L 93.2 497.6 L 88.9 497.6 L 88.9 501.6 L 87.7 501.6 L 87.7 493.1 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-12\" d=\"M 96.1 497.3 Q 96.1 495.3, 97.1 494.1 Q 98.1 493.0, 100.0 493.0 Q 101.9 493.0, 102.9 494.1 Q 103.9 495.3, 103.9 497.3 Q 103.9 499.4, 102.9 500.5 Q 101.9 501.7, 100.0 501.7 Q 98.2 501.7, 97.1 500.5 Q 96.1 499.4, 96.1 497.3 M 100.0 500.7 Q 101.3 500.7, 102.0 499.9 Q 102.7 499.0, 102.7 497.3 Q 102.7 495.6, 102.0 494.8 Q 101.3 493.9, 100.0 493.9 Q 98.7 493.9, 98.0 494.8 Q 97.3 495.6, 97.3 497.3 Q 97.3 499.0, 98.0 499.9 Q 98.7 500.7, 100.0 500.7 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-14\" d=\"M 277.8 309.3 L 278.9 309.3 L 278.9 312.9 L 283.3 312.9 L 283.3 309.3 L 284.4 309.3 L 284.4 317.8 L 283.3 317.8 L 283.3 313.9 L 278.9 313.9 L 278.9 317.8 L 277.8 317.8 L 277.8 309.3 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-14\" d=\"M 286.2 313.6 Q 286.2 311.5, 287.2 310.4 Q 288.2 309.2, 290.1 309.2 Q 292.0 309.2, 293.0 310.4 Q 294.0 311.5, 294.0 313.6 Q 294.0 315.6, 293.0 316.8 Q 291.9 318.0, 290.1 318.0 Q 288.2 318.0, 287.2 316.8 Q 286.2 315.6, 286.2 313.6 M 290.1 317.0 Q 291.4 317.0, 292.1 316.1 Q 292.8 315.3, 292.8 313.6 Q 292.8 311.9, 292.1 311.0 Q 291.4 310.2, 290.1 310.2 Q 288.8 310.2, 288.1 311.0 Q 287.4 311.9, 287.4 313.6 Q 287.4 315.3, 288.1 316.1 Q 288.8 317.0, 290.1 317.0 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-16\" d=\"M 575.1 316.9 Q 575.1 314.8, 576.1 313.7 Q 577.1 312.5, 579.0 312.5 Q 580.8 312.5, 581.8 313.7 Q 582.9 314.8, 582.9 316.9 Q 582.9 318.9, 581.8 320.1 Q 580.8 321.3, 579.0 321.3 Q 577.1 321.3, 576.1 320.1 Q 575.1 318.9, 575.1 316.9 M 579.0 320.3 Q 580.2 320.3, 580.9 319.4 Q 581.7 318.6, 581.7 316.9 Q 581.7 315.2, 580.9 314.3 Q 580.2 313.5, 579.0 313.5 Q 577.7 313.5, 576.9 314.3 Q 576.3 315.2, 576.3 316.9 Q 576.3 318.6, 576.9 319.4 Q 577.7 320.3, 579.0 320.3 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-16\" d=\"M 584.2 312.6 L 585.3 312.6 L 585.3 316.2 L 589.7 316.2 L 589.7 312.6 L 590.8 312.6 L 590.8 321.1 L 589.7 321.1 L 589.7 317.2 L 585.3 317.2 L 585.3 321.1 L 584.2 321.1 L 584.2 312.6 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-18\" d=\"M 1104.5 621.7 Q 1104.5 619.7, 1105.5 618.5 Q 1106.5 617.4, 1108.4 617.4 Q 1110.2 617.4, 1111.3 618.5 Q 1112.3 619.7, 1112.3 621.7 Q 1112.3 623.8, 1111.2 624.9 Q 1110.2 626.1, 1108.4 626.1 Q 1106.5 626.1, 1105.5 624.9 Q 1104.5 623.8, 1104.5 621.7 M 1108.4 625.1 Q 1109.7 625.1, 1110.4 624.3 Q 1111.1 623.4, 1111.1 621.7 Q 1111.1 620.0, 1110.4 619.2 Q 1109.7 618.3, 1108.4 618.3 Q 1107.1 618.3, 1106.4 619.2 Q 1105.7 620.0, 1105.7 621.7 Q 1105.7 623.4, 1106.4 624.3 Q 1107.1 625.1, 1108.4 625.1 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-19\" d=\"M 1015.3 406.3 L 1018.1 410.8 Q 1018.4 411.2, 1018.8 412.0 Q 1019.3 412.8, 1019.3 412.9 L 1019.3 406.3 L 1020.4 406.3 L 1020.4 414.8 L 1019.3 414.8 L 1016.3 409.8 Q 1015.9 409.3, 1015.6 408.6 Q 1015.2 407.9, 1015.1 407.7 L 1015.1 414.8 L 1014.0 414.8 L 1014.0 406.3 L 1015.3 406.3 \" fill=\"#0000FF\"/>\n",
"<path class=\"atom-19\" d=\"M 1022.1 406.3 L 1023.2 406.3 L 1023.2 409.9 L 1027.6 409.9 L 1027.6 406.3 L 1028.7 406.3 L 1028.7 414.8 L 1027.6 414.8 L 1027.6 410.8 L 1023.2 410.8 L 1023.2 414.8 L 1022.1 414.8 L 1022.1 406.3 \" fill=\"#0000FF\"/>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def save_aniline_screening_results(df, output_dir, visualization_dir, max_visualizations=500):\n",
"    \"\"\"Save the aromatic-amine screening results as a CSV file plus highlighted SVG images.\"\"\"\n",
"\n",
"    # Save the CSV file\n",
"    csv_path = output_dir / \"aniline_candidates.csv\"\n",
"\n",
"    # Convert the ROMol column to SMILES, since ROMol objects cannot be written to CSV\n",
"    df_export = df.copy()\n",
"    if 'ROMol' in df_export.columns:\n",
"        df_export['SMILES_from_mol'] = df_export['ROMol'].apply(lambda x: Chem.MolToSmiles(x) if x is not None else '')\n",
"        df_export = df_export.drop('ROMol', axis=1)\n",
"\n",
"    df_export.to_csv(csv_path, index=False, encoding='utf-8')\n",
"    print(f\"CSV results saved to: {csv_path}\")\n",
"    print(f\"Contains {len(df_export)} molecules and {len(df_export.columns)} property columns\")\n",
"\n",
"    # Generate the visualization images\n",
"    print(f\"\\nGenerating visualization images (at most {max_visualizations})...\")\n",
"    generated_count = 0\n",
"\n",
"    for idx, row in df.iterrows():\n",
"        if generated_count >= max_visualizations:\n",
"            print(f\"Reached the visualization limit ({max_visualizations}); stopping\")\n",
"            break\n",
"\n",
"        cas = str(row.get('CAS', 'unknown')).strip()\n",
"        name = str(row.get('Name', 'unknown')).strip()\n",
"\n",
"        # Sanitize the file name parts (drop special characters)\n",
"        safe_name = \"\".join(c for c in name if c.isalnum() or c in (' ', '-', '_')).rstrip()\n",
"        safe_cas = \"\".join(c for c in cas if c.isalnum() or c in ('-',)).rstrip()\n",
"\n",
"        # Skip rows without a usable identifier\n",
"        if not safe_cas or safe_cas == 'nan' or safe_cas == 'unknown':\n",
"            continue\n",
"\n",
"        mol = row.get('ROMol')\n",
"        if mol is None:\n",
"            continue\n",
"\n",
"        matched_atoms = row.get('matched_atoms', [])\n",
"        if not matched_atoms:\n",
"            continue\n",
"\n",
"        # Build the file name and the image title\n",
"        filename = visualization_dir / f\"{safe_cas}_{safe_name.replace(' ', '_')}.svg\"\n",
"        title = f\"{name} ({cas}) - aromatic amine motif\"\n",
"\n",
"        try:\n",
"            # Render the SVG and write it to disk (the returned SVG text is not needed here)\n",
"            generate_highlighted_svg(mol, matched_atoms, filename, title)\n",
"            generated_count += 1\n",
"\n",
"            # Report progress every 10 images\n",
"            if generated_count % 10 == 0:\n",
"                print(f\"Generated {generated_count} molecule images\")\n",
"\n",
"        except Exception as e:\n",
"            print(f\"Failed to generate {safe_cas}: {e}\")\n",
"            continue\n",
"\n",
"    print(f\"Done! Generated {generated_count} visualization images in total\")\n",
"    return csv_path, generated_count\n",
"\n",
"# Save the results\n",
"if len(matched_df) > 0:\n",
"    csv_path, viz_count = save_aniline_screening_results(\n",
"        matched_df, output_dir, visualization_dir, max_visualizations=500\n",
"    )\n",
"\n",
"    # Display one generated image as an example (sorted so the choice is reproducible)\n",
"    if viz_count > 0:\n",
"        example_files = sorted(visualization_dir.glob(\"*.svg\"))\n",
"        if example_files:\n",
"            example_file = example_files[0]\n",
"            print(f\"\\nExample image: {example_file.name}\")\n",
"            with open(example_file, \"r\", encoding=\"utf-8\") as f:\n",
"                svg_content = f.read()\n",
"            display(SVG(svg_content))\n",
"else:\n",
"    print(\"No matches found; nothing to save\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Result statistics and analysis\n",
"\n",
"### Screening statistics\n",
"- Total number of molecules\n",
"- Number of matched molecules\n",
"- Number of visualization files\n",
"- Location of the output files"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== Aromatic amine screening summary ===\n",
"Total molecules: 3276\n",
"Matched molecules: 262\n",
"Match rate: 8.00%\n",
"\n",
"Output directory: ../data/drug_targetmol/aniline_candidates\n",
"CSV file: ../data/drug_targetmol/aniline_candidates/aniline_candidates.csv\n",
"Visualization directory: ../data/drug_targetmol/aniline_candidates/visualizations\n",
"Number of SVG files: 262\n",
"\n",
"Molecules with the most matches:\n",
"                                       Name          CAS  total_matches\n",
"432                  Proflavine Hemisulfate    1811-28-5              4\n",
"1064                            Triamterene     396-01-0              3\n",
"335   Pemetrexed disodium hemipenta hydrate  357166-30-4              2\n",
"463                             Lamotrigine   84057-84-1              2\n",
"779                           Pyrimethamine      58-14-0              2\n"
]
}
],
"source": [
"# Summary statistics\n",
"print(\"=== Aromatic amine screening summary ===\")\n",
"print(f\"Total molecules: {len(df)}\")\n",
"print(f\"Matched molecules: {len(matched_df)}\")\n",
"print(f\"Match rate: {len(matched_df)/len(df)*100:.2f}%\")\n",
"print(f\"\\nOutput directory: {output_dir}\")\n",
"print(f\"CSV file: {output_dir}/aniline_candidates.csv\")\n",
"print(f\"Visualization directory: {visualization_dir}\")\n",
"print(f\"Number of SVG files: {len(list(visualization_dir.glob('*.svg')))}\")\n",
"\n",
"# Show the molecules with the most matches\n",
"if len(matched_df) > 0:\n",
"    print(\"\\nMolecules with the most matches:\")\n",
"    top_matches = matched_df.nlargest(5, 'total_matches')[['Name', 'CAS', 'total_matches']]\n",
"    print(top_matches)"
]
},
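{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage notes\n",
"\n",
"### Interpreting the screening results\n",
"- **Matched molecules**: drugs containing an aromatic amine motif (Ar-NH₂)\n",
"- **Blue highlight**: the atoms matched by the SMARTS pattern (aromatic carbon/nitrogen + amino group)\n",
"- **Multiple matches**: a molecule may contain more than one aromatic amine group\n",
"\n",
"### Suggested follow-up analysis\n",
"1. **Verify the synthetic route**: consult the synthesis literature for each matched molecule\n",
"2. **Confirm Sandmeyer usage**: check whether a Sandmeyer reaction is actually used to introduce the halogen\n",
"3. **Assess the Zhang Xiaheng reaction (张夏恒反应)**: evaluate whether it can feasibly replace the Sandmeyer step\n",
"4. **Process optimization potential**: analyze the economic benefit of switching to the Zhang Xiaheng reaction\n",
"\n",
"### Output files\n",
"- **CSV file**: full molecular properties and match information\n",
"- **SVG files**: structure depictions with the aromatic amine motif highlighted in blue\n",
"- **Naming convention**: {CAS}_{Name}.svg (special characters removed)"
]
},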
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Antibiotic screening results\n",
"\n",
"/home/zly/project/macro_split/data/drug_targetmol/aniline_candidates/antibiotics_identified.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.14.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}