macro_split/notebooks/screen_aniline_candidates_executed.ipynb
2025-11-14 20:34:58 +08:00

{
"cells": [
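{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Screening Aromatic Amine Drug Candidates - Analysis of Sandmeyer Reaction Starting Materials\n",
"\n",
"## Background\n",
"\n",
"### Sandmeyer reaction recap\n",
"The Sandmeyer reaction is a classic method for converting aromatic amines:\n",
"**Ar-NH₂ → [Ar-N₂]⁺ → Ar-X**\n",
"where X = Cl, Br, I, CN, OH, SCN, etc.\n",
"\n",
"### Screening goal\n",
"By identifying drug molecules that contain an aromatic amine group (Ar-NH₂),\n",
"we look for candidate drugs that could serve as Sandmeyer reaction starting materials.\n",
"These molecules may originally have had their aromatic halogen introduced via the Sandmeyer reaction;\n",
"they could now be converted more efficiently with the Zhang Xiaheng reaction.\n",
"\n",
"### SMARTS pattern\n",
"The SMARTS pattern `[c,n][NH2]` matches:\n",
"- `[c,n]`: an aromatic carbon or aromatic nitrogen atom\n",
"- `[NH2]`: an amino group (-NH₂)\n",
"\n",
"**Important notes:**\n",
"- This screen is based on molecular structural features only\n",
"- The synthetic route must ultimately be confirmed against the literature\n",
"- Not every drug that contains an aromatic amine is made via the Sandmeyer reaction"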
]
},
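{
"cell_type": "markdown",
"metadata": {},
"source": [
"The SMARTS pattern described above can be sanity-checked on a few hand-picked molecules. The following is a minimal sketch, not part of the original screening run: the four test SMILES and the `EXAMPLES` and `results` names are illustrative assumptions. It relies only on `Chem.MolFromSmarts`, `Chem.MolFromSmiles`, and `HasSubstructMatch`.\n",
"\n",
"```python\n",
"from rdkit import Chem\n",
"\n",
"# Same pattern as in this notebook: aromatic C or N bearing an -NH2 group\n",
"pattern = Chem.MolFromSmarts(\"[c,n][NH2]\")\n",
"\n",
"# Hypothetical test molecules (not taken from the screened library)\n",
"EXAMPLES = {\n",
"    \"aniline\": \"Nc1ccccc1\",           # primary amine on an aromatic carbon\n",
"    \"2-aminopyridine\": \"Nc1ccccn1\",   # primary amine on a heteroaromatic ring carbon\n",
"    \"N-methylaniline\": \"CNc1ccccc1\",  # secondary amine, only one H on N\n",
"    \"benzylamine\": \"NCc1ccccc1\",      # primary amine on an aliphatic carbon\n",
"}\n",
"\n",
"results = {\n",
"    name: Chem.MolFromSmiles(smiles).HasSubstructMatch(pattern)\n",
"    for name, smiles in EXAMPLES.items()\n",
"}\n",
"for name, matched in results.items():\n",
"    print(f\"{name}: {'match' if matched else 'no match'}\")\n",
"```\n",
"\n",
"Under these assumptions only aniline and 2-aminopyridine match: the amino nitrogen must carry two hydrogens and sit directly on an aromatic atom."
]
},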
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import required libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2025-11-11T13:21:31.660096Z",
"iopub.status.busy": "2025-11-11T13:21:31.657369Z",
"iopub.status.idle": "2025-11-11T13:21:32.943162Z",
"shell.execute_reply": "2025-11-11T13:21:32.938881Z"
}
},
"outputs": [],
"source": [
"import os\n",
"from pathlib import Path\n",
"from rdkit import Chem\n",
"from rdkit.Chem import PandasTools, Draw\n",
"from rdkit.Chem.Draw import rdMolDraw2D\n",
"from IPython.display import SVG, display\n",
"from rdkit.Chem import AllChem\n",
"import pandas as pd\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"# Set display options\n",
"pd.set_option('display.max_columns', None)\n",
"pd.set_option('display.max_colwidth', 100)"
]
},
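{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define the screening pattern and visualization function\n",
"\n",
"### SMARTS pattern setup\n",
"- **Target pattern**: `[c,n][NH2]` - an amino group attached to an aromatic carbon or nitrogen atom\n",
"- **Matching logic**: find every molecule that contains this substructure"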
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"execution": {
"iopub.execute_input": "2025-11-11T13:21:32.959832Z",
"iopub.status.busy": "2025-11-11T13:21:32.957734Z",
"iopub.status.idle": "2025-11-11T13:21:32.987085Z",
"shell.execute_reply": "2025-11-11T13:21:32.980584Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using SMARTS pattern: [c,n][NH2]\n",
"Pattern validation: ✓\n",
"\n",
"Created directory: ../data/drug_targetmol/aniline_candidates\n",
"Created visualization directory: ../data/drug_targetmol/aniline_candidates/visualizations\n"
]
}
],
"source": [
"# Define the screening pattern\n",
"TARGET_SMARTS = '[c,n][NH2]'\n",
"pattern = Chem.MolFromSmarts(TARGET_SMARTS)\n",
"\n",
"if pattern is None:\n",
"    raise ValueError(f\"Invalid SMARTS pattern: {TARGET_SMARTS}\")\n",
"\n",
"print(f\"Using SMARTS pattern: {TARGET_SMARTS}\")\n",
"print(f\"Pattern validation: {'✓' if pattern else '✗'}\")\n",
"\n",
"# Create the output directories (including any missing parents)\n",
"output_base = Path(\"../data/drug_targetmol\")\n",
"output_dir = output_base / \"aniline_candidates\"\n",
"visualization_dir = output_dir / \"visualizations\"\n",
"\n",
"output_dir.mkdir(parents=True, exist_ok=True)\n",
"visualization_dir.mkdir(parents=True, exist_ok=True)\n",
"\n",
"print(f\"\\nCreated directory: {output_dir}\")\n",
"print(f\"Created visualization directory: {visualization_dir}\")\n",
"\n",
"def generate_highlighted_svg(mol, highlight_atoms, filename, title=\"\"):\n",
"    \"\"\"Generate a high-resolution SVG with the matched substructure highlighted.\"\"\"\n",
"    # Compute 2D coordinates\n",
"    AllChem.Compute2DCoords(mol)\n",
"\n",
"    # Create the SVG drawer\n",
"    drawer = rdMolDraw2D.MolDraw2DSVG(1200, 900)  # larger canvas for sharper output\n",
"    drawer.SetFontSize(12)\n",
"\n",
"    # Drawing options\n",
"    draw_options = drawer.drawOptions()\n",
"    draw_options.addAtomIndices = False  # hide atom indices to keep the drawing clean\n",
"    draw_options.addBondIndices = False\n",
"    draw_options.addStereoAnnotation = True\n",
"    draw_options.fixedFontSize = 12\n",
"\n",
"    # Highlight the matched atoms in blue\n",
"    atom_colors = {}\n",
"    for atom_idx in highlight_atoms:\n",
"        atom_colors[atom_idx] = (0.3, 0.3, 1.0)  # blue highlight\n",
"\n",
"    # Draw the molecule\n",
"    drawer.DrawMolecule(mol,\n",
"                        highlightAtoms=highlight_atoms,\n",
"                        highlightAtomColors=atom_colors)\n",
"\n",
"    drawer.FinishDrawing()\n",
"    svg_content = drawer.GetDrawingText()\n",
"\n",
"    # Add the title\n",
"    if title:\n",
"        # Split the SVG text into lines\n",
"        svg_lines = svg_content.split(\"\\n\")\n",
"        # Insert the title before the first <g> tag that carries a transform;\n",
"        # if no such line exists, the SVG is written unchanged\n",
"        for i, line in enumerate(svg_lines):\n",
"            if \"<g \" in line and \"transform\" in line:\n",
"                svg_lines.insert(i, f\"<text x=\\\"50%\\\" y=\\\"30\\\" text-anchor=\\\"middle\\\" font-size=\\\"16\\\" font-weight=\\\"bold\\\">{title}</text>\")\n",
"                break\n",
"        svg_with_title = \"\\n\".join(svg_lines)\n",
"    else:\n",
"        svg_with_title = svg_content\n",
"\n",
"    # Save the file\n",
"    with open(filename, \"w\", encoding=\"utf-8\") as f:\n",
"        f.write(svg_with_title)\n",
"\n",
"    return svg_content"
]
},
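{
"cell_type": "markdown",
"metadata": {},
"source": [
"The title step in `generate_highlighted_svg` only takes effect when the SVG contains a line with a `<g ... transform ...>` tag. As a hedged alternative, the sketch below places the title just before the closing `</svg>` tag and XML-escapes the text. The helper name `add_svg_title` is hypothetical and the code uses only the Python standard library; it is not the notebook's own method.\n",
"\n",
"```python\n",
"from xml.sax.saxutils import escape\n",
"\n",
"\n",
"def add_svg_title(svg_text, title):\n",
"    \"\"\"Return svg_text with a centered <text> title inserted before </svg>.\"\"\"\n",
"    if not title:\n",
"        return svg_text\n",
"    title_tag = (\n",
"        '<text x=\"50%\" y=\"30\" text-anchor=\"middle\" '\n",
"        f'font-size=\"16\" font-weight=\"bold\">{escape(title)}</text>\\n'\n",
"    )\n",
"    head, sep, tail = svg_text.rpartition(\"</svg>\")\n",
"    if not sep:  # no closing tag found: leave the input untouched\n",
"        return svg_text\n",
"    return head + title_tag + sep + tail\n",
"\n",
"\n",
"demo = add_svg_title(\"<svg>\\n<rect/>\\n</svg>\", \"Guanosine (118-00-3) & co\")\n",
"print(demo)\n",
"```\n",
"\n",
"Escaping matters because a name that contains `&` or `<` would otherwise produce an SVG that is not well-formed XML."
]
},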
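{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data loading and molecule screening\n",
"\n",
"### Data source\n",
"- File location: `data/drug_targetmol/0c04ffc9fe8c2ec916412fbdc2a49bf4.sdf`\n",
"- Contains drug molecule structures and rich property annotations\n",
"\n",
"### Screening logic\n",
"1. Read the SDF file\n",
"2. Run a SMARTS match against each molecule\n",
"3. Record the matched atoms and the number of matches\n",
"4. Save the matches to CSV\n",
"5. Generate highlighted visualization images"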
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"execution": {
"iopub.execute_input": "2025-11-11T13:21:33.114695Z",
"iopub.status.busy": "2025-11-11T13:21:33.113063Z",
"iopub.status.idle": "2025-11-11T13:21:35.754026Z",
"shell.execute_reply": "2025-11-11T13:21:35.745369Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reading SDF file...\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[21:21:34] Both bonds on one end of an atropisomer are on the same side - atoms is : 3\n",
"[21:21:34] Explicit valence for atom # 2 N greater than permitted\n",
"[21:21:34] ERROR: Could not sanitize molecule ending on line 217340\n",
"[21:21:34] ERROR: Explicit valence for atom # 2 N greater than permitted\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[21:21:35] Explicit valence for atom # 4 N greater than permitted\n",
"[21:21:35] ERROR: Could not sanitize molecule ending on line 317283\n",
"[21:21:35] ERROR: Explicit valence for atom # 4 N greater than permitted\n",
"[21:21:35] Explicit valence for atom # 4 N greater than permitted\n",
"[21:21:35] ERROR: Could not sanitize molecule ending on line 324666\n",
"[21:21:35] ERROR: Explicit valence for atom # 4 N greater than permitted\n",
"[21:21:35] Explicit valence for atom # 5 N greater than permitted\n",
"[21:21:35] ERROR: Could not sanitize molecule ending on line 365883\n",
"[21:21:35] ERROR: Explicit valence for atom # 5 N greater than permitted\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Successfully loaded 3276 molecules\n",
"\n",
"Data overview:\n",
" Index Plate Row Col ID Name \\\n",
"0 1 L1010-1 a 2 Dexamethasone \n",
"1 2 L1010-1 a 3 Danicopan \n",
"2 3 L1010-1 a 4 Cyclosporin A \n",
"3 4 L1010-1 a 5 L-Carnitine \n",
"4 5 L1010-1 a 6 Trimetazidine dihydrochloride \n",
"\n",
" Synonyms CAS \\\n",
"0 MK 125;Prednisolone F;NSC 34521;Hexadecadrol 50-02-2 \n",
"1 ACH-4471 1903768-17-1 \n",
"2 Cyclosporine A;Ciclosporin;Cyclosporine 59865-13-3 \n",
"3 L(-)-Carnitine;Levocarnitine 541-15-1 \n",
"4 Yoshimilon;Kyurinett;Vastarel F 13171-25-0 \n",
"\n",
" SMILES \\\n",
"0 C[C@@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C@@]3(F)[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CO \n",
"1 CC(=O)c1nn(CC(=O)N2C[C@H](F)C[C@H]2C(=O)Nc2cccc(Br)n2)c2ccc(cc12)-c1cnc(C)nc1 \n",
"2 [C@H]([C@@H](C/C=C/C)C)(O)[C@@]1(N(C)C(=O)[C@H]([C@@H](C)C)N(C)C(=O)[C@H](CC(C)C)N(C)C(=O)[C@H](... \n",
"3 C[N+](C)(C)C[C@@H](O)CC([O-])=O \n",
"4 Cl.Cl.COC1=C(OC)C(OC)=C(CN2CCNCC2)C=C1 \n",
"\n",
" Formula MolWt Approved status \\\n",
"0 C22H29FO5 392.46 NMPA;EMA;FDA \n",
"1 C26H23BrFN7O3 580.41 FDA \n",
"2 C62H111N11O12 1202.61 FDA \n",
"3 C7H15NO3 161.2 FDA \n",
"4 C14H24Cl2N2O3 339.258 NMPA;EMA \n",
"\n",
" Pharmacopoeia \\\n",
"0 USP39-NF34;BP2015;JP16;IP2010 \n",
"1 NaN \n",
"2 Martindale the Extra Pharmacopoei, EP10.2, USP43-NF38, Ph.Int_6th, JP17 \n",
"3 NaN \n",
"4 BP2019;KP ;EP9.2;IP2010;JP17;Martindale:The Extra Pharmacopoeia \n",
"\n",
" Disease \\\n",
"0 Metabolism \n",
"1 Others \n",
"2 Immune system \n",
"3 Cardiovascular system \n",
"4 Cardiovascular system \n",
"\n",
" Pathways \\\n",
"0 Antibody-drug Conjugate/ADC Related;Autophagy;Endocrinology/Hormones;Immunology/Inflammation;Mic... \n",
"1 Immunology/Inflammation \n",
"2 Immunology/Inflammation;Metabolism;Microbiology/Virology \n",
"3 Metabolism \n",
"4 Autophagy;Metabolism \n",
"\n",
" Target \\\n",
"0 Antibacterial;Antibiotic;Autophagy;Complement System;Glucocorticoid Receptor;IL Receptor;Mitopha... \n",
"1 Complement System \n",
"2 Phosphatase;Antibiotic;Complement System \n",
"3 Endogenous Metabolite;Fatty Acid Synthase \n",
"4 Autophagy;Fatty Acid Synthase \n",
"\n",
" Receptor \\\n",
"0 Antibiotic; Autophagy; Bacterial; Complement System; Glucocorticoid Receptor; IL receptor; Mitop... \n",
"1 Complement System; factor D \n",
"2 Antibiotic; calcineurin phosphatase; Complement System; Phosphatase \n",
"3 Endogenous Metabolite; FAS \n",
"4 Autophagy; mitochondrial long-chain 3-ketoacyl thiolase \n",
"\n",
" Bioactivity \\\n",
"0 Dexamethasone is a glucocorticoid receptor agonist and IL receptor modulator with anti-inflammat... \n",
"1 Danicopan (ACH-4471) (ACH-4471) is a selective, orally active small molecule factor D inhibitor ... \n",
"2 Cyclosporin A is a natural product and an active fungal metabolite, classified as a cyclic polyp... \n",
"3 L-Carnitine (L(-)-Carnitine) is an amino acid derivative. L-Carnitine facilitates long-chain fat... \n",
"4 Trimetazidine dihydrochloride (Vastarel F) can improve myocardial glucose utilization by inhibit... \n",
"\n",
" Reference \\\n",
"0 Li M, Yu H. Identification of WP1066, an inhibitor of JAK2 and STAT3, as a Kv1. 3 potassium chan... \n",
"1 Yuan X, et al. Small-molecule factor D inhibitors selectively block the alternative pathway of c... \n",
"2 D'Angelo G, et al. Cyclosporin A prevents the hypoxic adaptation by activating hypoxia-inducible... \n",
"3 Jogl G, Tong L. Cell. 2003 Jan 10; 112(1):113-22. \n",
"4 Yang Q, et al. Int J Clin Exp Pathol. 2015, 8(4):3735-3741.;Liu Z, et al. Metabolism. 2016, 65(3... \n",
"\n",
" ROMol \n",
"0 <rdkit.Chem.rdchem.Mol object at 0x774684c557e0> \n",
"1 <rdkit.Chem.rdchem.Mol object at 0x7746818ffdf0> \n",
"2 <rdkit.Chem.rdchem.Mol object at 0x7746818ffd80> \n",
"3 <rdkit.Chem.rdchem.Mol object at 0x7746816e0040> \n",
"4 <rdkit.Chem.rdchem.Mol object at 0x7746816e00b0> \n",
"\n",
"Column names: ['Index', 'Plate', 'Row', 'Col', 'ID', 'Name', 'Synonyms', 'CAS', 'SMILES', 'Formula', 'MolWt', 'Approved status', 'Pharmacopoeia', 'Disease', 'Pathways', 'Target', 'Receptor', 'Bioactivity', 'Reference', 'ROMol']\n"
]
}
],
"source": [
"# Read the SDF file\n",
"sdf_path = '../data/drug_targetmol/0c04ffc9fe8c2ec916412fbdc2a49bf4.sdf'\n",
"\n",
"print(\"Reading SDF file...\")\n",
"df = PandasTools.LoadSDF(sdf_path)\n",
"print(f\"Successfully loaded {len(df)} molecules\")\n",
"\n",
"# Show basic information about the data\n",
"print(\"\\nData overview:\")\n",
"print(df.head())\n",
"print(f\"\\nColumn names: {list(df.columns)}\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"execution": {
"iopub.execute_input": "2025-11-11T13:21:35.770585Z",
"iopub.status.busy": "2025-11-11T13:21:35.768752Z",
"iopub.status.idle": "2025-11-11T13:21:36.114723Z",
"shell.execute_reply": "2025-11-11T13:21:36.111467Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Screening for aromatic amine structures...\n",
"SMARTS pattern: [c,n][N&H2]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found 78 matching molecules (processed 1000 molecules)\n",
"\n",
"Screening results summary:\n",
" Name CAS Formula total_matches\n",
"17 Guanosine 118-00-3 C10H13N5O5 1\n",
"20 Ganciclovir 82410-32-0 C9H13N5O4 1\n",
"22 Imiquimod maleate 896106-16-4 C18H20N4O4 1\n",
"27 Brincidofovir 444805-28-1 C27H52N3O7P 1\n",
"28 Imiquimod 99011-02-6 C14H16N4 1\n",
"32 Ganciclovir sodium 107910-75-8 C9H13N5NaO4 1\n",
"33 Cytarabine 147-94-4 C9H13N3O5 1\n",
"35 Vidarabine 5536-17-4 C10H13N5O4 1\n",
"38 Penciclovir 39809-25-1 C10H15N5O3 1\n",
"41 Famciclovir 104227-87-4 C14H19N5O4 1\n",
"... and 68 more molecules\n"
]
}
],
"source": [
"def screen_molecules_for_aniline(df, smarts_pattern, max_molecules=100):\n",
"    \"\"\"\n",
"    Screen for molecules that contain an aromatic amine structure.\n",
"\n",
"    Args:\n",
"        df: DataFrame containing the molecules\n",
"        smarts_pattern: RDKit SMARTS pattern object\n",
"        max_molecules: maximum number of molecules to process\n",
"\n",
"    Returns:\n",
"        DataFrame of screening results\n",
"    \"\"\"\n",
"    print(\"Screening for aromatic amine structures...\")\n",
"    print(f\"SMARTS pattern: {Chem.MolToSmarts(smarts_pattern)}\")\n",
"\n",
"    matched_molecules = []\n",
"    processed_count = 0\n",
"\n",
"    for idx, row in df.iterrows():\n",
"        if processed_count >= max_molecules:\n",
"            break\n",
"\n",
"        mol = row['ROMol']\n",
"        if mol is None:\n",
"            continue\n",
"\n",
"        processed_count += 1\n",
"\n",
"        # Check whether the molecule matches the SMARTS pattern\n",
"        if mol.HasSubstructMatch(smarts_pattern):\n",
"            matches = mol.GetSubstructMatches(smarts_pattern)\n",
"\n",
"            # Collect every matched atom index\n",
"            matched_atoms = set()\n",
"            for match in matches:\n",
"                matched_atoms.update(match)\n",
"\n",
"            # Build the match record\n",
"            match_record = row.copy()\n",
"            match_record['matched_atoms'] = list(matched_atoms)\n",
"            match_record['total_matches'] = len(matches)\n",
"            match_record['smarts_pattern'] = Chem.MolToSmarts(smarts_pattern)\n",
"            matched_molecules.append(match_record)\n",
"\n",
"    result_df = pd.DataFrame(matched_molecules)\n",
"    print(f\"Found {len(result_df)} matching molecules (processed {processed_count} molecules)\")\n",
"\n",
"    return result_df\n",
"\n",
"# Run the screen\n",
"matched_df = screen_molecules_for_aniline(df, pattern, max_molecules=1000)\n",
"\n",
"# Show a summary of the results\n",
"if len(matched_df) > 0:\n",
"    print(\"\\nScreening results summary:\")\n",
"    summary_cols = ['Name', 'CAS', 'Formula', 'total_matches']\n",
"    if len(matched_df) <= 10:\n",
"        print(matched_df[summary_cols])\n",
"    else:\n",
"        print(matched_df[summary_cols].head(10))\n",
"        print(f\"... and {len(matched_df) - 10} more molecules\")\n",
"else:\n",
"    print(\"\\nNo matching molecules found\")"
]
},
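{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save the screening results\n",
"\n",
"### Output files\n",
"1. **CSV file**: property information and match details for every matching molecule\n",
"2. **SVG images**: a structure drawing for each matching molecule, with the aromatic amine highlighted"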
]
},
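{
"cell_type": "markdown",
"metadata": {},
"source": [
"The SVG images are named after the CAS number and the drug name, for example `118-00-3_Guanosine.svg`. The sketch below reproduces that naming rule with the standard library only. The helper name `svg_filename` is hypothetical; the character filter follows the one used in the saving code of this notebook.\n",
"\n",
"```python\n",
"def svg_filename(cas, name):\n",
"    \"\"\"Build '<CAS>_<Name>.svg', keeping only filesystem-friendly characters.\"\"\"\n",
"    # Keep letters, digits, spaces, hyphens and underscores in the name\n",
"    safe_name = \"\".join(c for c in name.strip() if c.isalnum() or c in \" -_\").rstrip()\n",
"    # Keep letters, digits and hyphens in the CAS number\n",
"    safe_cas = \"\".join(c for c in cas.strip() if c.isalnum() or c == \"-\")\n",
"    if not safe_cas or safe_cas in (\"nan\", \"unknown\"):\n",
"        return None  # rows without a usable CAS number are skipped\n",
"    return f\"{safe_cas}_{safe_name.replace(' ', '_')}.svg\"\n",
"\n",
"\n",
"print(svg_filename(\"118-00-3\", \"Guanosine\"))\n",
"print(svg_filename(\"896106-16-4\", \"Imiquimod maleate\"))\n",
"print(svg_filename(\"nan\", \"Unnamed\"))\n",
"```\n",
"\n",
"Returning `None` for a missing CAS number mirrors the way such rows are skipped during image generation."
]
},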
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"execution": {
"iopub.execute_input": "2025-11-11T13:21:36.120981Z",
"iopub.status.busy": "2025-11-11T13:21:36.120553Z",
"iopub.status.idle": "2025-11-11T13:21:36.279125Z",
"shell.execute_reply": "2025-11-11T13:21:36.277892Z"
}
},
"outputs": [
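{
"name": "stdout",
"output_type": "stream",
"text": [
"CSV results saved to: ../data/drug_targetmol/aniline_candidates/aniline_candidates.csv\n",
"Contains 78 molecules, 23 attribute columns\n",
"\n",
"Generating visualization images (at most 50)...\n",
"Generated 10 molecule images\n",
"Generated 20 molecule images\n",
"Generated 30 molecule images\n",
"Generated 40 molecule images\n",
"Generated 50 molecule images\n",
"Reached the maximum visualization limit (50); stopping\n",
"Done! Generated 50 visualization images\n",
"\n",
"Example image: 118-00-3_Guanosine.svg\n"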
]
},
{
"data": {
"image/svg+xml": [
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:rdkit=\"http://www.rdkit.org/xml\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" version=\"1.1\" baseProfile=\"full\" xml:space=\"preserve\" width=\"1200px\" height=\"900px\" viewBox=\"0 0 1200 900\">\n",
"<!-- END OF HEADER -->\n",
"<rect style=\"opacity:1.0;fill:#FFFFFF;stroke:none\" width=\"1200.0\" height=\"900.0\" x=\"0.0\" y=\"0.0\"> </rect>\n",
"<path class=\"bond-0 atom-0 atom-1\" d=\"M 912.0,197.7 L 940.1,201.0 L 924.8,332.9 L 896.6,329.6 Z\" style=\"fill:#4C4CFF;fill-rule:evenodd;fill-opacity:1;stroke:#4C4CFF;stroke-width:0.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:10;stroke-opacity:1;\"/>\n",
"<ellipse cx=\"932.9\" cy=\"201.5\" rx=\"26.6\" ry=\"26.6\" class=\"atom-0\" style=\"fill:#4C4CFF;fill-rule:evenodd;stroke:#4C4CFF;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<ellipse cx=\"910.7\" cy=\"331.2\" rx=\"26.6\" ry=\"26.6\" class=\"atom-1\" style=\"fill:#4C4CFF;fill-rule:evenodd;stroke:#4C4CFF;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-0 atom-0 atom-1\" d=\"M 925.1,208.0 L 910.7,331.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-1 atom-1 atom-2\" d=\"M 910.7,331.2 L 853.5,355.9\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-1 atom-1 atom-2\" d=\"M 853.5,355.9 L 796.4,380.6\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-1 atom-1 atom-2\" d=\"M 908.0,354.1 L 856.2,376.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-1 atom-1 atom-2\" d=\"M 856.2,376.5 L 804.3,398.9\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-2 atom-2 atom-3\" d=\"M 787.8,392.5 L 780.6,454.1\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-2 atom-2 atom-3\" d=\"M 780.6,454.1 L 773.4,515.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-3 atom-3 atom-4\" d=\"M 773.4,515.8 L 879.9,595.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-3 atom-3 atom-4\" d=\"M 794.5,506.6 L 882.6,572.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-4 atom-4 atom-5\" d=\"M 879.9,595.0 L 860.1,653.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-4 atom-4 atom-5\" d=\"M 860.1,653.6 L 840.4,712.2\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-5 atom-5 atom-6\" d=\"M 829.8,720.7 L 767.3,720.0\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-5 atom-5 atom-6\" d=\"M 767.3,720.0 L 704.7,719.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-5 atom-5 atom-6\" d=\"M 830.1,700.8 L 774.7,700.2\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-5 atom-5 atom-6\" d=\"M 774.7,700.2 L 719.4,699.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-6 atom-6 atom-7\" d=\"M 704.7,719.3 L 686.2,660.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-6 atom-6 atom-7\" d=\"M 686.2,660.3 L 667.8,601.2\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-7 atom-3 atom-7\" d=\"M 773.4,515.8 L 723.0,551.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-7 atom-3 atom-7\" d=\"M 723.0,551.5 L 672.7,587.2\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-8 atom-7 atom-8\" d=\"M 657.5,590.0 L 598.4,570.1\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-8 atom-7 atom-8\" d=\"M 598.4,570.1 L 539.3,550.1\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-9 atom-8 atom-9\" d=\"M 539.3,550.1 L 489.3,585.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-9 atom-8 atom-9\" d=\"M 489.3,585.6 L 439.2,621.0\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-10 atom-9 atom-10\" d=\"M 422.7,620.8 L 373.6,584.2\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-10 atom-9 atom-10\" d=\"M 373.6,584.2 L 324.4,547.7\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-11 atom-10 atom-11\" d=\"M 324.4,547.7 L 197.7,587.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-12 atom-11 atom-12\" d=\"M 197.7,587.2 L 153.0,546.1\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-12 atom-11 atom-12\" d=\"M 153.0,546.1 L 108.3,504.9\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-13 atom-10 atom-13\" d=\"M 324.4,547.7 L 366.9,421.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-14 atom-13 atom-14\" d=\"M 366.9,421.8 L 331.6,372.1\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-14 atom-13 atom-14\" d=\"M 331.6,372.1 L 296.3,322.3\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-15 atom-13 atom-15\" d=\"M 366.9,421.8 L 499.7,423.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-16 atom-8 atom-15\" d=\"M 539.3,550.1 L 499.7,423.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-17 atom-15 atom-16\" d=\"M 499.7,423.4 L 536.0,374.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-17 atom-15 atom-16\" d=\"M 536.0,374.5 L 572.4,325.6\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-18 atom-4 atom-17\" d=\"M 879.9,595.0 L 1001.8,542.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-19 atom-17 atom-18\" d=\"M 991.3,547.0 L 1042.7,585.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-19 atom-17 atom-18\" d=\"M 1042.7,585.2 L 1094.1,623.5\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-19 atom-17 atom-18\" d=\"M 1003.2,531.0 L 1054.6,569.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-19 atom-17 atom-18\" d=\"M 1054.6,569.3 L 1106.0,607.5\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-20 atom-17 atom-19\" d=\"M 1001.8,542.4 L 1009.0,480.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-20 atom-17 atom-19\" d=\"M 1009.0,480.8 L 1016.2,419.1\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-21 atom-1 atom-19\" d=\"M 910.7,331.2 L 960.2,368.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path class=\"bond-21 atom-1 atom-19\" d=\"M 960.2,368.0 L 1009.6,404.9\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
"<path d=\"M 707.8,719.4 L 704.7,719.3 L 703.7,716.4\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:10;stroke-opacity:1;\"/>\n",
"<path d=\"M 204.0,585.3 L 197.7,587.2 L 195.5,585.2\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:10;stroke-opacity:1;\"/>\n",
"<path d=\"M 995.7,545.0 L 1001.8,542.4 L 1002.2,539.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:10;stroke-opacity:1;\"/>\n",
"<path class=\"atom-0\" d=\"M 924.2 195.1 L 927.0 199.6 Q 927.3 200.0, 927.7 200.8 Q 928.1 201.7, 928.2 201.7 L 928.2 195.1 L 929.3 195.1 L 929.3 203.6 L 928.1 203.6 L 925.1 198.7 Q 924.8 198.1, 924.4 197.4 Q 924.1 196.8, 924.0 196.6 L 924.0 203.6 L 922.9 203.6 L 922.9 195.1 L 924.2 195.1 \" fill=\"#000000\"/>\n",
"<path class=\"atom-0\" d=\"M 930.9 195.1 L 932.1 195.1 L 932.1 198.7 L 936.4 198.7 L 936.4 195.1 L 937.6 195.1 L 937.6 203.6 L 936.4 203.6 L 936.4 199.7 L 932.1 199.7 L 932.1 203.6 L 930.9 203.6 L 930.9 195.1 \" fill=\"#000000\"/>\n",
"<path class=\"atom-0\" d=\"M 939.2 203.3 Q 939.4 202.8, 939.9 202.5 Q 940.4 202.2, 941.1 202.2 Q 942.0 202.2, 942.4 202.6 Q 942.9 203.1, 942.9 203.9 Q 942.9 204.7, 942.3 205.5 Q 941.7 206.3, 940.4 207.2 L 943.0 207.2 L 943.0 207.8 L 939.2 207.8 L 939.2 207.3 Q 940.3 206.6, 940.9 206.0 Q 941.5 205.5, 941.8 205.0 Q 942.1 204.5, 942.1 203.9 Q 942.1 203.4, 941.8 203.1 Q 941.6 202.8, 941.1 202.8 Q 940.7 202.8, 940.4 203.0 Q 940.0 203.2, 939.8 203.6 L 939.2 203.3 \" fill=\"#000000\"/>\n",
"<path class=\"atom-2\" d=\"M 786.9 379.6 L 789.7 384.1 Q 790.0 384.6, 790.4 385.4 Q 790.8 386.2, 790.9 386.2 L 790.9 379.6 L 792.0 379.6 L 792.0 388.1 L 790.8 388.1 L 787.8 383.2 Q 787.5 382.6, 787.1 382.0 Q 786.8 381.3, 786.7 381.1 L 786.7 388.1 L 785.6 388.1 L 785.6 379.6 L 786.9 379.6 \" fill=\"#0000FF\"/>\n",
"<path class=\"atom-5\" d=\"M 835.6 716.6 L 838.4 721.1 Q 838.6 721.5, 839.1 722.3 Q 839.5 723.1, 839.5 723.2 L 839.5 716.6 L 840.7 716.6 L 840.7 725.1 L 839.5 725.1 L 836.5 720.2 Q 836.2 719.6, 835.8 718.9 Q 835.4 718.3, 835.3 718.1 L 835.3 725.1 L 834.2 725.1 L 834.2 716.6 L 835.6 716.6 \" fill=\"#0000FF\"/>\n",
"<path class=\"atom-7\" d=\"M 663.2 588.3 L 666.0 592.8 Q 666.3 593.3, 666.7 594.1 Q 667.2 594.9, 667.2 594.9 L 667.2 588.3 L 668.3 588.3 L 668.3 596.8 L 667.1 596.8 L 664.2 591.9 Q 663.8 591.3, 663.4 590.7 Q 663.1 590.0, 663.0 589.8 L 663.0 596.8 L 661.9 596.8 L 661.9 588.3 L 663.2 588.3 \" fill=\"#0000FF\"/>\n",
"<path class=\"atom-9\" d=\"M 427.1 626.9 Q 427.1 624.9, 428.1 623.8 Q 429.1 622.6, 431.0 622.6 Q 432.8 622.6, 433.9 623.8 Q 434.9 624.9, 434.9 626.9 Q 434.9 629.0, 433.8 630.2 Q 432.8 631.3, 431.0 631.3 Q 429.1 631.3, 428.1 630.2 Q 427.1 629.0, 427.1 626.9 M 431.0 630.4 Q 432.3 630.4, 433.0 629.5 Q 433.7 628.6, 433.7 626.9 Q 433.7 625.3, 433.0 624.4 Q 432.3 623.6, 431.0 623.6 Q 429.7 623.6, 429.0 624.4 Q 428.3 625.3, 428.3 626.9 Q 428.3 628.7, 429.0 629.5 Q 429.7 630.4, 431.0 630.4 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-12\" d=\"M 87.7 493.1 L 88.9 493.1 L 88.9 496.7 L 93.2 496.7 L 93.2 493.1 L 94.4 493.1 L 94.4 501.6 L 93.2 501.6 L 93.2 497.6 L 88.9 497.6 L 88.9 501.6 L 87.7 501.6 L 87.7 493.1 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-12\" d=\"M 96.1 497.3 Q 96.1 495.3, 97.1 494.1 Q 98.1 493.0, 100.0 493.0 Q 101.9 493.0, 102.9 494.1 Q 103.9 495.3, 103.9 497.3 Q 103.9 499.4, 102.9 500.5 Q 101.9 501.7, 100.0 501.7 Q 98.2 501.7, 97.1 500.5 Q 96.1 499.4, 96.1 497.3 M 100.0 500.7 Q 101.3 500.7, 102.0 499.9 Q 102.7 499.0, 102.7 497.3 Q 102.7 495.6, 102.0 494.8 Q 101.3 493.9, 100.0 493.9 Q 98.7 493.9, 98.0 494.8 Q 97.3 495.6, 97.3 497.3 Q 97.3 499.0, 98.0 499.9 Q 98.7 500.7, 100.0 500.7 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-14\" d=\"M 277.8 309.3 L 278.9 309.3 L 278.9 312.9 L 283.3 312.9 L 283.3 309.3 L 284.4 309.3 L 284.4 317.8 L 283.3 317.8 L 283.3 313.9 L 278.9 313.9 L 278.9 317.8 L 277.8 317.8 L 277.8 309.3 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-14\" d=\"M 286.2 313.6 Q 286.2 311.5, 287.2 310.4 Q 288.2 309.2, 290.1 309.2 Q 292.0 309.2, 293.0 310.4 Q 294.0 311.5, 294.0 313.6 Q 294.0 315.6, 293.0 316.8 Q 291.9 318.0, 290.1 318.0 Q 288.2 318.0, 287.2 316.8 Q 286.2 315.6, 286.2 313.6 M 290.1 317.0 Q 291.4 317.0, 292.1 316.1 Q 292.8 315.3, 292.8 313.6 Q 292.8 311.9, 292.1 311.0 Q 291.4 310.2, 290.1 310.2 Q 288.8 310.2, 288.1 311.0 Q 287.4 311.9, 287.4 313.6 Q 287.4 315.3, 288.1 316.1 Q 288.8 317.0, 290.1 317.0 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-16\" d=\"M 575.1 316.9 Q 575.1 314.8, 576.1 313.7 Q 577.1 312.5, 579.0 312.5 Q 580.8 312.5, 581.8 313.7 Q 582.9 314.8, 582.9 316.9 Q 582.9 318.9, 581.8 320.1 Q 580.8 321.3, 579.0 321.3 Q 577.1 321.3, 576.1 320.1 Q 575.1 318.9, 575.1 316.9 M 579.0 320.3 Q 580.2 320.3, 580.9 319.4 Q 581.7 318.6, 581.7 316.9 Q 581.7 315.2, 580.9 314.3 Q 580.2 313.5, 579.0 313.5 Q 577.7 313.5, 576.9 314.3 Q 576.3 315.2, 576.3 316.9 Q 576.3 318.6, 576.9 319.4 Q 577.7 320.3, 579.0 320.3 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-16\" d=\"M 584.2 312.6 L 585.3 312.6 L 585.3 316.2 L 589.7 316.2 L 589.7 312.6 L 590.8 312.6 L 590.8 321.1 L 589.7 321.1 L 589.7 317.2 L 585.3 317.2 L 585.3 321.1 L 584.2 321.1 L 584.2 312.6 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-18\" d=\"M 1104.5 621.7 Q 1104.5 619.7, 1105.5 618.5 Q 1106.5 617.4, 1108.4 617.4 Q 1110.2 617.4, 1111.3 618.5 Q 1112.3 619.7, 1112.3 621.7 Q 1112.3 623.8, 1111.2 624.9 Q 1110.2 626.1, 1108.4 626.1 Q 1106.5 626.1, 1105.5 624.9 Q 1104.5 623.8, 1104.5 621.7 M 1108.4 625.1 Q 1109.7 625.1, 1110.4 624.3 Q 1111.1 623.4, 1111.1 621.7 Q 1111.1 620.0, 1110.4 619.2 Q 1109.7 618.3, 1108.4 618.3 Q 1107.1 618.3, 1106.4 619.2 Q 1105.7 620.0, 1105.7 621.7 Q 1105.7 623.4, 1106.4 624.3 Q 1107.1 625.1, 1108.4 625.1 \" fill=\"#FF0000\"/>\n",
"<path class=\"atom-19\" d=\"M 1015.3 406.3 L 1018.1 410.8 Q 1018.4 411.2, 1018.8 412.0 Q 1019.3 412.8, 1019.3 412.9 L 1019.3 406.3 L 1020.4 406.3 L 1020.4 414.8 L 1019.3 414.8 L 1016.3 409.8 Q 1015.9 409.3, 1015.6 408.6 Q 1015.2 407.9, 1015.1 407.7 L 1015.1 414.8 L 1014.0 414.8 L 1014.0 406.3 L 1015.3 406.3 \" fill=\"#0000FF\"/>\n",
"<path class=\"atom-19\" d=\"M 1022.1 406.3 L 1023.2 406.3 L 1023.2 409.9 L 1027.6 409.9 L 1027.6 406.3 L 1028.7 406.3 L 1028.7 414.8 L 1027.6 414.8 L 1027.6 410.8 L 1023.2 410.8 L 1023.2 414.8 L 1022.1 414.8 L 1022.1 406.3 \" fill=\"#0000FF\"/>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def save_aniline_screening_results(df, output_dir, visualization_dir, max_visualizations=50):\n",
"    \"\"\"Save the aromatic amine screening results (CSV plus highlighted SVGs).\"\"\"\n",
"    \n",
"    # Save the CSV file\n",
"    csv_path = output_dir / \"aniline_candidates.csv\"\n",
"    \n",
"    # Convert the ROMol column to SMILES, because ROMol objects cannot be written to CSV\n",
"    df_export = df.copy()\n",
"    if 'ROMol' in df_export.columns:\n",
"        df_export['SMILES_from_mol'] = df_export['ROMol'].apply(lambda x: Chem.MolToSmiles(x) if x else '')\n",
"        df_export = df_export.drop('ROMol', axis=1)\n",
"    \n",
"    df_export.to_csv(csv_path, index=False, encoding='utf-8')\n",
"    print(f\"CSV results saved to: {csv_path}\")\n",
"    print(f\"Contains {len(df_export)} molecules and {len(df_export.columns)} attribute columns\")\n",
"    \n",
"    # Generate the visualization images\n",
"    print(f\"\\nGenerating visualization images (at most {max_visualizations})...\")\n",
" generated_count = 0\n",
" \n",
" for idx, row in df.iterrows():\n",
"        if generated_count >= max_visualizations:\n",
"            print(f\"Reached the visualization limit ({max_visualizations}); stopping\")\n",
" break\n",
" \n",
" cas = str(row.get('CAS', 'unknown')).strip()\n",
" name = str(row.get('Name', 'unknown')).strip()\n",
" \n",
"        # Sanitize the file name (drop special characters)\n",
"        safe_name = \"\".join(c for c in name if c.isalnum() or c in (' ', '-', '_')).rstrip()\n",
"        safe_cas = \"\".join(c for c in cas if c.isalnum() or c in ('-',)).rstrip()\n",
"        \n",
"        # Skip invalid identifiers\n",
" if not safe_cas or safe_cas == 'nan' or safe_cas == 'unknown':\n",
" continue\n",
" \n",
" mol = row.get('ROMol')\n",
" if mol is None:\n",
" continue\n",
" \n",
" matched_atoms = row.get('matched_atoms', [])\n",
" if not matched_atoms:\n",
" continue\n",
" \n",
"        # Build the file name and title\n",
"        filename = visualization_dir / f\"{safe_cas}_{safe_name.replace(' ', '_')}.svg\"\n",
"        title = f\"{name} ({cas}) - aromatic amine substructure\"\n",
"        \n",
"        try:\n",
"            # Generate the SVG\n",
"            svg_content = generate_highlighted_svg(mol, matched_atoms, filename, title)\n",
"            generated_count += 1\n",
"            \n",
"            # Report progress every 10 images\n",
"            if generated_count % 10 == 0:\n",
"                print(f\"Generated {generated_count} molecule images\")\n",
"            \n",
"        except Exception as e:\n",
"            print(f\"Failed to generate {safe_cas}: {e}\")\n",
"            continue\n",
"    \n",
"    print(f\"Done! Generated {generated_count} visualization images\")\n",
" return csv_path, generated_count\n",
"\n",
"# Save the results\n",
"if len(matched_df) > 0:\n",
"    csv_path, viz_count = save_aniline_screening_results(\n",
"        matched_df, output_dir, visualization_dir, max_visualizations=50\n",
"    )\n",
"    \n",
"    # Display one of the generated images as an example\n",
"    if viz_count > 0:\n",
"        example_files = list(visualization_dir.glob(\"*.svg\"))\n",
"        if example_files:\n",
"            example_file = example_files[0]\n",
"            print(f\"\\nExample image: {example_file.name}\")\n",
"            with open(example_file, \"r\") as f:\n",
"                svg_content = f.read()\n",
"            display(SVG(svg_content))\n",
"else:\n",
"    print(\"No matches; nothing to save\")"
]
},
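{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sketch: file-name sanitization in isolation\n",
"\n",
"The file-name cleanup inside `save_aniline_screening_results` can be tried on its own. The cell below is a minimal sketch, assuming a hypothetical helper named `make_safe_filename` that is not part of this notebook; it applies the same character filter and the same skip rule for unusable CAS values. Note that `str.isalnum()` also accepts non-ASCII letters and digits, so such characters are kept rather than removed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def make_safe_filename(cas, name):\n",
"    \"\"\"Hypothetical helper mirroring the sanitization above; returns None for an unusable CAS.\"\"\"\n",
"    cas = str(cas).strip()\n",
"    name = str(name).strip()\n",
"    # Keep letters, digits, spaces, hyphens, and underscores in the name\n",
"    safe_name = \"\".join(c for c in name if c.isalnum() or c in (' ', '-', '_')).rstrip()\n",
"    # Keep letters, digits, and hyphens in the CAS number\n",
"    safe_cas = \"\".join(c for c in cas if c.isalnum() or c == '-').rstrip()\n",
"    # Same skip rule as the loop above\n",
"    if not safe_cas or safe_cas in ('nan', 'unknown'):\n",
"        return None\n",
"    return f\"{safe_cas}_{safe_name.replace(' ', '_')}.svg\"\n",
"\n",
"print(make_safe_filename('80-08-0', 'Dapsone'))"
]
},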
{
"cell_type": "markdown",
"metadata": {},
"source": [
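"## Result Statistics and Analysis\n",
"\n",
"### Screening statistics\n",
"- Total number of molecules\n",
"- Number of matched molecules\n",
"- Number of visualization files\n",
"- Output file locations"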
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"execution": {
"iopub.execute_input": "2025-11-11T13:21:36.282118Z",
"iopub.status.busy": "2025-11-11T13:21:36.281886Z",
"iopub.status.idle": "2025-11-11T13:21:36.317857Z",
"shell.execute_reply": "2025-11-11T13:21:36.316621Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== Aromatic amine screening statistics ===\n",
"Total molecules: 3276\n",
"Matched molecules: 78\n",
"Match rate: 2.38%\n",
"\n",
"Output directory: ../data/drug_targetmol/aniline_candidates\n",
"CSV file: ../data/drug_targetmol/aniline_candidates/aniline_candidates.csv\n",
"Visualization directory: ../data/drug_targetmol/aniline_candidates/visualizations\n",
"Number of SVG files: 50\n",
"\n",
"Molecules with the most matches:\n",
" Name CAS total_matches\n",
"432 Proflavine Hemisulfate 1811-28-5 4\n",
"335 Pemetrexed disodium hemipenta hydrate 357166-30-4 2\n",
"463 Lamotrigine 84057-84-1 2\n",
"779 Pyrimethamine 58-14-0 2\n",
"784 Dapsone 80-08-0 2\n"
]
}
],
"source": [
"# Result statistics\n",
"print(\"=== Aromatic amine screening statistics ===\")\n",
"print(f\"Total molecules: {len(df)}\")\n",
"print(f\"Matched molecules: {len(matched_df)}\")\n",
"print(f\"Match rate: {len(matched_df)/len(df)*100:.2f}%\")\n",
"print(f\"\\nOutput directory: {output_dir}\")\n",
"print(f\"CSV file: {output_dir}/aniline_candidates.csv\")\n",
"print(f\"Visualization directory: {visualization_dir}\")\n",
"print(f\"Number of SVG files: {len(list(visualization_dir.glob('*.svg')))}\")\n",
"\n",
"# Show the molecules with the most matches\n",
"if len(matched_df) > 0:\n",
"    print(\"\\nMolecules with the most matches:\")\n",
" top_matches = matched_df.nlargest(5, 'total_matches')[['Name', 'CAS', 'total_matches']]\n",
" print(top_matches)"
]
},
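{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sketch: guarding the match-rate calculation\n",
"\n",
"The match rate printed above divides by `len(df)`, which raises `ZeroDivisionError` if the input table is empty. The cell below is a minimal sketch of a guarded version; the helper name `match_rate_percent` is an assumption introduced here and is not part of this notebook. The example call reuses the counts reported in the statistics output (78 matches out of 3276 molecules)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def match_rate_percent(n_matched, n_total):\n",
"    \"\"\"Hypothetical helper: percentage of matched molecules, 0.0 for an empty input.\"\"\"\n",
"    if n_total == 0:\n",
"        return 0.0\n",
"    return n_matched / n_total * 100\n",
"\n",
"print(f\"Match rate: {match_rate_percent(78, 3276):.2f}%\")"
]
},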
{
"cell_type": "markdown",
"metadata": {},
"source": [
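"## Usage Recommendations\n",
"\n",
"### Interpreting the screening results\n",
"- **Matched molecules**: drugs that contain an aromatic amine group (Ar-NH₂)\n",
"- **Blue highlight**: the atoms matched by the SMARTS pattern (aromatic carbon/nitrogen plus the amino group)\n",
"- **Multiple matches**: a molecule may contain more than one aromatic amine group\n",
"\n",
"### Suggested follow-up analysis\n",
"1. **Verify the synthetic route**: consult the synthesis literature for each matched molecule\n",
"2. **Confirm the Sandmeyer step**: check whether a Sandmeyer reaction was actually used to introduce the halogen\n",
"3. **Assess the Zhang Xiaheng reaction**: evaluate whether it can feasibly replace the Sandmeyer reaction\n",
"4. **Process optimization potential**: analyze the economic benefit of switching to the Zhang Xiaheng reaction\n",
"\n",
"### Output files\n",
"- **CSV file**: complete molecular properties and match information\n",
"- **SVG files**: structure drawings with the aromatic amine group highlighted in blue\n",
"- **Naming scheme**: `{CAS}_{Name}.svg`, with special characters removed"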
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.14.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}