macro_split/notebooks/screen_aniline_candidates.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 筛选芳香胺候选药物 - Sandmeyer反应起始物分析\n",
    "\n",
    "## 背景介绍\n",
    "\n",
    "### Sandmeyer反应回顾\n",
    "Sandmeyer反应是经典的芳香胺转化方法：\n",
    "**Ar-NH₂ → [Ar-N₂]⁺ → Ar-X**\n",
    "其中 X = Cl, Br, I, CN, OH, SCN 等\n",
    "\n",
    "### 筛选目标\n",
    "通过识别药物分子中含有芳香胺结构（Ar-NH₂）的化合物，\n",
    "找出可能作为Sandmeyer反应起始物的候选药物。\n",
    "这些分子可能原本通过Sandmeyer反应引入芳香卤素，\n",
    "现在可以用张夏恒反应进行更高效的转化。\n",
    "\n",
    "### SMARTS模式\n",
    "使用SMARTS模式 `[c,n][NH2]` 匹配：\n",
    "- `[c,n]`: 芳香碳或氮原子\n",
    "- `[NH2]`: 氨基（-NH₂）\n",
    "\n",
    "**重要提醒：**\n",
    "- 此筛选基于分子结构特征\n",
    "- 最终需要查阅文献确认合成路线\n",
    "- 并非所有含芳香胺的药物都使用Sandmeyer反应"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 导入所需库"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from pathlib import Path\n",
    "from rdkit import Chem\n",
    "from rdkit.Chem import PandasTools, Draw\n",
    "from rdkit.Chem.Draw import rdMolDraw2D\n",
    "from IPython.display import SVG, display\n",
    "from rdkit.Chem import AllChem\n",
    "import pandas as pd\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "# 设置显示选项\n",
    "pd.set_option('display.max_columns', None)\n",
    "pd.set_option('display.max_colwidth', 100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 定义筛选模式和可视化函数\n",
    "\n",
    "### SMARTS模式设置\n",
    "- **目标模式**: `[c,n][NH2]` - 芳香碳/氮原子连接的氨基\n",
    "- **匹配逻辑**: 寻找所有包含此子结构的分子"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "使用SMARTS模式: [c,n][NH2]\n",
      "模式验证: ✓\n",
      "\n",
      "创建目录：../data/drug_targetmol/aniline_candidates\n",
      "创建可视化目录：../data/drug_targetmol/aniline_candidates/visualizations\n"
     ]
    }
   ],
   "source": [
    "# 定义筛选模式\n",
    "TARGET_SMARTS = '[c,n][NH2]'\n",
    "pattern = Chem.MolFromSmarts(TARGET_SMARTS)\n",
    "\n",
    "if pattern is None:\n",
    "    raise ValueError(f\"无效的SMARTS模式: {TARGET_SMARTS}\")\n",
    "\n",
    "print(f\"使用SMARTS模式: {TARGET_SMARTS}\")\n",
    "print(f\"模式验证: {'✓' if pattern else '✗'}\")\n",
    "\n",
    "# 创建输出目录\n",
    "output_base = Path(\"../data/drug_targetmol\")\n",
    "output_dir = output_base / \"aniline_candidates\"\n",
    "visualization_dir = output_dir / \"visualizations\"\n",
    "\n",
    "output_dir.mkdir(exist_ok=True)\n",
    "visualization_dir.mkdir(exist_ok=True)\n",
    "\n",
    "print(f\"\\n创建目录：{output_dir}\")\n",
    "print(f\"创建可视化目录：{visualization_dir}\")\n",
    "\n",
    "def generate_highlighted_svg(mol, highlight_atoms, filename, title=\"\"):\n",
    "    \"\"\"生成高亮匹配结构的高清晰度SVG图片\"\"\"\n",
    "    # 计算2D坐标\n",
    "    AllChem.Compute2DCoords(mol)\n",
    "    \n",
    "    # 创建SVG绘制器\n",
    "    drawer = rdMolDraw2D.MolDraw2DSVG(1200, 900)  # 更大的尺寸以提高清晰度\n",
    "    drawer.SetFontSize(12)\n",
    "    \n",
    "    # 绘制选项\n",
    "    draw_options = drawer.drawOptions()\n",
    "    draw_options.addAtomIndices = False  # 不显示原子索引，保持简洁\n",
    "    draw_options.addBondIndices = False\n",
    "    draw_options.addStereoAnnotation = True\n",
    "    draw_options.fixedFontSize = 12\n",
    "    \n",
    "    # 高亮匹配的原子（蓝色）\n",
    "    atom_colors = {}\n",
    "    for atom_idx in highlight_atoms:\n",
    "        atom_colors[atom_idx] = (0.3, 0.3, 1.0)  # 蓝色高亮\n",
    "    \n",
    "    # 绘制分子\n",
    "    drawer.DrawMolecule(mol, \n",
    "                       highlightAtoms=highlight_atoms,\n",
    "                       highlightAtomColors=atom_colors)\n",
    "    \n",
    "    drawer.FinishDrawing()\n",
    "    svg_content = drawer.GetDrawingText()\n",
    "    \n",
    "    # 添加标题\n",
    "    if title:\n",
    "        # 在SVG中添加标题\n",
    "        svg_lines = svg_content.split(\"\\\\n\")\n",
    "        # 在<g>标签前插入标题\n",
    "        for i, line in enumerate(svg_lines):\n",
    "            if \"<g \" in line and \"transform\" in line:\n",
    "                svg_lines.insert(i, f\"<text x=\\\"50%\\\" y=\\\"30\\\" text-anchor=\\\"middle\\\" font-size=\\\"16\\\" font-weight=\\\"bold\\\">{title}</text>\")\n",
    "                break\n",
    "        svg_with_title = \"\\\\n\".join(svg_lines)\n",
    "    else:\n",
    "        svg_with_title = svg_content\n",
    "    \n",
    "    # 保存文件\n",
    "    with open(filename, \"w\") as f:\n",
    "        f.write(svg_with_title)\n",
    "    \n",
    "    return svg_content"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据加载和分子筛选\n",
    "\n",
    "### 数据源\n",
    "- 文件位置：`data/drug_targetmol/0c04ffc9fe8c2ec916412fbdc2a49bf4.sdf`\n",
    "- 包含药物分子结构和丰富属性信息\n",
    "\n",
    "### 筛选逻辑\n",
    "1. 读取SDF文件\n",
    "2. 对每个分子进行SMARTS匹配\n",
    "3. 记录匹配的原子和匹配数量\n",
    "4. 保存匹配结果到CSV\n",
    "5. 生成高亮可视化图片"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "正在读取SDF文件...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[21:24:23] Both bonds on one end of an atropisomer are on the same side - atoms is : 3\n",
      "[21:24:23] Explicit valence for atom # 2 N greater than permitted\n",
      "[21:24:23] ERROR: Could not sanitize molecule ending on line 217340\n",
      "[21:24:23] ERROR: Explicit valence for atom # 2 N greater than permitted\n",
      "[21:24:24] Explicit valence for atom # 4 N greater than permitted\n",
      "[21:24:24] ERROR: Could not sanitize molecule ending on line 317283\n",
      "[21:24:24] ERROR: Explicit valence for atom # 4 N greater than permitted\n",
      "[21:24:24] Explicit valence for atom # 4 N greater than permitted\n",
      "[21:24:24] ERROR: Could not sanitize molecule ending on line 324666\n",
      "[21:24:24] ERROR: Explicit valence for atom # 4 N greater than permitted\n",
      "[21:24:24] Explicit valence for atom # 5 N greater than permitted\n",
      "[21:24:24] ERROR: Could not sanitize molecule ending on line 365883\n",
      "[21:24:24] ERROR: Explicit valence for atom # 5 N greater than permitted\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "成功加载 3276 个分子\n",
      "\n",
      "数据概览：\n",
      "  Index    Plate Row Col ID                           Name  \\\n",
      "0     1  L1010-1   a   2                     Dexamethasone   \n",
      "1     2  L1010-1   a   3                         Danicopan   \n",
      "2     3  L1010-1   a   4                     Cyclosporin A   \n",
      "3     4  L1010-1   a   5                       L-Carnitine   \n",
      "4     5  L1010-1   a   6     Trimetazidine dihydrochloride   \n",
      "\n",
      "                                       Synonyms           CAS  \\\n",
      "0  MK 125;Prednisolone F;NSC 34521;Hexadecadrol       50-02-2   \n",
      "1                                      ACH-4471  1903768-17-1   \n",
      "2       Cyclosporine A;Ciclosporin;Cyclosporine    59865-13-3   \n",
      "3                  L(-)-Carnitine;Levocarnitine      541-15-1   \n",
      "4               Yoshimilon;Kyurinett;Vastarel F    13171-25-0   \n",
      "\n",
      "                                                                                                SMILES  \\\n",
      "0              C[C@@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C@@]3(F)[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CO   \n",
      "1                        CC(=O)c1nn(CC(=O)N2C[C@H](F)C[C@H]2C(=O)Nc2cccc(Br)n2)c2ccc(cc12)-c1cnc(C)nc1   \n",
      "2  [C@H]([C@@H](C/C=C/C)C)(O)[C@@]1(N(C)C(=O)[C@H]([C@@H](C)C)N(C)C(=O)[C@H](CC(C)C)N(C)C(=O)[C@H](...   \n",
      "3                                                                      C[N+](C)(C)C[C@@H](O)CC([O-])=O   \n",
      "4                                                               Cl.Cl.COC1=C(OC)C(OC)=C(CN2CCNCC2)C=C1   \n",
      "\n",
      "         Formula    MolWt Approved status  \\\n",
      "0      C22H29FO5   392.46    NMPA;EMA;FDA   \n",
      "1  C26H23BrFN7O3   580.41             FDA   \n",
      "2  C62H111N11O12  1202.61             FDA   \n",
      "3       C7H15NO3    161.2             FDA   \n",
      "4  C14H24Cl2N2O3  339.258        NMPA;EMA   \n",
      "\n",
      "                                                             Pharmacopoeia  \\\n",
      "0                                            USP39-NF34;BP2015;JP16;IP2010   \n",
      "1                                                                      NaN   \n",
      "2  Martindale the Extra Pharmacopoei, EP10.2, USP43-NF38, Ph.Int_6th, JP17   \n",
      "3                                                                      NaN   \n",
      "4         BP2019;KP Ⅹ;EP9.2;IP2010;JP17;Martindale:The Extra Pharmacopoeia   \n",
      "\n",
      "                 Disease  \\\n",
      "0             Metabolism   \n",
      "1                 Others   \n",
      "2          Immune system   \n",
      "3  Cardiovascular system   \n",
      "4  Cardiovascular system   \n",
      "\n",
      "                                                                                              Pathways  \\\n",
      "0  Antibody-drug Conjugate/ADC Related;Autophagy;Endocrinology/Hormones;Immunology/Inflammation;Mic...   \n",
      "1                                                                              Immunology/Inflammation   \n",
      "2                                             Immunology/Inflammation;Metabolism;Microbiology/Virology   \n",
      "3                                                                                           Metabolism   \n",
      "4                                                                                 Autophagy;Metabolism   \n",
      "\n",
      "                                                                                                Target  \\\n",
      "0  Antibacterial;Antibiotic;Autophagy;Complement System;Glucocorticoid Receptor;IL Receptor;Mitopha...   \n",
      "1                                                                                    Complement System   \n",
      "2                                                             Phosphatase;Antibiotic;Complement System   \n",
      "3                                                            Endogenous Metabolite;Fatty Acid Synthase   \n",
      "4                                                                        Autophagy;Fatty Acid Synthase   \n",
      "\n",
      "                                                                                              Receptor  \\\n",
      "0  Antibiotic; Autophagy; Bacterial; Complement System; Glucocorticoid Receptor; IL receptor; Mitop...   \n",
      "1                                                                          Complement System; factor D   \n",
      "2                                  Antibiotic; calcineurin phosphatase; Complement System; Phosphatase   \n",
      "3                                                                           Endogenous Metabolite; FAS   \n",
      "4                                              Autophagy; mitochondrial long-chain 3-ketoacyl thiolase   \n",
      "\n",
      "                                                                                           Bioactivity  \\\n",
      "0  Dexamethasone is a glucocorticoid receptor agonist and IL receptor modulator with anti-inflammat...   \n",
      "1  Danicopan (ACH-4471) (ACH-4471) is a selective, orally active small molecule factor D inhibitor ...   \n",
      "2  Cyclosporin A is a natural product and an active fungal metabolite, classified as a cyclic polyp...   \n",
      "3  L-Carnitine (L(-)-Carnitine) is an amino acid derivative. L-Carnitine facilitates long-chain fat...   \n",
      "4  Trimetazidine dihydrochloride (Vastarel F) can improve myocardial glucose utilization by inhibit...   \n",
      "\n",
      "                                                                                             Reference  \\\n",
      "0  Li M, Yu H. Identification of WP1066, an inhibitor of JAK2 and STAT3, as a Kv1. 3 potassium chan...   \n",
      "1  Yuan X, et al. Small-molecule factor D inhibitors selectively block the alternative pathway of c...   \n",
      "2  D'Angelo G, et al. Cyclosporin A prevents the hypoxic adaptation by activating hypoxia-inducible...   \n",
      "3                                                    Jogl G, Tong L. Cell. 2003 Jan 10; 112(1):113-22.   \n",
      "4  Yang Q, et al. Int J Clin Exp Pathol. 2015, 8(4):3735-3741.;Liu Z, et al. Metabolism. 2016, 65(3...   \n",
      "\n",
      "                                              ROMol  \n",
      "0  <rdkit.Chem.rdchem.Mol object at 0x77530d73c820>  \n",
      "1  <rdkit.Chem.rdchem.Mol object at 0x77530d73c890>  \n",
      "2  <rdkit.Chem.rdchem.Mol object at 0x77530a3f6f10>  \n",
      "3  <rdkit.Chem.rdchem.Mol object at 0x77530a3f70d0>  \n",
      "4  <rdkit.Chem.rdchem.Mol object at 0x77530a3f7140>  \n",
      "\n",
      "列名：['Index', 'Plate', 'Row', 'Col', 'ID', 'Name', 'Synonyms', 'CAS', 'SMILES', 'Formula', 'MolWt', 'Approved status', 'Pharmacopoeia', 'Disease', 'Pathways', 'Target', 'Receptor', 'Bioactivity', 'Reference', 'ROMol']\n"
     ]
    }
   ],
   "source": [
    "# 读取SDF文件\n",
    "sdf_path = '../data/drug_targetmol/0c04ffc9fe8c2ec916412fbdc2a49bf4.sdf'\n",
    "\n",
    "print(\"正在读取SDF文件...\")\n",
    "df = PandasTools.LoadSDF(sdf_path)\n",
    "print(f\"成功加载 {len(df)} 个分子\")\n",
    "\n",
    "# 显示数据基本信息\n",
    "print(\"\\n数据概览：\")\n",
    "print(df.head())\n",
    "print(f\"\\n列名：{list(df.columns)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "开始筛选芳香胺结构...\n",
      "SMARTS模式: [c,n][N&H2]\n",
      "找到 262 个匹配分子（处理了 3276 个分子）\n",
      "\n",
      "筛选结果摘要：\n",
      "                  Name          CAS      Formula  total_matches\n",
      "17           Guanosine     118-00-3   C10H13N5O5              1\n",
      "20         Ganciclovir   82410-32-0    C9H13N5O4              1\n",
      "22   Imiquimod maleate  896106-16-4   C18H20N4O4              1\n",
      "27       Brincidofovir  444805-28-1  C27H52N3O7P              1\n",
      "28           Imiquimod   99011-02-6     C14H16N4              1\n",
      "32  Ganciclovir sodium  107910-75-8  C9H13N5NaO4              1\n",
      "33          Cytarabine     147-94-4    C9H13N3O5              1\n",
      "35          Vidarabine    5536-17-4   C10H13N5O4              1\n",
      "38         Penciclovir   39809-25-1   C10H15N5O3              1\n",
      "41         Famciclovir  104227-87-4   C14H19N5O4              1\n",
      "... 还有 252 个分子\n"
     ]
    }
   ],
   "source": [
    "def screen_molecules_for_aniline(df, smarts_pattern, max_molecules=100):\n",
    "    \"\"\"\n",
    "    筛选包含芳香胺结构的分子\n",
    "    \n",
    "    Args:\n",
    "        df: 包含分子的DataFrame\n",
    "        smarts_pattern: RDKit SMARTS模式对象\n",
    "        max_molecules: 最大处理分子数量\n",
    "    \n",
    "    Returns:\n",
    "        筛选结果DataFrame\n",
    "    \"\"\"\n",
    "    print(f\"开始筛选芳香胺结构...\")\n",
    "    print(f\"SMARTS模式: {Chem.MolToSmarts(smarts_pattern)}\")\n",
    "    \n",
    "    matched_molecules = []\n",
    "    processed_count = 0\n",
    "    \n",
    "    for idx, row in df.iterrows():\n",
    "        if processed_count >= max_molecules:\n",
    "            break\n",
    "            \n",
    "        mol = row['ROMol']\n",
    "        if mol is None:\n",
    "            continue\n",
    "            \n",
    "        processed_count += 1\n",
    "        \n",
    "        # 检查是否匹配SMARTS模式\n",
    "        if mol.HasSubstructMatch(smarts_pattern):\n",
    "            matches = mol.GetSubstructMatches(smarts_pattern)\n",
    "            \n",
    "            # 收集所有匹配的原子\n",
    "            matched_atoms = set()\n",
    "            for match in matches:\n",
    "                matched_atoms.update(match)\n",
    "            \n",
    "            # 创建匹配记录\n",
    "            match_record = row.copy()\n",
    "            match_record['matched_atoms'] = list(matched_atoms)\n",
    "            match_record['total_matches'] = len(matches)\n",
    "            match_record['smarts_pattern'] = Chem.MolToSmarts(smarts_pattern)\n",
    "            matched_molecules.append(match_record)\n",
    "    \n",
    "    result_df = pd.DataFrame(matched_molecules)\n",
    "    print(f\"找到 {len(result_df)} 个匹配分子（处理了 {processed_count} 个分子）\")\n",
    "    \n",
    "    return result_df\n",
    "\n",
    "# 执行筛选\n",
    "matched_df = screen_molecules_for_aniline(df, pattern, max_molecules=1000000)\n",
    "\n",
    "# 显示结果摘要\n",
    "if len(matched_df) > 0:\n",
    "    print(\"\\n筛选结果摘要：\")\n",
    "    summary_cols = ['Name', 'CAS', 'Formula', 'total_matches']\n",
    "    if len(matched_df) <= 10:\n",
    "        print(matched_df[summary_cols])\n",
    "    else:\n",
    "        print(matched_df[summary_cols].head(10))\n",
    "        print(f\"... 还有 {len(matched_df) - 10} 个分子\")\n",
    "else:\n",
    "    print(\"\\n未找到匹配分子\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 保存筛选结果\n",
    "\n",
    "### 输出文件\n",
    "1. **CSV文件**：包含所有匹配分子的属性信息和匹配详情\n",
    "2. **SVG图片**：每个匹配分子的结构可视化，高亮芳香胺结构"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CSV结果已保存到：../data/drug_targetmol/aniline_candidates/aniline_candidates.csv\n",
      "包含 262 个分子，23 个属性列\n",
      "\n",
      "开始生成可视化图片（最多500个）...\n",
      "已生成 10 个分子图片\n",
      "已生成 20 个分子图片\n",
      "已生成 30 个分子图片\n",
      "已生成 40 个分子图片\n",
      "已生成 50 个分子图片\n",
      "已生成 60 个分子图片\n",
      "已生成 70 个分子图片\n",
      "已生成 80 个分子图片\n",
      "已生成 90 个分子图片\n",
      "已生成 100 个分子图片\n",
      "已生成 110 个分子图片\n",
      "已生成 120 个分子图片\n",
      "已生成 130 个分子图片\n",
      "已生成 140 个分子图片\n",
      "已生成 150 个分子图片\n",
      "已生成 160 个分子图片\n",
      "已生成 170 个分子图片\n",
      "已生成 180 个分子图片\n",
      "已生成 190 个分子图片\n",
      "已生成 200 个分子图片\n",
      "已生成 210 个分子图片\n",
      "已生成 220 个分子图片\n",
      "已生成 230 个分子图片\n",
      "已生成 240 个分子图片\n",
      "已生成 250 个分子图片\n",
      "已生成 260 个分子图片\n",
      "完成！共生成 262 个可视化图片\n",
      "\n",
      "示例图片: 118-00-3_Guanosine.svg\n"
     ]
    },
    {
     "data": {
      "image/svg+xml": [
       "<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:rdkit=\"http://www.rdkit.org/xml\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" version=\"1.1\" baseProfile=\"full\" xml:space=\"preserve\" width=\"1200px\" height=\"900px\" viewBox=\"0 0 1200 900\">\n",
       "<!-- END OF HEADER -->\n",
       "<rect style=\"opacity:1.0;fill:#FFFFFF;stroke:none\" width=\"1200.0\" height=\"900.0\" x=\"0.0\" y=\"0.0\"> </rect>\n",
       "<path class=\"bond-0 atom-0 atom-1\" d=\"M 912.0,197.7 L 940.1,201.0 L 924.8,332.9 L 896.6,329.6 Z\" style=\"fill:#4C4CFF;fill-rule:evenodd;fill-opacity:1;stroke:#4C4CFF;stroke-width:0.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:10;stroke-opacity:1;\"/>\n",
       "<ellipse cx=\"932.9\" cy=\"201.5\" rx=\"26.6\" ry=\"26.6\" class=\"atom-0\" style=\"fill:#4C4CFF;fill-rule:evenodd;stroke:#4C4CFF;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<ellipse cx=\"910.7\" cy=\"331.2\" rx=\"26.6\" ry=\"26.6\" class=\"atom-1\" style=\"fill:#4C4CFF;fill-rule:evenodd;stroke:#4C4CFF;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-0 atom-0 atom-1\" d=\"M 925.1,208.0 L 910.7,331.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-1 atom-1 atom-2\" d=\"M 910.7,331.2 L 853.5,355.9\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-1 atom-1 atom-2\" d=\"M 853.5,355.9 L 796.4,380.6\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-1 atom-1 atom-2\" d=\"M 908.0,354.1 L 856.2,376.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-1 atom-1 atom-2\" d=\"M 856.2,376.5 L 804.3,398.9\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-2 atom-2 atom-3\" d=\"M 787.8,392.5 L 780.6,454.1\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-2 atom-2 atom-3\" d=\"M 780.6,454.1 L 773.4,515.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-3 atom-3 atom-4\" d=\"M 773.4,515.8 L 879.9,595.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-3 atom-3 atom-4\" d=\"M 794.5,506.6 L 882.6,572.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-4 atom-4 atom-5\" d=\"M 879.9,595.0 L 860.1,653.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-4 atom-4 atom-5\" d=\"M 860.1,653.6 L 840.4,712.2\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-5 atom-5 atom-6\" d=\"M 829.8,720.7 L 767.3,720.0\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-5 atom-5 atom-6\" d=\"M 767.3,720.0 L 704.7,719.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-5 atom-5 atom-6\" d=\"M 830.1,700.8 L 774.7,700.2\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-5 atom-5 atom-6\" d=\"M 774.7,700.2 L 719.4,699.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-6 atom-6 atom-7\" d=\"M 704.7,719.3 L 686.2,660.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-6 atom-6 atom-7\" d=\"M 686.2,660.3 L 667.8,601.2\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-7 atom-3 atom-7\" d=\"M 773.4,515.8 L 723.0,551.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-7 atom-3 atom-7\" d=\"M 723.0,551.5 L 672.7,587.2\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-8 atom-7 atom-8\" d=\"M 657.5,590.0 L 598.4,570.1\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-8 atom-7 atom-8\" d=\"M 598.4,570.1 L 539.3,550.1\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-9 atom-8 atom-9\" d=\"M 539.3,550.1 L 489.3,585.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-9 atom-8 atom-9\" d=\"M 489.3,585.6 L 439.2,621.0\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-10 atom-9 atom-10\" d=\"M 422.7,620.8 L 373.6,584.2\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-10 atom-9 atom-10\" d=\"M 373.6,584.2 L 324.4,547.7\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-11 atom-10 atom-11\" d=\"M 324.4,547.7 L 197.7,587.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-12 atom-11 atom-12\" d=\"M 197.7,587.2 L 153.0,546.1\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-12 atom-11 atom-12\" d=\"M 153.0,546.1 L 108.3,504.9\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-13 atom-10 atom-13\" d=\"M 324.4,547.7 L 366.9,421.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-14 atom-13 atom-14\" d=\"M 366.9,421.8 L 331.6,372.1\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-14 atom-13 atom-14\" d=\"M 331.6,372.1 L 296.3,322.3\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-15 atom-13 atom-15\" d=\"M 366.9,421.8 L 499.7,423.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-16 atom-8 atom-15\" d=\"M 539.3,550.1 L 499.7,423.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-17 atom-15 atom-16\" d=\"M 499.7,423.4 L 536.0,374.5\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-17 atom-15 atom-16\" d=\"M 536.0,374.5 L 572.4,325.6\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-18 atom-4 atom-17\" d=\"M 879.9,595.0 L 1001.8,542.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-19 atom-17 atom-18\" d=\"M 991.3,547.0 L 1042.7,585.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-19 atom-17 atom-18\" d=\"M 1042.7,585.2 L 1094.1,623.5\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-19 atom-17 atom-18\" d=\"M 1003.2,531.0 L 1054.6,569.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-19 atom-17 atom-18\" d=\"M 1054.6,569.3 L 1106.0,607.5\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-20 atom-17 atom-19\" d=\"M 1001.8,542.4 L 1009.0,480.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-20 atom-17 atom-19\" d=\"M 1009.0,480.8 L 1016.2,419.1\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-21 atom-1 atom-19\" d=\"M 910.7,331.2 L 960.2,368.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-21 atom-1 atom-19\" d=\"M 960.2,368.0 L 1009.6,404.9\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 707.8,719.4 L 704.7,719.3 L 703.7,716.4\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:10;stroke-opacity:1;\"/>\n",
       "<path d=\"M 204.0,585.3 L 197.7,587.2 L 195.5,585.2\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:10;stroke-opacity:1;\"/>\n",
       "<path d=\"M 995.7,545.0 L 1001.8,542.4 L 1002.2,539.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:10;stroke-opacity:1;\"/>\n",
       "<path class=\"atom-0\" d=\"M 924.2 195.1 L 927.0 199.6 Q 927.3 200.0, 927.7 200.8 Q 928.1 201.7, 928.2 201.7 L 928.2 195.1 L 929.3 195.1 L 929.3 203.6 L 928.1 203.6 L 925.1 198.7 Q 924.8 198.1, 924.4 197.4 Q 924.1 196.8, 924.0 196.6 L 924.0 203.6 L 922.9 203.6 L 922.9 195.1 L 924.2 195.1 \" fill=\"#000000\"/>\n",
       "<path class=\"atom-0\" d=\"M 930.9 195.1 L 932.1 195.1 L 932.1 198.7 L 936.4 198.7 L 936.4 195.1 L 937.6 195.1 L 937.6 203.6 L 936.4 203.6 L 936.4 199.7 L 932.1 199.7 L 932.1 203.6 L 930.9 203.6 L 930.9 195.1 \" fill=\"#000000\"/>\n",
       "<path class=\"atom-0\" d=\"M 939.2 203.3 Q 939.4 202.8, 939.9 202.5 Q 940.4 202.2, 941.1 202.2 Q 942.0 202.2, 942.4 202.6 Q 942.9 203.1, 942.9 203.9 Q 942.9 204.7, 942.3 205.5 Q 941.7 206.3, 940.4 207.2 L 943.0 207.2 L 943.0 207.8 L 939.2 207.8 L 939.2 207.3 Q 940.3 206.6, 940.9 206.0 Q 941.5 205.5, 941.8 205.0 Q 942.1 204.5, 942.1 203.9 Q 942.1 203.4, 941.8 203.1 Q 941.6 202.8, 941.1 202.8 Q 940.7 202.8, 940.4 203.0 Q 940.0 203.2, 939.8 203.6 L 939.2 203.3 \" fill=\"#000000\"/>\n",
       "<path class=\"atom-2\" d=\"M 786.9 379.6 L 789.7 384.1 Q 790.0 384.6, 790.4 385.4 Q 790.8 386.2, 790.9 386.2 L 790.9 379.6 L 792.0 379.6 L 792.0 388.1 L 790.8 388.1 L 787.8 383.2 Q 787.5 382.6, 787.1 382.0 Q 786.8 381.3, 786.7 381.1 L 786.7 388.1 L 785.6 388.1 L 785.6 379.6 L 786.9 379.6 \" fill=\"#0000FF\"/>\n",
       "<path class=\"atom-5\" d=\"M 835.6 716.6 L 838.4 721.1 Q 838.6 721.5, 839.1 722.3 Q 839.5 723.1, 839.5 723.2 L 839.5 716.6 L 840.7 716.6 L 840.7 725.1 L 839.5 725.1 L 836.5 720.2 Q 836.2 719.6, 835.8 718.9 Q 835.4 718.3, 835.3 718.1 L 835.3 725.1 L 834.2 725.1 L 834.2 716.6 L 835.6 716.6 \" fill=\"#0000FF\"/>\n",
       "<path class=\"atom-7\" d=\"M 663.2 588.3 L 666.0 592.8 Q 666.3 593.3, 666.7 594.1 Q 667.2 594.9, 667.2 594.9 L 667.2 588.3 L 668.3 588.3 L 668.3 596.8 L 667.1 596.8 L 664.2 591.9 Q 663.8 591.3, 663.4 590.7 Q 663.1 590.0, 663.0 589.8 L 663.0 596.8 L 661.9 596.8 L 661.9 588.3 L 663.2 588.3 \" fill=\"#0000FF\"/>\n",
       "<path class=\"atom-9\" d=\"M 427.1 626.9 Q 427.1 624.9, 428.1 623.8 Q 429.1 622.6, 431.0 622.6 Q 432.8 622.6, 433.9 623.8 Q 434.9 624.9, 434.9 626.9 Q 434.9 629.0, 433.8 630.2 Q 432.8 631.3, 431.0 631.3 Q 429.1 631.3, 428.1 630.2 Q 427.1 629.0, 427.1 626.9 M 431.0 630.4 Q 432.3 630.4, 433.0 629.5 Q 433.7 628.6, 433.7 626.9 Q 433.7 625.3, 433.0 624.4 Q 432.3 623.6, 431.0 623.6 Q 429.7 623.6, 429.0 624.4 Q 428.3 625.3, 428.3 626.9 Q 428.3 628.7, 429.0 629.5 Q 429.7 630.4, 431.0 630.4 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-12\" d=\"M 87.7 493.1 L 88.9 493.1 L 88.9 496.7 L 93.2 496.7 L 93.2 493.1 L 94.4 493.1 L 94.4 501.6 L 93.2 501.6 L 93.2 497.6 L 88.9 497.6 L 88.9 501.6 L 87.7 501.6 L 87.7 493.1 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-12\" d=\"M 96.1 497.3 Q 96.1 495.3, 97.1 494.1 Q 98.1 493.0, 100.0 493.0 Q 101.9 493.0, 102.9 494.1 Q 103.9 495.3, 103.9 497.3 Q 103.9 499.4, 102.9 500.5 Q 101.9 501.7, 100.0 501.7 Q 98.2 501.7, 97.1 500.5 Q 96.1 499.4, 96.1 497.3 M 100.0 500.7 Q 101.3 500.7, 102.0 499.9 Q 102.7 499.0, 102.7 497.3 Q 102.7 495.6, 102.0 494.8 Q 101.3 493.9, 100.0 493.9 Q 98.7 493.9, 98.0 494.8 Q 97.3 495.6, 97.3 497.3 Q 97.3 499.0, 98.0 499.9 Q 98.7 500.7, 100.0 500.7 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-14\" d=\"M 277.8 309.3 L 278.9 309.3 L 278.9 312.9 L 283.3 312.9 L 283.3 309.3 L 284.4 309.3 L 284.4 317.8 L 283.3 317.8 L 283.3 313.9 L 278.9 313.9 L 278.9 317.8 L 277.8 317.8 L 277.8 309.3 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-14\" d=\"M 286.2 313.6 Q 286.2 311.5, 287.2 310.4 Q 288.2 309.2, 290.1 309.2 Q 292.0 309.2, 293.0 310.4 Q 294.0 311.5, 294.0 313.6 Q 294.0 315.6, 293.0 316.8 Q 291.9 318.0, 290.1 318.0 Q 288.2 318.0, 287.2 316.8 Q 286.2 315.6, 286.2 313.6 M 290.1 317.0 Q 291.4 317.0, 292.1 316.1 Q 292.8 315.3, 292.8 313.6 Q 292.8 311.9, 292.1 311.0 Q 291.4 310.2, 290.1 310.2 Q 288.8 310.2, 288.1 311.0 Q 287.4 311.9, 287.4 313.6 Q 287.4 315.3, 288.1 316.1 Q 288.8 317.0, 290.1 317.0 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-16\" d=\"M 575.1 316.9 Q 575.1 314.8, 576.1 313.7 Q 577.1 312.5, 579.0 312.5 Q 580.8 312.5, 581.8 313.7 Q 582.9 314.8, 582.9 316.9 Q 582.9 318.9, 581.8 320.1 Q 580.8 321.3, 579.0 321.3 Q 577.1 321.3, 576.1 320.1 Q 575.1 318.9, 575.1 316.9 M 579.0 320.3 Q 580.2 320.3, 580.9 319.4 Q 581.7 318.6, 581.7 316.9 Q 581.7 315.2, 580.9 314.3 Q 580.2 313.5, 579.0 313.5 Q 577.7 313.5, 576.9 314.3 Q 576.3 315.2, 576.3 316.9 Q 576.3 318.6, 576.9 319.4 Q 577.7 320.3, 579.0 320.3 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-16\" d=\"M 584.2 312.6 L 585.3 312.6 L 585.3 316.2 L 589.7 316.2 L 589.7 312.6 L 590.8 312.6 L 590.8 321.1 L 589.7 321.1 L 589.7 317.2 L 585.3 317.2 L 585.3 321.1 L 584.2 321.1 L 584.2 312.6 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-18\" d=\"M 1104.5 621.7 Q 1104.5 619.7, 1105.5 618.5 Q 1106.5 617.4, 1108.4 617.4 Q 1110.2 617.4, 1111.3 618.5 Q 1112.3 619.7, 1112.3 621.7 Q 1112.3 623.8, 1111.2 624.9 Q 1110.2 626.1, 1108.4 626.1 Q 1106.5 626.1, 1105.5 624.9 Q 1104.5 623.8, 1104.5 621.7 M 1108.4 625.1 Q 1109.7 625.1, 1110.4 624.3 Q 1111.1 623.4, 1111.1 621.7 Q 1111.1 620.0, 1110.4 619.2 Q 1109.7 618.3, 1108.4 618.3 Q 1107.1 618.3, 1106.4 619.2 Q 1105.7 620.0, 1105.7 621.7 Q 1105.7 623.4, 1106.4 624.3 Q 1107.1 625.1, 1108.4 625.1 \" fill=\"#FF0000\"/>\n",
       "<path class=\"atom-19\" d=\"M 1015.3 406.3 L 1018.1 410.8 Q 1018.4 411.2, 1018.8 412.0 Q 1019.3 412.8, 1019.3 412.9 L 1019.3 406.3 L 1020.4 406.3 L 1020.4 414.8 L 1019.3 414.8 L 1016.3 409.8 Q 1015.9 409.3, 1015.6 408.6 Q 1015.2 407.9, 1015.1 407.7 L 1015.1 414.8 L 1014.0 414.8 L 1014.0 406.3 L 1015.3 406.3 \" fill=\"#0000FF\"/>\n",
       "<path class=\"atom-19\" d=\"M 1022.1 406.3 L 1023.2 406.3 L 1023.2 409.9 L 1027.6 409.9 L 1027.6 406.3 L 1028.7 406.3 L 1028.7 414.8 L 1027.6 414.8 L 1027.6 410.8 L 1023.2 410.8 L 1023.2 414.8 L 1022.1 414.8 L 1022.1 406.3 \" fill=\"#0000FF\"/>\n",
       "</svg>"
      ],
      "text/plain": [
       "<IPython.core.display.SVG object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "def save_aniline_screening_results(df, output_dir, visualization_dir, max_visualizations=500):\n",
    "    \"\"\"Save the aromatic amine screening results as a CSV file and highlighted SVG images.\"\"\"\n",
    "    \n",
    "    # Save the CSV file\n",
    "    csv_path = output_dir / \"aniline_candidates.csv\"\n",
    "    \n",
    "    # Convert the ROMol column to SMILES (ROMol objects cannot be written to CSV)\n",
    "    df_export = df.copy()\n",
    "    if 'ROMol' in df_export.columns:\n",
    "        df_export['SMILES_from_mol'] = df_export['ROMol'].apply(lambda x: Chem.MolToSmiles(x) if x is not None else '')\n",
    "        df_export = df_export.drop('ROMol', axis=1)\n",
    "    \n",
    "    df_export.to_csv(csv_path, index=False, encoding='utf-8')\n",
    "    print(f\"CSV results saved to: {csv_path}\")\n",
    "    print(f\"Contains {len(df_export)} molecules and {len(df_export.columns)} property columns\")\n",
    "    \n",
    "    # Generate the visualization images\n",
    "    print(f\"\\nGenerating visualization images (at most {max_visualizations})...\")\n",
    "    generated_count = 0\n",
    "    \n",
    "    for idx, row in df.iterrows():\n",
    "        if generated_count >= max_visualizations:\n",
    "            print(f\"Reached the maximum number of visualizations ({max_visualizations}); stopping\")\n",
    "            break\n",
    "            \n",
    "        cas = str(row.get('CAS', 'unknown')).strip()\n",
    "        name = str(row.get('Name', 'unknown')).strip()\n",
    "        \n",
    "        # Sanitize the filename parts (drop special characters)\n",
    "        safe_name = \"\".join(c for c in name if c.isalnum() or c in (' ', '-', '_')).rstrip()\n",
    "        safe_cas = \"\".join(c for c in cas if c.isalnum() or c in ('-',)).rstrip()\n",
    "        \n",
    "        # Skip rows without a valid CAS identifier\n",
    "        if not safe_cas or safe_cas == 'nan' or safe_cas == 'unknown':\n",
    "            continue\n",
    "            \n",
    "        mol = row.get('ROMol')\n",
    "        if mol is None:\n",
    "            continue\n",
    "            \n",
    "        matched_atoms = row.get('matched_atoms', [])\n",
    "        if not matched_atoms:\n",
    "            continue\n",
    "            \n",
    "        # Build the filename and title\n",
    "        filename = visualization_dir / f\"{safe_cas}_{safe_name.replace(' ', '_')}.svg\"\n",
    "        title = f\"{name} ({cas}) - aromatic amine structure\"\n",
    "        \n",
    "        try:\n",
    "            # Generate the SVG\n",
    "            svg_content = generate_highlighted_svg(mol, matched_atoms, filename, title)\n",
    "            generated_count += 1\n",
    "            \n",
    "            # Report progress every 10 images\n",
    "            if generated_count % 10 == 0:\n",
    "                print(f\"Generated {generated_count} molecule images\")\n",
    "                \n",
    "        except Exception as e:\n",
    "            print(f\"Failed to generate {safe_cas}: {e}\")\n",
    "            continue\n",
    "    \n",
    "    print(f\"Done! Generated {generated_count} visualization images in total\")\n",
    "    return csv_path, generated_count\n",
    "\n",
    "# Save the results\n",
    "if len(matched_df) > 0:\n",
    "    csv_path, viz_count = save_aniline_screening_results(\n",
    "        matched_df, output_dir, visualization_dir, max_visualizations=500\n",
    "    )\n",
    "    \n",
    "    # Display one generated image as an example (glob order is not guaranteed)\n",
    "    if viz_count > 0:\n",
    "        example_files = list(visualization_dir.glob(\"*.svg\"))\n",
    "        if example_files:\n",
    "            example_file = example_files[0]\n",
    "            print(f\"\\nExample image: {example_file.name}\")\n",
    "            with open(example_file, \"r\") as f:\n",
    "                svg_content = f.read()\n",
    "            display(SVG(svg_content))\n",
    "else:\n",
    "    print(\"No matches found; nothing to save\")"
   ]
  },
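  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Result Statistics and Analysis\n",
    "\n",
    "### Screening Statistics\n",
    "- Total number of molecules\n",
    "- Number of matched molecules\n",
    "- Number of visualization files\n",
    "- Output file locations"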
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=== Aromatic Amine Screening Statistics ===\n",
      "Total molecules: 3276\n",
      "Matched molecules: 262\n",
      "Match rate: 8.00%\n",
      "\n",
      "Output directory: ../data/drug_targetmol/aniline_candidates\n",
      "CSV file: ../data/drug_targetmol/aniline_candidates/aniline_candidates.csv\n",
      "Visualization directory: ../data/drug_targetmol/aniline_candidates/visualizations\n",
      "Number of SVG files: 262\n",
      "\n",
      "Molecules with the most matches:\n",
      "                                       Name          CAS  total_matches\n",
      "432                  Proflavine Hemisulfate    1811-28-5              4\n",
      "1064                            Triamterene     396-01-0              3\n",
      "335   Pemetrexed disodium hemipenta hydrate  357166-30-4              2\n",
      "463                             Lamotrigine   84057-84-1              2\n",
      "779                           Pyrimethamine      58-14-0              2\n"
     ]
    }
   ],
   "source": [
    "# Result statistics\n",
    "print(\"=== Aromatic Amine Screening Statistics ===\")\n",
    "print(f\"Total molecules: {len(df)}\")\n",
    "print(f\"Matched molecules: {len(matched_df)}\")\n",
    "print(f\"Match rate: {len(matched_df)/len(df)*100:.2f}%\")\n",
    "print(f\"\\nOutput directory: {output_dir}\")\n",
    "print(f\"CSV file: {output_dir}/aniline_candidates.csv\")\n",
    "print(f\"Visualization directory: {visualization_dir}\")\n",
    "print(f\"Number of SVG files: {len(list(visualization_dir.glob('*.svg')))}\")\n",
    "\n",
    "# Show the molecules with the most matches\n",
    "if len(matched_df) > 0:\n",
    "    print(\"\\nMolecules with the most matches:\")\n",
    "    top_matches = matched_df.nlargest(5, 'total_matches')[['Name', 'CAS', 'total_matches']]\n",
    "    print(top_matches)"
   ]
  },
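  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage Recommendations\n",
    "\n",
    "### Interpreting the Screening Results\n",
    "- **Matched molecules**: drugs that contain an aromatic amine group (Ar-NH₂)\n",
    "- **Blue highlight**: the matched SMARTS substructure (aromatic carbon/nitrogen plus the amino group)\n",
    "- **Multiple matches**: a molecule may contain more than one aromatic amine group\n",
    "\n",
    "### Suggested Follow-up Analysis\n",
    "1. **Verify the synthetic route**: consult the synthesis literature for each matched molecule\n",
    "2. **Confirm the Sandmeyer step**: check whether the route actually uses a Sandmeyer reaction to introduce the halogen\n",
    "3. **Assess the Zhang Xiaheng reaction**: evaluate whether it can feasibly replace the Sandmeyer reaction\n",
    "4. **Process optimization potential**: analyze the economic benefit of switching to the Zhang Xiaheng reaction\n",
    "\n",
    "### File Descriptions\n",
    "- **CSV file**: complete molecular properties and match information\n",
    "- **SVG files**: structure visualizations with the aromatic amine group highlighted in blue\n",
    "- **Naming convention**: `{CAS}_{Name}.svg` (special characters removed, spaces in the name replaced with underscores)"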
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Antibiotic Screening Results\n",
    "\n",
    "/home/zly/project/macro_split/data/drug_targetmol/aniline_candidates/antibiotics_identified.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.14.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}