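{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Synthesizability Calculation\n", "\n", "Model options:\n", "\n", "- [RDKit SA score](https://github.com/rdkit/rdkit/tree/master/Contrib/SA_Score): fast but heuristic. It combines fragment contributions with complexity penalties and reports a score from 1 (easy to synthesize) to 10 (hard to synthesize).\n", "- SCScore: a neural-network score learned from reaction data; more accurate, but requires additional dependencies.\n", "\n", "### Getting the sascorer module\n", "\n", "Download [sascorer.py](https://github.com/rdkit/rdkit/blob/master/Contrib/SA_Score/sascorer.py) from the Contrib directory of the RDKit GitHub repository, together with the `fpscores.pkl.gz` file that it loads from its own directory, and put both files in your working directory or in a folder on your Python path." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "synthesis_df = pd.read_parquet(\"../../data/Macro16_SIME_Synthesis/synthesis_with_sa.parquet\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Import successful\n" ] }, { "data": { "text/html": [ "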
|   | original_smiles | smiles | status | message | synthesis | sa_score |
|---|---|---|---|---|---|---|
| 0 | O=C1C[C@@H](O)C[C@H](O[C@H]9C[C@@](C)(OC)[C@@H... | CCC1=C\[C@H](O[C@H]2C[C@@](C)(OC)[C@@H](O)[C@H... | corrected | repaired atom 28 | SIME | 6.186130 |
| 1 | O=C1C[C@@H](O)C[C@H](O[C@H]9C[C@@](C)(OC)[C@@H... | CCC1=C\[C@H](O[C@H]2C[C@@](C)(OC)[C@@H](O)[C@H... | corrected | repaired atom 28 | SIME | 6.200902 |
| 2 | O=C1C[C@@H](O)C[C@H](O[C@H]9C[C@@](C)(OC)[C@@H... | CCC1=C\[C@H](O[C@H]2C[C@@](C)(OC)[C@@H](O)[C@H... | corrected | repaired atom 28 | SIME | 6.203467 |
| 3 | O=C1C[C@@H](O)C[C@H](O[C@H]9C[C@@](C)(OC)[C@@H... | CCC1=C\[C@H](O[C@H]2C[C@@](C)(OC)[C@@H](O)[C@H... | corrected | repaired atom 28 | SIME | 6.289575 |
| 4 | O=C1C[C@@H](O)C[C@H](O[C@H]9C[C@@](C)(OC)[C@@H... | CCC1=C\[C@H](O[C@H]2C[C@@](C)(OC)[C@@H](O)[C@H... | corrected | repaired atom 28 | SIME | 6.327510 |
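A `sa_score` column like the one above could be produced roughly as follows. This is a minimal sketch, not the notebook's own code: it assumes the installed RDKit ships its Contrib directory (exposed as `RDConfig.RDContribDir`), and the helper `sa_score` and the toy SMILES are hypothetical. SMILES that RDKit cannot parse map to NaN instead of raising.

```python
import os
import sys

import pandas as pd
from rdkit import Chem, RDConfig

# Assumption: this RDKit build includes the Contrib directory. If it does not,
# put sascorer.py and fpscores.pkl.gz in one folder and append that folder
# to sys.path instead.
sys.path.append(os.path.join(RDConfig.RDContribDir, 'SA_Score'))
import sascorer  # noqa: E402


def sa_score(smiles):
    # Hypothetical helper: parse the SMILES and score it with sascorer.
    # Returns NaN for unparseable input so one bad row does not abort the column.
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return float('nan')
    return sascorer.calculateScore(mol)


# Hypothetical toy frame; the notebook reads its data from parquet instead.
df = pd.DataFrame(dict(smiles=['CCO', 'c1ccccc1O', 'not_a_smiles']))
df['sa_score'] = df['smiles'].apply(sa_score)
print(df)
```

On the notebook's data the same helper would be applied as `synthesis_df['smiles'].apply(sa_score)`; whether the stored column was computed from `smiles` or `original_smiles` is not shown here.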