## Directory structure

```shell
project_root/
├── input/
│   ├── receptors/
│   │   ├── TrpE_entry_1.pdb
│   │   └── TrpE_entry_1.pdbqt
│   ├── ligands/
│   │   ├── sdf/
│   │   │   ├── ligand_001.sdf
│   │   │   ├── ligand_002.sdf
│   │   │   └── ...
│   │   └── pdbqt/
│   │       ├── ligand_001.pdbqt
│   │       ├── ligand_002.pdbqt
│   │       └── ...
│   └── configs/
│       ├── TrpE_entry_1.box.txt
│       └── TrpE_entry_1.box.pdb
├── results/
│   ├── poses/
│   │   ├── ligand_001_out.pdbqt
│   │   ├── ligand_002_out.pdbqt
│   │   └── ...
│   └── scores/
│       ├── docking_scores.csv
│       └── summary_report.txt
└── scripts/
    ├── batch_prepare_ligands.sh
    ├── batch_docking.sh
    └── analyze_results.py
```

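The layout can be created up front so that the later steps find their input and output folders. The following is a minimal sketch, not part of the original scripts: the `LAYOUT` list and the `create_layout` helper are hypothetical names, and the folder names are copied from the tree in this section.

```python
from pathlib import Path

# Folder names taken from the directory tree in this README.
LAYOUT = [
    "input/receptors",
    "input/ligands/sdf",
    "input/ligands/pdbqt",
    "input/configs",
    "results/poses",
    "results/scores",
    "scripts",
]


def create_layout(root):
    """Create every folder of the layout under root and return the paths."""
    root = Path(root)
    created = []
    for rel in LAYOUT:
        folder = root / rel
        # exist_ok=True makes the call safe to repeat on an existing project
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    for folder in create_layout("project_root"):
        print(folder)
```

Running it twice is harmless, since existing folders are left untouched.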
## Preparing the receptor PDBQT file

Predict the receptor structure with AlphaFold, which yields a PDB or CIF file.

Repair the structure with Modeller (homology modelling), or with PDBFixer, MOE, Maestro, or a similar tool.

Here we use Maestro's `Protein Preparation Workflow` module.

Then export the result as a PDB file.

Use Meeko to convert the receptor to PDBQT; for details, see the [Meeko documentation](https://meeko.readthedocs.io/en/release-doc/rec_overview.html).

```shell
micromamba run -n vina mk_prepare_receptor.py -i receptor/FgBar1_cut_proteinprep.pdb --write_pdbqt receptor/FgBar1_cut_proteinprep.pdbqt
```

Common option combinations

### Example 1: generate the PDBQT file and the Vina box config with default output names

`mk_prepare_receptor.py -i 1abc.pdb -o 1abc_clean --write_pdbqt --write_vina_box`

This produces `1abc_clean_rigid.pdbqt` and `1abc_clean.vina.txt`.

### Example 2: set templates and flexibility for specific residues, and generate the box config

```shell
mk_prepare_receptor.py -i system.pdb \
    --output_basename system_prep \
    -f "A:42,B:23" \
    -n "A:5,7=CYX,B:17=HID" \
    --write_pdbqt --write_vina_box
```

### Example 3: generate a box config that automatically envelops a ligand

```shell
mk_prepare_receptor.py -i prot.pdb \
    --box_enveloping ligand.pdb \
    --padding 3.0 \
    --output_basename dock_ready \
    --write_pdbqt --write_vina_box
```

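To make the effect of `--box_enveloping` and `--padding` easier to picture, here is a minimal sketch of an enveloping box. It assumes the box is the axis-aligned bounding box of the ligand atoms, grown by the padding on every side; Meeko's actual implementation may differ. The helper names `read_pdb_coords`, `enveloping_box`, and `vina_box_config` are hypothetical; only the `center_*` and `size_*` keys are standard Vina config keys.

```python
def read_pdb_coords(lines):
    """Collect (x, y, z) from ATOM/HETATM records using the fixed PDB columns 31-54."""
    coords = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def enveloping_box(coords, padding=3.0):
    """Return (center, size) of an axis-aligned box around coords.

    Assumption: the padding (in angstroms) is added on both sides of each axis.
    """
    if not coords:
        raise ValueError("no atom coordinates found")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


def vina_box_config(center, size):
    """Format the box as Vina config lines (center_x ... size_z)."""
    axes = ("x", "y", "z")
    lines = [f"center_{a} = {v:.3f}" for a, v in zip(axes, center)]
    lines += [f"size_{a} = {v:.3f}" for a, v in zip(axes, size)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # ligand.pdb is the same placeholder file name used in Example 3
    with open("ligand.pdb") as handle:
        center, size = enveloping_box(read_pdb_coords(handle), padding=3.0)
    print(vina_box_config(center, size), end="")
```

Comparing this output with the box file written by Meeko is a quick sanity check that the docking box really covers the reference ligand.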
## Preparing ligand 3D conformers

Each ligand needs an initial 3D conformer, stored in `ligands/sdf`.

```shell
python sdf2to3d.py --src_dir ./2d_sdf_dir --out_dir ./3d_sdf_dir --n_jobs 8
```

## Converting ligand formats

Use Meeko to convert the files in `ligands/sdf` into `ligands/pdbqt`.

```shell
micromamba run -n vina ./scripts/batch_prepare_ligands.sh ligands/sdf ligands/pdbqt/ batch_prepare_ligands.log 128
```

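## Submitting ligand docking jobs in batches

Split the ligand files first: the `pdbqt` folder inside the ligands directory is divided into n subfolders (`pdbqt1`, `pdbqt2`, `pdbqt3`, ..., `pdbqtn`).

```shell
micromamba run -n vina python vina_split_and_submit.py <split_number_n>
```

Once the split is done, the script automatically submits the docking jobs to the Huawei Donau scheduler with the `dsub` command.

Note that when jobs are submitted too quickly, a batch is occasionally skipped; check for this when merging the results.
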
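The splitting step can be sketched as follows. This is not the code of `vina_split_and_submit.py`; it only illustrates one way to distribute the PDBQT files round-robin over n sibling folders named `pdbqt1` ... `pdbqtn`. The function name `split_into_batches` is hypothetical, and files are copied rather than moved so the source folder stays intact.

```python
import shutil
from pathlib import Path


def split_into_batches(src_dir, n_splits, prefix="pdbqt"):
    """Copy *.pdbqt files from src_dir into n_splits sibling folders, round-robin."""
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    src = Path(src_dir)
    files = sorted(src.glob("*.pdbqt"))  # sorted for a reproducible assignment

    # Create pdbqt1 ... pdbqtn next to the source folder.
    batch_dirs = []
    for i in range(1, n_splits + 1):
        batch = src.parent / f"{prefix}{i}"
        batch.mkdir(parents=True, exist_ok=True)
        batch_dirs.append(batch)

    # File k goes to batch (k mod n), so batch sizes differ by at most one.
    for index, path in enumerate(files):
        shutil.copy2(path, batch_dirs[index % n_splits] / path.name)
    return batch_dirs


if __name__ == "__main__":
    for batch in split_into_batches("input/ligands/pdbqt", 12):
        print(batch, len(list(batch.glob("*.pdbqt"))))
```

Round-robin assignment keeps the batches balanced in file count, although docking time per ligand can still vary widely.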
## Merging docking results

After docking finishes, n result folders (`poses1`, `poses2`, `poses3`, ..., `posesn`) are created in the `result` folder.

Each folder contains the corresponding `*_out.pdbqt` and `*_converted.sdf` files. Run:

```shell
micromamba run -n vina python vina_merge_and_check.py --n_splits <split_number_n> --out_dir ./result --output_prefix poses --poses_dir ./result/poses_all
```

This collects the `*_converted.sdf` files from all n result folders into `./result/poses_all`. It also checks whether a batch was left undocked because jobs were submitted too quickly, so review its output carefully.

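The completeness check can be illustrated with a short sketch. It is not the logic of `vina_merge_and_check.py`; it assumes that ligand batches live in `pdbqt<i>` folders, that results live in `poses<i>` folders, and that each ligand `name.pdbqt` yields `name_out.pdbqt`, as described above. The function name `find_missing_poses` and its parameters are hypothetical.

```python
from pathlib import Path

OUT_SUFFIX = "_out.pdbqt"


def find_missing_poses(ligand_root, result_root, n_splits,
                       ligand_prefix="pdbqt", poses_prefix="poses"):
    """Return {batch index: [ligand names without a docked pose]} for incomplete batches."""
    missing = {}
    for i in range(1, n_splits + 1):
        ligand_dir = Path(ligand_root) / f"{ligand_prefix}{i}"
        poses_dir = Path(result_root) / f"{poses_prefix}{i}"

        expected = set()
        if ligand_dir.is_dir():
            expected = {p.stem for p in ligand_dir.glob("*.pdbqt")}

        # A batch that was never submitted has no poses folder at all.
        done = set()
        if poses_dir.is_dir():
            done = {p.name[: -len(OUT_SUFFIX)] for p in poses_dir.glob("*" + OUT_SUFFIX)}

        gap = sorted(expected - done)
        if gap:
            missing[i] = gap
    return missing


if __name__ == "__main__":
    for batch, names in find_missing_poses("input/ligands", "result", 12).items():
        print(f"batch {batch}: {len(names)} ligands without poses")
```

A batch whose poses folder is absent is reported with all of its ligands, which is the typical signature of a skipped submission.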
## Analyzing docking results

Each `*_converted.sdf` file contains up to `20` docked poses. The number is controlled by `NUM_MODES` in `scripts/batch_docking.sh`, which defaults to 20.

Every pose in the SDF file carries the `<meeko>` field shown below, from which the docking score and related properties can be read to filter molecules later.

```
> <meeko> (20)
{"is_sidechain": [false], "free_energy": -6.38, "intermolecular_energy": -15.695, "internal_energy": -2.912}
```

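Because the `<meeko>` value is JSON, the scores can be read with the Python standard library alone. The sketch below assumes the layout shown above: a header line containing `<meeko>`, the JSON value on the following line or lines, and a blank line closing the field. The function names `read_meeko_records` and `best_free_energy` are hypothetical.

```python
import json


def read_meeko_records(sdf_text):
    """Return one dict per <meeko> data field found in the SDF text."""
    records = []
    lines = sdf_text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith(">") and "<meeko>" in line:
            value_lines = []
            # The value runs until the blank line that closes an SDF data field.
            for nxt in lines[index + 1:]:
                if not nxt.strip() or nxt.startswith("$$$$"):
                    break
                value_lines.append(nxt)
            records.append(json.loads("".join(value_lines)))
    return records


def best_free_energy(records):
    """Lowest (most favourable) free_energy among the poses."""
    return min(record["free_energy"] for record in records)


if __name__ == "__main__":
    # The record below is the example shown in this README.
    sample = (
        "> <meeko> (20)\n"
        '{"is_sidechain": [false], "free_energy": -6.38, '
        '"intermolecular_energy": -15.695, "internal_energy": -2.912}\n'
        "\n"
    )
    poses = read_meeko_records(sample)
    print(best_free_energy(poses))
```

Taking the minimum `free_energy` per ligand gives one score per molecule, which is a common basis for ranking before any further filtering.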
## Docking in batch mode

Vina 1.2.7 supports a batch mode for docking many ligands in one run.

```shell
mkdir -p results/poses
# --batch takes ligand PDBQT files, so the folder is expanded with a glob
vina --receptor input/receptors/TrpE_entry_1.pdbqt \
     --batch input/ligands/test/*.pdbqt \
     --config input/configs/TrpE_entry_1.box.txt \
     --dir results/poses \
     --exhaustiveness=32

# Dock through the wrapper script
./scripts/batch_docking.sh ./receptors/TrpE_entry_1.pdbqt ./config/TrpE_entry_1.box.txt ligands/test output test.log /share/home/lyzeng24/rdkit_script/vina/vina
```

## Environment setup

```shell
conda install -c conda-forge vina meeko rdkit joblib rich ipython parallel -y
```

## Preparing ligand PDBQT files

```shell
# Prepare a single ligand
mk_prepare_ligand.py -i molecule.sdf -o molecule.pdbqt

# Prepare ligands in batch
micromamba run -n vina ./scripts/batch_prepare_ligands.sh ligands/sdf ligands/pdbqt/ batch_prepare_ligands.log 128

# Monitor progress by counting the generated files
watch -n 1 "ls -l pdbqt/*.pdbqt 2>/dev/null | wc -l"
```

## Preparing the receptor PDBQT

```shell
# Receptor preparation (with a flexible side chain)
mk_prepare_receptor.py -i nucleic_acid.cif -o my_receptor -j -p -f A:42
```

## Batch docking mode

```shell
./scripts/batch_docking.sh input/receptors/TrpE_entry_1.pdbqt \
    input/configs/TrpE_entry_1.box.txt \
    input/ligands/pdbqt \
    results/poses \
    results/batch_docking.log
```

## Monitoring docking progress

```shell
watch -n 1 'for i in {1..12}; do printf "poses$i: "; ls results/poses$i/*.pdbqt 2>/dev/null | wc -l; done'
```

## Converting docking results back to SDF

Use the `mk_export.py` command-line tool. With `--suffix _converted`, the exported files are the `*_converted.sdf` files used in the merge step.

```shell
cd output
mk_export.py ./*_out.pdbqt --suffix _converted
```

## Analyzing Vina docking results

```shell
# Export the results
mk_export.py vina_results.pdbqt -j my_receptor.json -s lig_docked.sdf -p rec_docked.pdb
```

## Long-running batch jobs (djob listing)

```shell
24562323 vina_job15 RUNNING lyzeng24 default default 2025/07/31 23:16:30 - agent-ARM-17
24562322 vina_job14 RUNNING lyzeng24 default default 2025/07/31 23:16:30 - agent-ARM-17
24562321 vina_job13 RUNNING lyzeng24 default default 2025/07/31 23:16:30 - agent-ARM-17
24562320 vina_job12 RUNNING lyzeng24 default default 2025/07/31 23:16:29 - agent-ARM-21
24562319 vina_job11 RUNNING lyzeng24 default default 2025/07/31 23:16:29 - agent-ARM-21
24562318 vina_job10 RUNNING lyzeng24 default default 2025/07/31 23:16:29 - agent-ARM-21
24562317 vina_job9 RUNNING lyzeng24 default default 2025/07/31 23:16:28 - agent-ARM-21
24562316 vina_job8 RUNNING lyzeng24 default default 2025/07/31 23:16:28 - agent-ARM-16
24562315 vina_job7 RUNNING lyzeng24 default default 2025/07/31 23:16:28 - agent-ARM-16
24562314 vina_job6 RUNNING lyzeng24 default default 2025/07/31 23:16:27 - agent-ARM-16
24562313 vina_job5 RUNNING lyzeng24 default default 2025/07/31 23:16:27 - agent-ARM-19
24562312 vina_job4 RUNNING lyzeng24 default default 2025/07/31 23:16:27 - agent-ARM-19
24562311 vina_job3 RUNNING lyzeng24 default default 2025/07/31 23:16:27 - agent-ARM-19
```