
Directory Structure

.
├── config/                    # Configuration files (box definitions, etc.)
├── ligand/                    # Ligand files
│   └── pdbqt/                 # Prepared ligand files in PDBQT format
├── receptor/                  # Receptor files
├── result/                    # Docking results
│   ├── fgbar/                 # FgBar dataset results
│   │   └── poses_all/         # Individual docking results in SDF format
│   ├── trpe/                  # TrpE dataset results
│   │   └── poses_all/         # Individual docking results in SDF format
│   └── refence/               # Reference molecule files
│       ├── fgbar/             # FgBar reference molecules
│       └── trpe/              # TrpE reference molecules
├── scripts/                   # Analysis scripts and utilities
└── README.md                  # This file

1. Preparation

Before running the pipeline, you need to prepare the following files:

  1. Protein structure file (PDB format)
  2. Ligand library (MOL2 format, named according to the format CNPxxxxxx.1.mol2)
  3. Configuration file (box.txt format, defining the docking box parameters)
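As an illustration of the box configuration file, the sketch below writes the standard Vina `center_*` and `size_*` keys and computes the search-space volume that Vina compares against 27000 Å^3. The helper names `format_box_config` and `box_volume` are hypothetical and not part of this repository; the numeric values are the trpe grid center and size reported in the reference docking log in this README.

```python
def box_volume(size_x, size_y, size_z):
    """Search-space volume in cubic Angstroms."""
    return size_x * size_y * size_z


def format_box_config(center, size):
    """Return the text of a Vina box config (hypothetical helper)."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
    ]
    return "\n".join(lines) + "\n"


# Grid center and size taken from the trpe reference docking log.
config_text = format_box_config((7.402, -4.783, -11.818), (30, 30, 30))
print(config_text)
print(box_volume(30, 30, 30))
```

A 30 x 30 x 30 box sits exactly at the 27000 Å^3 limit, whereas the 49.1 x 37.6 x 35.2 fgbar box exceeds it, which is consistent with the warning shown in the fgbar log.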

2. Execution Steps

2.1 Protein Preparation

prepare_receptor4.py -r protein.pdb -o protein.pdbqt

2.2 Ligand Preparation

prepare_ligand4.py -l ligand.mol2 -o ligand.pdbqt

2.3 Docking Execution

vina --config box.txt --receptor protein.pdbqt --ligand ligand.pdbqt --out out.pdbqt

2.4 Result Format Conversion

Convert PDBQT format results to SDF format:

mk_export.py ./*_out.pdbqt --suffix _converted

3. Result Analysis

3.1 Calculate QED Properties

Calculate QED values for all molecules:

cd scripts
python calculate_qed_values.py

This script processes both the docked molecules in the poses_all directories and the reference molecules in the refence directories. It generates two CSV files:

  • qed_values_fgbar.csv
  • qed_values_trpe.csv

Each CSV file contains the following columns:

  • smiles: SMILES representation of the molecule
  • filename: Name of the source file
  • qed: QED value of the molecule
  • molecular_weight: Molecular weight of the molecule
  • vina_scores: List of Vina scores for all conformers
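A minimal sketch of reading one of these CSV files with the standard library is shown below. It assumes that the `vina_scores` column is stored as the text of a Python-style list (for example `"[-2.1, -1.9]"`); the actual serialization used by calculate_qed_values.py may differ. The function name `load_rows` and the sample row are illustrative placeholders, not real results.

```python
import ast
import csv
import io


def load_rows(handle):
    """Parse rows with the columns listed above (hypothetical helper)."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "smiles": row["smiles"],
            "filename": row["filename"],
            "qed": float(row["qed"]),
            "molecular_weight": float(row["molecular_weight"]),
            # Assumption: the list of scores is stored as a list literal.
            "vina_scores": [float(s) for s in ast.literal_eval(row["vina_scores"])],
        })
    return rows


# Made-up sample row, used only to exercise the parser.
sample = io.StringIO(
    "smiles,filename,qed,molecular_weight,vina_scores\n"
    'CCO,example_converted.sdf,0.41,46.07,"[-2.1, -1.9]"\n'
)
rows = load_rows(sample)
print(rows[0]["vina_scores"])
```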

3.2 Analyze QED and Molecular Weight Distribution

Analyze the distribution of QED and molecular weight properties and generate KDE plots:

python analyze_qed_mw_distribution.py qed_values_fgbar.csv qed_values_trpe.csv --dataset-names fgbar --dataset-names trpe

This will generate four plots:

  1. kde_distribution_fgbar_normalized.png - Normalized distribution for fgbar dataset
  2. kde_distribution_fgbar_actual.png - Actual values distribution for fgbar dataset
  3. kde_distribution_trpe_normalized.png - Normalized distribution for trpe dataset
  4. kde_distribution_trpe_actual.png - Actual values distribution for trpe dataset

Each plot contains three distributions:

  • QED distribution (blue)
  • Molecular weight distribution (red)
  • Vina score distribution (green)

Reference molecules are marked with different colored markers and labeled with their identifiers and corresponding values.

3.3 Advanced Analysis Options

You can also choose which conformation rank supplies the reference Vina score:

python analyze_qed_mw_distribution.py qed_values_fgbar.csv qed_values_trpe.csv --dataset-names fgbar --dataset-names trpe --rank 0

The --rank option allows you to specify which conformation from the reference molecule's docking results to use for the Vina score reference.

  • Rank 0 (default) uses the best scoring conformation (rank 1 in Vina results)
  • Rank 1 uses the second best scoring conformation, and so on

The maximum valid rank is determined by the minimum number of conformations generated across all docked molecules. If you specify a rank that exceeds this minimum, the script will raise an error and inform you of the maximum valid rank.
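The rank limit described above can be sketched as follows. This assumes ranks are 0-based, so the largest valid rank is the minimum conformation count minus one; the function names are hypothetical and the real script may word its error differently.

```python
def max_valid_rank(scores_per_molecule):
    """Largest 0-based rank that every molecule can supply."""
    if not scores_per_molecule:
        raise ValueError("no molecules given")
    return min(len(scores) for scores in scores_per_molecule) - 1


def check_rank(rank, scores_per_molecule):
    """Return rank unchanged, or raise if it exceeds the shared limit."""
    limit = max_valid_rank(scores_per_molecule)
    if not 0 <= rank <= limit:
        raise ValueError(
            f"rank {rank} is out of range; maximum valid rank is {limit}"
        )
    return rank


# Illustrative scores: one molecule has 3 conformers, the other only 2.
demo_scores = [[-6.5, -6.3, -6.1], [-5.2, -5.1]]
print(max_valid_rank(demo_scores))
```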

You can also specify custom reference scores:

python analyze_qed_mw_distribution.py qed_values_fgbar.csv qed_values_trpe.csv --dataset-names fgbar --dataset-names trpe --reference-scores '{"fgbar": {"9NY": -5.268}, "trpe": {"0GA": -6.531}}'
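The `--reference-scores` value is a JSON object that maps each dataset name to a mapping of reference identifier to Vina score. A small sketch of parsing and validating such a string is given below; `parse_reference_scores` is a hypothetical name, and the script's own parsing may differ.

```python
import json


def parse_reference_scores(text):
    """Turn the JSON string into {dataset: {reference_id: score}}."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("reference scores must be a JSON object")
    parsed = {}
    for dataset, refs in data.items():
        if not isinstance(refs, dict):
            raise ValueError(f"entry for {dataset!r} must be a JSON object")
        parsed[dataset] = {name: float(score) for name, score in refs.items()}
    return parsed


scores = parse_reference_scores(
    '{"fgbar": {"9NY": -5.268}, "trpe": {"0GA": -6.531}}'
)
print(scores["trpe"]["0GA"])
```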

4. API Usage

The analysis functions can also be called directly from Python:

import sys
sys.path.append('scripts')
from analyze_qed_mw_distribution import main_api

# Basic usage
main_api(['scripts/qed_values_fgbar.csv', 'scripts/qed_values_trpe.csv'], ['fgbar', 'trpe'])

# With custom reference scores
main_api(['scripts/qed_values_fgbar.csv', 'scripts/qed_values_trpe.csv'], ['fgbar', 'trpe'], 
         reference_scores={'fgbar': {'9NY': -5.268}, 'trpe': {'0GA': -6.531}})

# With specific conformation rank
main_api(['scripts/qed_values_fgbar.csv', 'scripts/qed_values_trpe.csv'], ['fgbar', 'trpe'], rank=0)

5. Output Files

The analysis generates several output files:

  • CSV files with QED values and Vina scores for all molecules
  • KDE distribution plots in both normalized and actual values formats

AutoDock Vina Pipeline

This repository contains a complete pipeline for molecular docking using AutoDock Vina, including preparation, execution, result processing, and analysis.

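Preparing the Receptor PDBQT File

Predict the structure with AlphaFold to obtain a PDB or CIF file.

Repair the structure by homology modeling with MODELLER, or with PDBFixer, MOE, Maestro, or a similar tool.

Here we use the Protein Preparation Workflow module in Maestro,

then export a PDB file.

Use Meeko to prepare the receptor PDBQT file: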

micromamba run -n vina mk_prepare_receptor.py -i receptor/FgBar1_cut_proteinprep.pdb --write_pdbqt receptor/FgBar1_cut_proteinprep.pdbqt

Combining Options

Example 1: generate the PDBQT file and the Vina box configuration using default output names

mk_prepare_receptor.py -i 1abc.pdb -o 1abc_clean --write_pdbqt --write_vina_box

This produces 1abc_clean_rigid.pdbqt and 1abc_clean.vina.txt.

Example 2: set templates and flexibility for specific residues, and generate the box configuration

mk_prepare_receptor.py -i system.pdb \
  --output_basename system_prep \
  -f "A:42,B:23" \
  -n "A:5,7=CYX,B:17=HID" \
  --write_pdbqt --write_vina_box

Example 3: generate a box configuration that automatically envelops a given ligand

mk_prepare_receptor.py -i prot.pdb \
  --box_enveloping ligand.pdb \
  --padding 3.0 \
  --output_basename dock_ready \
  --write_pdbqt --write_vina_box
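The idea behind an enveloping box can be sketched in a few lines: take the bounding box of the ligand atoms and grow it by the padding. The sketch assumes the padding is added on both sides of each axis; Meeko's exact convention is not confirmed here, and `enveloping_box` is a hypothetical helper, not a Meeko function.

```python
def enveloping_box(coords, padding):
    """Return (center, size) of an axis-aligned box around coords.

    Assumption: padding (in Angstroms) is added on each side of every axis.
    """
    if not coords:
        raise ValueError("at least one coordinate is required")
    axes = list(zip(*coords))  # [(x1, x2, ...), (y1, ...), (z1, ...)]
    center = tuple((min(a) + max(a)) / 2 for a in axes)
    size = tuple((max(a) - min(a)) + 2 * padding for a in axes)
    return center, size


# Two illustrative atom positions spanning 4 x 2 x 6 Angstroms.
center, size = enveloping_box([(0, 0, 0), (4, 2, 6)], padding=3.0)
print(center, size)
```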

Preparing Ligand 3D Conformers

Each ligand needs an initial 3D conformer, stored in ligand/sdf:

python sdf2to3d.py --src_dir ./2d_sdf_dir --out_dir ./3d_sdf_dir --n_jobs 8

Ligand Format Conversion

Use Meeko to convert ligand/sdf to ligand/pdbqt:

micromamba run -n vina ./scripts/batch_prepare_ligands.sh ligands/sdf ligands/pdbqt/ batch_prepare_ligands.log 128

Batch Submission of Ligand Docking

Split the ligand files: the pdbqt folder inside the ligand directory is divided into n subfolders (pdbqt1, pdbqt2, pdbqt3, ..., pdbqtn):

micromamba run -n vina python vina_split_and_submit.py <split_number_n>

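When the split finishes, the script automatically submits the docking jobs to the Huawei Donau scheduler with the dsub command.

Note that when jobs are submitted in very quick succession, a batch may occasionally be skipped; this can be checked at merge time.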
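The split into n subfolders can be pictured with the sketch below, which distributes file names round-robin so the groups differ in size by at most one. This is only an assumed strategy; vina_split_and_submit.py may partition the files differently, and `split_round_robin` is a hypothetical name.

```python
def split_round_robin(names, n):
    """Distribute names into n groups of near-equal size."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ordered = sorted(names)
    return [ordered[i::n] for i in range(n)]


# Illustrative ligand file names.
files = ["c.pdbqt", "a.pdbqt", "b.pdbqt", "e.pdbqt", "d.pdbqt"]
groups = split_round_robin(files, 2)

# Map each group to a folder name following the pdbqt1 ... pdbqtn pattern.
folders = {f"pdbqt{i + 1}": group for i, group in enumerate(groups)}
print(folders)
```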

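Merging Docking Results

After docking completes, n result folders (poses1, poses2, poses3, ..., posesn) are created in the result folder.

Each folder contains the corresponding *_out.pdbqt and *_converted.sdf files. Run: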

micromamba run -n vina python vina_merge_and_check.py --n_splits <split_number_n> --out_dir ./result --output_prefix poses --poses_dir ./result/poses_all

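This command collects the *_converted.sdf files from all n result folders into ./result/poses_all. It also checks whether any batch was skipped and left undocked because jobs were submitted too quickly, so review its output carefully.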
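One way to detect a skipped batch is to compare the ligand names in a split folder with the `*_out.pdbqt` files in the matching poses folder. The sketch below assumes each output is named `<ligand stem>_out.pdbqt`; `find_missing` is a hypothetical helper and is not taken from vina_merge_and_check.py. The demo builds throwaway folders in a temporary directory.

```python
import tempfile
from pathlib import Path

SUFFIX = "_out.pdbqt"


def find_missing(ligand_dir, poses_dir):
    """Return ligand stems that have no docking output yet."""
    expected = {p.stem for p in Path(ligand_dir).glob("*.pdbqt")}
    docked = {
        p.name[: -len(SUFFIX)] for p in Path(poses_dir).glob("*" + SUFFIX)
    }
    return sorted(expected - docked)


with tempfile.TemporaryDirectory() as tmp:
    ligands = Path(tmp, "pdbqt1")
    poses = Path(tmp, "poses1")
    ligands.mkdir()
    poses.mkdir()
    for name in ("lig_a", "lig_b"):
        (ligands / f"{name}.pdbqt").touch()
    (poses / f"lig_a{SUFFIX}").touch()  # only lig_a was docked
    missing = find_missing(ligands, poses)

print(missing)
```

An empty poses folder would report every ligand of that batch as missing, which is the signature of a batch that was never submitted.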

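Analyzing Docking Results

Each *_converted.sdf file contains 20 docked conformations; the number is set by NUM_MODES in scripts/batch_docking.sh, which defaults to 20.

Each conformation in the SDF file carries the <meeko> field shown below, which provides the docking score and related properties used later to filter molecules.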

>  <meeko>  (20) 
{"is_sidechain": [false], "free_energy": -6.38, "intermolecular_energy": -15.695, "internal_energy": -2.912}
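A minimal sketch of pulling the `<meeko>` field out of SDF text with only the standard library follows. It assumes the JSON payload sits on the single line directly after the field header, as in the record above; `meeko_records` is a hypothetical helper, and a full SDF parser such as RDKit would be the more robust choice.

```python
import json


def meeko_records(sdf_text):
    """Return the parsed <meeko> JSON payload of every conformer."""
    records = []
    lines = sdf_text.splitlines()
    for i, line in enumerate(lines):
        is_header = line.startswith(">") and "<meeko>" in line
        if is_header and i + 1 < len(lines):
            # Assumption: the payload is one JSON object on the next line.
            records.append(json.loads(lines[i + 1]))
    return records


# The data item shown above, followed by the SDF record terminator.
sample = """>  <meeko>  (20) 
{"is_sidechain": [false], "free_energy": -6.38, "intermolecular_energy": -15.695, "internal_energy": -2.912}

$$$$
"""
records = meeko_records(sample)
best_energy = min(r["free_energy"] for r in records)
print(best_energy)
```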

Docking in Batch Mode

With vina=1.2.7, batch mode can be used to dock many ligands in one run.

mkdir -p results/poses
vina --receptor input/receptors/TrpE_entry_1.pdbqt \
     --batch input/ligands/test \
     --config ./configs/TrpE_entry_1.box.txt \
     --dir results/poses \
     --exhaustiveness=32

# Dock using the script
./scripts/batch_docking.sh ./receptors/TrpE_entry_1.pdbqt ./config/TrpE_entry_1.box.txt ligands/test output test.log /share/home/lyzeng24/rdkit_script/vina/vina

Environment Setup

conda install -c conda-forge vina meeko rdkit joblib rich ipython parallel -y

Preparing Ligand PDBQT Files

# Prepare a single ligand
mk_prepare_ligand.py -i molecule.sdf -o molecule.pdbqt

# Batch preparation
micromamba run -n vina ./scripts/batch_prepare_ligands.sh ligands/sdf ligands/pdbqt/ batch_prepare_ligands.log 128

# Monitor the number of PDBQT files produced
watch -n 1 "ls -l pdbqt/*.pdbqt 2>/dev/null | wc -l"

Preparing Receptor PDBQT Files

# Receptor preparation (with flexible side chains)
mk_prepare_receptor.py -i nucleic_acid.cif -o my_receptor -j -p -f A:42

Batch Docking Mode

./scripts/batch_docking.sh input/receptors/TrpE_entry_1.pdbqt \
                          input/configs/TrpE_entry_1.box.txt \
                          input/ligands/pdbqt \
                          results/poses \
                          results/batch_docking.log

Monitoring Docking Results

watch -n 1 'for i in {1..12}; do printf "poses$i: "; ls results/poses$i/*.pdbqt 2>/dev/null | wc -l; done'

Converting Docking Results Back to SDF

Use the mk_export.py command-line tool and its options:

cd output
mk_export.py ./*_out.pdbqt --suffix _converted

Analyzing Vina Docking Results

# Export results
mk_export.py vina_results.pdbqt -j my_receptor.json -s lig_docked.sdf -p rec_docked.pdb

Long-running batch jobs, as listed by djob

24562323     vina_job15   RUNNING    lyzeng24     default      default      2025/07/31 23:16:30  -                    agent-ARM-17         
24562322     vina_job14   RUNNING    lyzeng24     default      default      2025/07/31 23:16:30  -                    agent-ARM-17         
24562321     vina_job13   RUNNING    lyzeng24     default      default      2025/07/31 23:16:30  -                    agent-ARM-17         
24562320     vina_job12   RUNNING    lyzeng24     default      default      2025/07/31 23:16:29  -                    agent-ARM-21         
24562319     vina_job11   RUNNING    lyzeng24     default      default      2025/07/31 23:16:29  -                    agent-ARM-21         
24562318     vina_job10   RUNNING    lyzeng24     default      default      2025/07/31 23:16:29  -                    agent-ARM-21         
24562317     vina_job9    RUNNING    lyzeng24     default      default      2025/07/31 23:16:28  -                    agent-ARM-21         
24562316     vina_job8    RUNNING    lyzeng24     default      default      2025/07/31 23:16:28  -                    agent-ARM-16         
24562315     vina_job7    RUNNING    lyzeng24     default      default      2025/07/31 23:16:28  -                    agent-ARM-16         
24562314     vina_job6    RUNNING    lyzeng24     default      default      2025/07/31 23:16:27  -                    agent-ARM-16         
24562313     vina_job5    RUNNING    lyzeng24     default      default      2025/07/31 23:16:27  -                    agent-ARM-19         
24562312     vina_job4    RUNNING    lyzeng24     default      default      2025/07/31 23:16:27  -                    agent-ARM-19         
24562311     vina_job3    RUNNING    lyzeng24     default      default      2025/07/31 23:16:27  -                    agent-ARM-19

AutoDock Vina Docking of Reference Molecules

trpe (PDB ID: 5cwa):

./vina --receptor ./refence/trpe/TrpE_entry_1.pdbqt --ligand ./refence/trpe/align_5cwa_0GA_addH.pdbqt --config ./refence/trpe/TrpE_entry_1.box.txt --out ./refence/trpe/align_5cwa_0GA_addH_out.pdbqt --exhaustiveness="32" --num_modes="20"  --energy_range="5.0"

result:

AutoDock Vina v1.2.7
#################################################################
# If you used AutoDock Vina in your work, please cite:          #
#                                                               #
# J. Eberhardt, D. Santos-Martins, A. F. Tillack, and S. Forli  #
# AutoDock Vina 1.2.0: New Docking Methods, Expanded Force      #
# Field, and Python Bindings, J. Chem. Inf. Model. (2021)       #
# DOI 10.1021/acs.jcim.1c00203                                  #
#                                                               #
# O. Trott, A. J. Olson,                                        #
# AutoDock Vina: improving the speed and accuracy of docking    #
# with a new scoring function, efficient optimization and       #
# multithreading, J. Comp. Chem. (2010)                         #
# DOI 10.1002/jcc.21334                                         #
#                                                               #
# Please see https://github.com/ccsb-scripps/AutoDock-Vina for  #
# more information.                                             #
#################################################################

Scoring function : vina
Rigid receptor: ./refence/trpe/TrpE_entry_1.pdbqt
Ligand: ./refence/trpe/align_5cwa_0GA_addH.pdbqt
Grid center: X 7.402 Y -4.783 Z -11.818
Grid size  : X 30 Y 30 Z 30
Grid space : 0.375
Exhaustiveness: 32
CPU: 0
Verbosity: 1

Computing Vina grid ... done.
WARNING: At low exhaustiveness, it may be impossible to utilize all CPUs.
Performing docking (random seed: 650309048) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -6.531          0          0
   2       -6.352      3.988      6.453
   3         -6.3      1.447      5.602
   4       -6.291       1.94      5.284
   5       -6.283      1.044      2.037
   6       -6.159      3.798      5.275
   7       -6.124       1.43      5.553
   8       -5.988      3.499      5.489
   9       -5.925      3.311      4.252
  10       -5.912      3.647      4.894
  11       -5.889      7.256      10.49
  12       -5.821      2.351       5.29
  13       -5.763      3.731       6.18
  14       -5.732      3.557      6.002
  15       -5.729      7.213      9.251
  16       -5.693      4.179      5.642
  17       -5.684      3.058      4.111
  18       -5.679      4.117      5.518
  19       -5.671      4.656      6.098
  20       -5.663      4.112      5.705

fgbar (PDB ID: 8izd):

./vina --receptor ./refence/fgbar/FgBar1_cut_proteinprep.pdbqt --ligand ./refence/fgbar/align_8izd_F_9NY_addH.pdbqt --config ./refence/fgbar/FgBar1_entry_1.box.txt --out ./refence/fgbar/align_8izd_F_9NY_addH_out.pdbqt --exhaustiveness="32" --num_modes="20"  --energy_range="5.0"

result:

AutoDock Vina v1.2.7
#################################################################
# If you used AutoDock Vina in your work, please cite:          #
#                                                               #
# J. Eberhardt, D. Santos-Martins, A. F. Tillack, and S. Forli  #
# AutoDock Vina 1.2.0: New Docking Methods, Expanded Force      #
# Field, and Python Bindings, J. Chem. Inf. Model. (2021)       #
# DOI 10.1021/acs.jcim.1c00203                                  #
#                                                               #
# O. Trott, A. J. Olson,                                        #
# AutoDock Vina: improving the speed and accuracy of docking    #
# with a new scoring function, efficient optimization and       #
# multithreading, J. Comp. Chem. (2010)                         #
# DOI 10.1002/jcc.21334                                         #
#                                                               #
# Please see https://github.com/ccsb-scripps/AutoDock-Vina for  #
# more information.                                             #
#################################################################

Scoring function : vina
Rigid receptor: ./refence/fgbar/FgBar1_cut_proteinprep.pdbqt
Ligand: ./refence/fgbar/align_8izd_F_9NY_addH.pdbqt
Grid center: X -12.7 Y -9.1 Z -0.3
Grid size  : X 49.1 Y 37.6 Z 35.2
Grid space : 0.375
Exhaustiveness: 32
CPU: 0
Verbosity: 1

WARNING: Search space volume is greater than 27000 Angstrom^3 (See FAQ)
Computing Vina grid ... done.
WARNING: At low exhaustiveness, it may be impossible to utilize all CPUs.
Performing docking (random seed: -399012800) ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -5.268          0          0
   2       -5.106      3.453       7.96
   3       -5.003      3.114      6.709
   4       -4.986       6.86      13.92
   5       -4.947      5.434         13
   6       -4.875      4.933      10.47
   7       -4.867      6.888      13.75
   8       -4.862      4.244      9.114
   9       -4.835      3.776      6.806
  10       -4.826      3.682      7.143
  11       -4.824        5.4      10.17
  12        -4.81      5.364      7.809
  13       -4.808      4.364      11.15
  14       -4.805      3.211      5.684
  15       -4.783      3.585      8.995
  16       -4.773       6.47      13.64
  17       -4.773      3.465      6.652
  18       -4.731       4.73      9.619
  19       -4.726      4.867      10.88
  20       -4.716      4.834      8.903

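The docking results are not ideal, possibly because the molecules contain many rotatable torsions and are therefore highly flexible. AutoDock Vina is better suited to more rigid docking.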
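The affinities in result tables like the two above can be extracted from a saved Vina log with a short parser. The sketch below matches only rows made of a mode number followed by three numeric columns; `parse_modes` is a hypothetical helper, and the sample text is the first three rows of the trpe table. Mode 1 in the table corresponds to rank 0 of the analysis script.

```python
import re

# A table row: mode index, affinity, RMSD lower bound, RMSD upper bound.
ROW = re.compile(
    r"^\s*(\d+)\s+(-?\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*$"
)


def parse_modes(log_text):
    """Return a list of (mode, affinity in kcal/mol) tuples."""
    modes = []
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            modes.append((int(match.group(1)), float(match.group(2))))
    return modes


sample_log = """\
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -6.531          0          0
   2       -6.352      3.988      6.453
   3         -6.3      1.447      5.602
"""
modes = parse_modes(sample_log)
print(modes[0])
```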

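Analysis Strategy

trpe

AutoDock Vina + QED: trpe has a small binding space and suits low-molecular-weight ligands, so filter by QED. ()

KarmaDock: only examine how the QED-filtered small molecules dock (filter criterion: small-molecule QED

Glide: small-molecule QED. The 10,000 molecules with the best Vina scores, judged against the substrate standard)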
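The QED filter followed by a top-N cut on Vina score can be sketched as below. The QED threshold is not stated in these notes, so it is a required argument here, and the value used in the demo is an arbitrary placeholder. Treating QED as a minimum and ranking by each molecule's best (most negative) score are assumptions; `select_candidates` and the demo rows are hypothetical.

```python
def select_candidates(rows, qed_min, top_n):
    """Keep rows with qed >= qed_min, then return the top_n best scorers.

    Assumptions: higher QED is better, and a molecule is ranked by its
    most negative Vina score.
    """
    kept = [r for r in rows if r["qed"] >= qed_min]
    ranked = sorted(kept, key=lambda r: min(r["vina_scores"]))
    return ranked[:top_n]


# Made-up rows, used only to exercise the function.
rows = [
    {"filename": "a.sdf", "qed": 0.72, "vina_scores": [-7.1, -6.8]},
    {"filename": "b.sdf", "qed": 0.31, "vina_scores": [-8.4]},
    {"filename": "c.sdf", "qed": 0.66, "vina_scores": [-7.9, -7.5]},
]
# qed_min=0.5 is a placeholder, not a threshold from this project.
picked = select_candidates(rows, qed_min=0.5, top_n=2)
print([r["filename"] for r in picked])
```

For the selection described above, `top_n` would be 10000.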


fgbar

Vina and KarmaDock: select molecules by the substrate standard in each, then run Glide on the intersection.
