1. 目录结构调整: - 创建scripts目录统一存放分析脚本 - 保持数据文件在原有目录结构中 - 生成的CSV文件和PNG图表文件也放在scripts目录下 2. 功能改进: - 更新calculate_qed_values.py脚本,添加对参考分子SDF文件的处理 - 修改analyze_qed_mw_distribution.py脚本,统一使用SDF文件stem名称作为参考分子标识符 - 改进Vina得分提取逻辑,支持从SDF文件中提取所有构象的得分 - 完善KDE分布图绘制,确保参考分子在所有图表中显示统一的名称 3. 文档更新: - 更新README.md中的目录结构说明 - 更新命令行和API使用示例 - 添加详细的使用说明和示例 4. 示例代码: - 更新example_api_usage.py以适应新的目录结构和API调用方式
目录结构
.
├── config/ # Configuration files (box definitions, etc.)
├── ligand/ # Ligand files
│ └── pdbqt/ # Prepared ligand files in PDBQT format
├── receptor/ # Receptor files
├── result/ # Docking results
│ ├── fgbar/ # FgBar dataset results
│ │ └── poses_all/ # Individual docking results in SDF format
│ ├── trpe/ # TrpE dataset results
│ │ └── poses_all/ # Individual docking results in SDF format
│ └── refence/ # Reference molecule files
│ ├── fgbar/ # FgBar reference molecules
│ └── trpe/ # TrpE reference molecules
├── scripts/ # Analysis scripts and utilities
└── README.md # This file
1. Preparation
Before running the pipeline, you need to prepare the following files:
- Protein structure file (PDB format)
- Ligand library (MOL2 format, named according to the format CNPxxxxxx.1.mol2)
- Configuration file (box.txt format, defining the docking box parameters)
2. Execution Steps
2.1 Protein Preparation
prepare_receptor4.py -r protein.pdb -o protein.pdbqt
2.2 Ligand Preparation
prepare_ligand4.py -l ligand.mol2 -o ligand.pdbqt
2.3 Docking Execution
vina --config box.txt --receptor protein.pdbqt --ligand ligand.pdbqt --out out.pdbqt
2.4 Result Format Conversion
Convert PDBQT format results to SDF format:
mk_export.py ./*_out.pdbqt --suffix _converted
3. Result Analysis
3.1 Calculate QED Properties
Calculate QED values for all molecules:
cd scripts
python calculate_qed_values.py
This script processes both the docked molecules in the poses_all directories and the reference molecules in the refence directories. It generates two CSV files:
- qed_values_fgbar.csv
- qed_values_trpe.csv
Each CSV file contains the following columns:
- smiles: SMILES representation of the molecule
- filename: Name of the source file
- qed: QED value of the molecule
- molecular_weight: Molecular weight of the molecule
- vina_scores: List of Vina scores for all conformers
3.2 Analyze QED and Molecular Weight Distribution
Analyze the distribution of QED and molecular weight properties and generate KDE plots:
python analyze_qed_mw_distribution.py qed_values_fgbar.csv qed_values_trpe.csv --dataset-names fgbar --dataset-names trpe
This will generate four plots:
- kde_distribution_fgbar_normalized.png - Normalized distribution for fgbar dataset
- kde_distribution_fgbar_actual.png - Actual values distribution for fgbar dataset
- kde_distribution_trpe_normalized.png - Normalized distribution for trpe dataset
- kde_distribution_trpe_actual.png - Actual values distribution for trpe dataset
Each plot contains three distributions:
- QED distribution (blue)
- Molecular weight distribution (red)
- Vina score distribution (green)
Reference molecules are marked with different colored markers and labeled with their identifiers and corresponding values.
3.3 Advanced Analysis Options
You can also specify custom reference scores and conformation rank:
python analyze_qed_mw_distribution.py qed_values_fgbar.csv qed_values_trpe.csv --dataset-names fgbar --dataset-names trpe --rank 0
The --rank option allows you to specify which conformation from the reference molecule's docking results to use for the Vina score reference.
- Rank 0 (default) uses the best scoring conformation (rank 1 in Vina results)
- Rank 1 uses the second best scoring conformation, and so on
The maximum valid rank is determined by the minimum number of conformations generated across all docked molecules. If you specify a rank that exceeds this minimum, the script will raise an error and inform you of the maximum valid rank.
You can also specify custom reference scores:
python analyze_qed_mw_distribution.py qed_values_fgbar.csv qed_values_trpe.csv --dataset-names fgbar --dataset-names trpe --reference-scores '{"fgbar": {"9NY": -5.268}, "trpe": {"0GA": -6.531}}'
4. API Usage
The analysis functions can also be called directly from Python:
import sys
sys.path.append('scripts')
from analyze_qed_mw_distribution import main_api
# Basic usage
main_api(['scripts/qed_values_fgbar.csv', 'scripts/qed_values_trpe.csv'], ['fgbar', 'trpe'])
# With custom reference scores
main_api(['scripts/qed_values_fgbar.csv', 'scripts/qed_values_trpe.csv'], ['fgbar', 'trpe'],
reference_scores={'fgbar': {'9NY': -5.268}, 'trpe': {'0GA': -6.531}})
# With specific conformation rank
main_api(['scripts/qed_values_fgbar.csv', 'scripts/qed_values_trpe.csv'], ['fgbar', 'trpe'], rank=0)
5. Output Files
The analysis generates several output files:
- CSV files with QED values and Vina scores for all molecules
- KDE distribution plots in both normalized and actual values formats
AutoDock Vina Pipeline
This repository contains a complete pipeline for molecular docking using AutoDock Vina, including preparation, execution, result processing, and analysis.
受体准备 pdbqt 文件
使用 alphafold 预测 pdb 文件 cif 文件。
修复使用 moderller 同源建模,或者 pdbfixer,MOE,maestro 等
这里使用 maestro 的 Protein reparation Workflow 模块
然后导出 pdb 文件
使用 meeko 准备受体文件 pdbqt 文件,详细可以参考
micromamba run -n vina mk_prepare_receptor.py -i receptor/FgBar1_cut_proteinprep.pdb --write_pdbqt receptor/FgBar1_cut_proteinprep.pdbqt
选项组合用法
举例1:用默认输出名生成 pdbqt 和 vina box 配置
mk_prepare_receptor.py -i 1abc.pdb -o 1abc_clean --write_pdbqt --write_vina_box
得到 1abc_clean_rigid.pdbqt, 1abc_clean.vina.txt
举例2:为指定残基设置模板/柔性,并生成 box 配置
mk_prepare_receptor.py -i system.pdb \
--output_basename system_prep \
-f "A:42,B:23" \
-n "A:5,7=CYX,B:17=HID" \
--write_pdbqt --write_vina_box
举例3:自动包络某配体生成 box 配置
mk_prepare_receptor.py -i prot.pdb \
--box_enveloping ligand.pdb \
--padding 3.0 \
--output_basename dock_ready \
--write_pdbqt --write_vina_box
小分子 3D 构象准备
需要给小分子一个初始化的 3d 构象存放到ligand/sdf
python sdf2to3d.py --src_dir ./2d_sdf_dir --out_dir ./3d_sdf_dir --n_jobs 8
小分子格式转化
使用 meeko 将 ligand/sdf 转为 ligand/pdbqt
micromamba run -n vina ./scripts/batch_prepare_ligands.sh ligands/sdf ligands/pdbqt/ batch_prepare_ligands.log 128
小分子批量提交对接
分割小分子文件将 ligand 目录里面的 pdbqt 文件夹拆分 n 个子文件夹(pdbqt1,pdbqt2,pdbqt3...pdbqtn)
micromamba run -n vina python vina_split_and_submit.py <split_number_n>
执行完成后会自动使用 dsub 命令将对接任务提交给华为多瑙调度系统
需要注意有时候提交执行速度过快可能有批次遗漏,可以在合并时候检查
对接结果合并
在对接完成之后会在 result 文件夹里面创建 n 个对接结果文件夹(poses1,poses2,poses3...posesn)
每个文件夹中都有对应的*_out.pdbqt文件与*_converted.sdf文件,调用
micromamba run -n vina python vina_merge_and_check.py --n_splits <split_number_n> --out_dir ./result --output_prefix poses --poses_dir ./result/poses_all
会将所有的n 个对接结果文件夹中*_converted.sdf文件存放到 ./result/poses_all 目录,同时会检测是否有提交时候过快导致遗漏某个批次没有对接,需要注意查看。
分析对接结果
在*_converted.sdf文件中存在20个对接构象,取决于scripts/batch_docking.sh 中 NUM_MODES 设置多少数目,默认设置为 20。
其中每个 sdf 构象存在下面的<meeko>字段 用于获取对接打分等属性用于后续筛选分子。
> <meeko> (20)
{"is_sidechain": [false], "free_energy": -6.38, "intermolecular_energy": -15.695, "internal_energy": -2.912}
batch 模式对接
vina=1.2.7可以使用batch 模式进行批量对接。
mkdir -p results/poses
vina --receptor input/receptors/TrpE_entry_1.pdbqt \
--batch input/ligands/test \
--config ./configs/TrpE_entry_1.box.txt \
--dir results/poses \
--exhaustiveness=32
# 使用脚本对接
./scripts/batch_docking.sh ./receptors/TrpE_entry_1.pdbqt ./config/TrpE_entry_1.box.txt ligands/test output test.log /share/home/lyzeng24/rdkit_script/vina/vina
环境安装
conda install -c conda-forge vina meeko rdkit joblib rich ipython parallel -y
准备小分子pdbqt
# 单个配体准备
mk_prepare_ligand.py -i molecule.sdf -o molecule.pdbqt
# 批量准备
micromamba run -n vina ./scripts/batch_prepare_ligands.sh ligands/sdf ligands/pdbqt/ batch_prepare_ligands.log 128
#监控文件
watch -n 1 "ls -l pdbqt/*.pdbqt 2>/dev/null | wc -l"
准备受体pdbqt
# 受体准备(带柔性侧链)
mk_prepare_receptor.py -i nucleic_acid.cif -o my_receptor -j -p -f A:42
batch对接模式
./scripts/batch_docking.sh input/receptors/TrpE_entry_1.pdbqt \
input/configs/TrpE_entry_1.box.txt \
input/ligands/pdbqt \
results/poses \
results/batch_docking.log
监控对接结果
watch -n 1 'for i in {1..12}; do printf "poses$i: "; ls results/poses$i/*.pdbqt 2>/dev/null | wc -l; done'
将对接结果还原为sdf文件
mk_export.py 命令行工具的各个参数选项。
cd output
mk_export.py ./*_out.pdbqt --suffix _converted
分析vina对接结果
# 结果导出
mk_export.py vina_results.pdbqt -j my_receptor.json -s lig_docked.sdf -p rec_docked.pdb
djob 运行时间耗时长的批次任务
24562323 vina_job15 RUNNING lyzeng24 default default 2025/07/31 23:16:30 - agent-ARM-17
24562322 vina_job14 RUNNING lyzeng24 default default 2025/07/31 23:16:30 - agent-ARM-17
24562321 vina_job13 RUNNING lyzeng24 default default 2025/07/31 23:16:30 - agent-ARM-17
24562320 vina_job12 RUNNING lyzeng24 default default 2025/07/31 23:16:29 - agent-ARM-21
24562319 vina_job11 RUNNING lyzeng24 default default 2025/07/31 23:16:29 - agent-ARM-21
24562318 vina_job10 RUNNING lyzeng24 default default 2025/07/31 23:16:29 - agent-ARM-21
24562317 vina_job9 RUNNING lyzeng24 default default 2025/07/31 23:16:28 - agent-ARM-21
24562316 vina_job8 RUNNING lyzeng24 default default 2025/07/31 23:16:28 - agent-ARM-16
24562315 vina_job7 RUNNING lyzeng24 default default 2025/07/31 23:16:28 - agent-ARM-16
24562314 vina_job6 RUNNING lyzeng24 default default 2025/07/31 23:16:27 - agent-ARM-16
24562313 vina_job5 RUNNING lyzeng24 default default 2025/07/31 23:16:27 - agent-ARM-19
24562312 vina_job4 RUNNING lyzeng24 default default 2025/07/31 23:16:27 - agent-ARM-19
24562311 vina_job3 RUNNING lyzeng24 default default 2025/07/31 23:16:27 - agent-ARM-19
autodock vina 参考分子对接
trpe:(PDB ID: 5cwa)
./vina --receptor ./refence/trpe/TrpE_entry_1.pdbqt --ligand ./refence/trpe/align_5cwa_0GA_addH.pdbqt --config ./refence/trpe/TrpE_entry_1.box.txt --out ./refence/trpe/align_5cwa_0GA_addH_out.pdbqt --exhaustiveness="32" --num_modes="20" --energy_range="5.0"
result:
AutoDock Vina v1.2.7
#################################################################
# If you used AutoDock Vina in your work, please cite: #
# #
# J. Eberhardt, D. Santos-Martins, A. F. Tillack, and S. Forli #
# AutoDock Vina 1.2.0: New Docking Methods, Expanded Force #
# Field, and Python Bindings, J. Chem. Inf. Model. (2021) #
# DOI 10.1021/acs.jcim.1c00203 #
# #
# O. Trott, A. J. Olson, #
# AutoDock Vina: improving the speed and accuracy of docking #
# with a new scoring function, efficient optimization and #
# multithreading, J. Comp. Chem. (2010) #
# DOI 10.1002/jcc.21334 #
# #
# Please see https://github.com/ccsb-scripps/AutoDock-Vina for #
# more information. #
#################################################################
Scoring function : vina
Rigid receptor: ./refence/trpe/TrpE_entry_1.pdbqt
Ligand: ./refence/trpe/align_5cwa_0GA_addH.pdbqt
Grid center: X 7.402 Y -4.783 Z -11.818
Grid size : X 30 Y 30 Z 30
Grid space : 0.375
Exhaustiveness: 32
CPU: 0
Verbosity: 1
Computing Vina grid ... done.
WARNING: At low exhaustiveness, it may be impossible to utilize all CPUs.
Performing docking (random seed: 650309048) ...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
mode | affinity | dist from best mode
| (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1 -6.531 0 0
2 -6.352 3.988 6.453
3 -6.3 1.447 5.602
4 -6.291 1.94 5.284
5 -6.283 1.044 2.037
6 -6.159 3.798 5.275
7 -6.124 1.43 5.553
8 -5.988 3.499 5.489
9 -5.925 3.311 4.252
10 -5.912 3.647 4.894
11 -5.889 7.256 10.49
12 -5.821 2.351 5.29
13 -5.763 3.731 6.18
14 -5.732 3.557 6.002
15 -5.729 7.213 9.251
16 -5.693 4.179 5.642
17 -5.684 3.058 4.111
18 -5.679 4.117 5.518
19 -5.671 4.656 6.098
20 -5.663 4.112 5.705
fgbar:(PDB ID: 8izd)
./vina --receptor ./refence/fgbar/FgBar1_cut_proteinprep.pdbqt --ligand ./refence/fgbar/align_8izd_F_9NY_addH.pdbqt --config ./refence/fgbar/FgBar1_entry_1.box.txt --out ./refence/fgbar/align_8izd_F_9NY_addH_out.pdbqt --exhaustiveness="32" --num_modes="20" --energy_range="5.0"
reusult:
AutoDock Vina v1.2.7
#################################################################
# If you used AutoDock Vina in your work, please cite: #
# #
# J. Eberhardt, D. Santos-Martins, A. F. Tillack, and S. Forli #
# AutoDock Vina 1.2.0: New Docking Methods, Expanded Force #
# Field, and Python Bindings, J. Chem. Inf. Model. (2021) #
# DOI 10.1021/acs.jcim.1c00203 #
# #
# O. Trott, A. J. Olson, #
# AutoDock Vina: improving the speed and accuracy of docking #
# with a new scoring function, efficient optimization and #
# multithreading, J. Comp. Chem. (2010) #
# DOI 10.1002/jcc.21334 #
# #
# Please see https://github.com/ccsb-scripps/AutoDock-Vina for #
# more information. #
#################################################################
Scoring function : vina
Rigid receptor: ./refence/fgbar/FgBar1_cut_proteinprep.pdbqt
Ligand: ./refence/fgbar/align_8izd_F_9NY_addH.pdbqt
Grid center: X -12.7 Y -9.1 Z -0.3
Grid size : X 49.1 Y 37.6 Z 35.2
Grid space : 0.375
Exhaustiveness: 32
CPU: 0
Verbosity: 1
WARNING: Search space volume is greater than 27000 Angstrom^3 (See FAQ)
Computing Vina grid ... done.
WARNING: At low exhaustiveness, it may be impossible to utilize all CPUs.
Performing docking (random seed: -399012800) ...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
mode | affinity | dist from best mode
| (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
1 -5.268 0 0
2 -5.106 3.453 7.96
3 -5.003 3.114 6.709
4 -4.986 6.86 13.92
5 -4.947 5.434 13
6 -4.875 4.933 10.47
7 -4.867 6.888 13.75
8 -4.862 4.244 9.114
9 -4.835 3.776 6.806
10 -4.826 3.682 7.143
11 -4.824 5.4 10.17
12 -4.81 5.364 7.809
13 -4.808 4.364 11.15
14 -4.805 3.211 5.684
15 -4.783 3.585 8.995
16 -4.773 6.47 13.64
17 -4.773 3.465 6.652
18 -4.731 4.73 9.619
19 -4.726 4.867 10.88
20 -4.716 4.834 8.903
对接结果并不理想,可能是分子中灵活的扭转角多,柔性较大。AutoDock Vina 更偏向刚性对接。
分析策略
trpe
AutoDock vina:QED 针对小空间(分子量小的)trpe,QED 过滤。()
karamadock:只看 qed 过滤后的小分子对接情况(过滤标准:小分子,QED)
glide: 小分子,QED。(vina 打分好的 1w 个 , 按照底物标准)
fgbar
vina,karamadock,底物标准,选择 交集做 glide。