# qsar
## Getting started
[ChEMBL](https://www.ebi.ac.uk/chembl/)

DOI: https://doi.org/10.1016/j.ejmech.2022.114495

Design, synthesis and activity against drug-resistant bacteria evaluation of C-20, C-23 modified 5-O-mycaminosyltylonolide derivatives

A analogs: 22 activity data points

B analogs: 7 activity data points

C analogs: 47 activity data points

Search criterion: at least 85% similarity to Structure2D_A1.mol

Search results: A_85.csv
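
The 85% cut-off above is a fingerprint-similarity threshold. A minimal sketch of such a filter follows, assuming Tanimoto similarity on fingerprints stored as sets of on-bit indices, which is the usual metric for this kind of search; the exact fingerprint behind the ChEMBL search is not stated here. The compound IDs and bit sets are hypothetical placeholders, not data from `A_85.csv`.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_filter(query: set[int], library: dict[str, set[int]],
                      threshold: float = 0.85) -> dict[str, float]:
    """Return {compound_id: similarity} for library entries at or above the threshold."""
    hits = {}
    for cid, fp in library.items():
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits[cid] = sim
    return hits


# Hypothetical fingerprints (on-bit indices), not real compounds
query_fp = set(range(20))                # stands in for Structure2D_A1.mol
library = {
    "cpd_1": set(range(20)),             # identical: 20/20 = 1.0
    "cpd_2": set(range(18)) | {40, 41},  # 18 shared / 22 in union, about 0.818
    "cpd_3": set(range(19)),             # 19 shared / 20 in union = 0.95
}
print(similarity_filter(query_fp, library))  # {'cpd_1': 1.0, 'cpd_3': 0.95}
```

In practice the bit sets would come from a cheminformatics toolkit such as RDKit; the threshold comparison stays the same.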
## env
[Uni-Mol installation guide](https://unimol.readthedocs.io/en/latest/installation.html#install)
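```shell
# Upgrade pip first
python -m pip install --upgrade pip
# Optional: use the Tsinghua University PyPI mirror (faster from mainland China)
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
# Uni-Mol tools, used for the "unimol qsar" model
pip install unimol_tools
# Hugging Face Hub client, used to download pretrained model weights
pip install huggingface_hub
```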
## result
```shell
(analyse) (base) root@DESK4090:/mnt/c/project/qsar/MIC# python qsar_1D.py
[1D-QSAR][Linear Regression] MSE:32.3949 R2:0.6525
Model saved to 1d_qsar_linear_regression_model.pkl
[1D-QSAR][Stochastic Gradient Descent] MSE:230009980374197965960989638656.0000 R2:-2467672699617844819673481216.0000
Model saved to 1d_qsar_stochastic_gradient_descent_model.pkl
[1D-QSAR][K-Nearest Neighbors] MSE:30.2081 R2:0.6759
Model saved to 1d_qsar_k-nearest_neighbors_model.pkl
[1D-QSAR][Decision Tree] MSE:27.7150 R2:0.7027
Model saved to 1d_qsar_decision_tree_model.pkl
[1D-QSAR][Random Forest] MSE:26.5204 R2:0.7155
Model saved to 1d_qsar_random_forest_model.pkl
[1D-QSAR][XGBoost] MSE:27.7147 R2:0.7027
Model saved to 1d_qsar_xgboost_model.pkl
[1D-QSAR][Multi-layer Perceptron] MSE:143.3505 R2:-0.5379
Model saved to 1d_qsar_multi-layer_perceptron_model.pkl
---
[2D-QSAR][Linear Regression] MSE:30.1093 R2:0.6770
Model saved to 2d_qsar_linear_regression_model.pkl
[2D-QSAR][Stochastic Gradient Descent] MSE:33.7336 R2:0.6381
Model saved to 2d_qsar_stochastic_gradient_descent_model.pkl
[2D-QSAR][K-Nearest Neighbors] MSE:48.8179 R2:0.4763
Model saved to 2d_qsar_k-nearest_neighbors_model.pkl
[2D-QSAR][Decision Tree] MSE:30.2360 R2:0.6756
Model saved to 2d_qsar_decision_tree_model.pkl
[2D-QSAR][Random Forest] MSE:28.7916 R2:0.6911
Model saved to 2d_qsar_random_forest_model.pkl
[2D-QSAR][XGBoost] MSE:30.2351 R2:0.6756
Model saved to 2d_qsar_xgboost_model.pkl
[2D-QSAR][Multi-layer Perceptron] MSE:30.1715 R2:0.6763
Model saved to 2d_qsar_multi-layer_perceptron_model.pkl
---
[3D-QSAR][Stochastic Gradient Descent] MSE:64.5768 R2:0.3072
Model saved to 3d_qsar_stochastic_gradient_descent_model.pkl
[3D-QSAR][K-Nearest Neighbors] MSE:38.6921 R2:0.5849
Model saved to 3d_qsar_k-nearest_neighbors_model.pkl
[3D-QSAR][Decision Tree] MSE:30.2360 R2:0.6756
Model saved to 3d_qsar_decision_tree_model.pkl
[3D-QSAR][Random Forest] MSE:30.8310 R2:0.6692
Model saved to 3d_qsar_random_forest_model.pkl
[3D-QSAR][XGBoost] MSE:30.2362 R2:0.6756
Model saved to 3d_qsar_xgboost_model.pkl
[3D-QSAR][Multi-layer Perceptron] MSE:29.9844 R2:0.6783
Model saved to 3d_qsar_multi-layer_perceptron_model.pkl
---
unimol qsar
{'mse': 59.72037918598548, 'mae': 5.179289798987539, 'pearsonr': 0.638764928149331, 'spearmanr': 0.6006870492749102, 'r2': 0.35928715315601223}
```
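Each log line reports MSE and R² on held-out data. The sketch below shows how the two relate, assuming the scripts use the standard definitions (as in scikit-learn's `mean_squared_error` and `r2_score`): R² = 1 - SS_res / SS_tot = 1 - MSE / Var(y), where Var(y) is the population variance of the test targets. Under that assumption every logged pair implies a test-set variance of about 93.2 (for example 32.3949 / (1 - 0.6525) ≈ 93.22), which is consistent with all models, including the Uni-Mol run, being scored on the same test targets. It also explains the 1D SGD row: R² has no lower bound, so an MSE about 2.47 × 10^27 times the target variance yields an R² of about -2.47 × 10^27, and any model worse than predicting the mean (such as the 1D MLP) gets a negative R². The toy data in the code is hypothetical.

```python
def mse(y_true: list[float], y_pred: list[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def implied_target_variance(mse_value: float, r2_value: float) -> float:
    """Invert R2 = 1 - MSE / Var(y) to recover the population variance of the test targets."""
    return mse_value / (1.0 - r2_value)


# Toy data (hypothetical, not project data)
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.0]
print(mse(y_true, y_pred), r2(y_true, y_pred))  # 0.4375 and approximately 0.9125

# MSE / R2 pairs copied from the log above
logged = {
    "1D Linear Regression": (32.3949, 0.6525),
    "1D Stochastic Gradient Descent": (230009980374197965960989638656.0, -2467672699617844819673481216.0),
    "3D Multi-layer Perceptron": (29.9844, 0.6783),
    "unimol qsar": (59.72037918598548, 0.35928715315601223),
}
for name, (m, r) in logged.items():
    print(f"{name}: implied Var(y) = {implied_target_variance(m, r):.1f}")
```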
## [DeePMD-kit](https://docs.deepmodeling.com/projects/deepmd/en/master/index.html)
## [NotebookLM](https://notebooklm.google.com/)
## [PyTorch code](https://nn.labml.ai/zh/diffusion/ddpm/index.html)
## [3D-QSAR tutorial](https://bohrium.dp.tech/notebooks/1032)
## [MolPipeline](https://github.com/basf/MolPipeline)
DOI: https://doi.org/10.1021/acs.jcim.4c00863

MolToBinary: converts a molecule into binary-format features. These features can be molecular fingerprints and are typically used to compute similarity.

MolToConcatenatedVector: concatenates several feature vectors into one, producing a richer feature representation.
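
Conceptually, concatenation places the feature blocks end to end so that each molecule becomes one longer row. A minimal pure-Python sketch of the idea; the block names and values are hypothetical, and this is not MolPipeline's own `MolToConcatenatedVector` API.

```python
def concatenate_features(*blocks: list[float]) -> list[float]:
    """Join several per-molecule feature blocks into one flat vector, preserving block order."""
    combined: list[float] = []
    for block in blocks:
        combined.extend(block)
    return combined


# Hypothetical feature blocks for a single molecule
fingerprint_bits = [0, 1, 1, 0]  # e.g. a (tiny) binary fingerprint
physchem = [180.16, 63.6]        # e.g. molecular weight and TPSA
net_charge = [0]

row = concatenate_features(fingerprint_bits, physchem, net_charge)
print(row)       # [0, 1, 1, 0, 180.16, 63.6, 0]
print(len(row))  # 7 = 4 + 2 + 1
```

The block order must be identical for every molecule, because a model trained on these rows treats each position as a fixed feature.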
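
MolToSmiles: converts a molecule object into a SMILES (Simplified Molecular Input Line Entry System) string. SMILES is a line notation that describes molecular structure as a string and is well suited to a standardized representation of structural data.

MolToMACCSFP: computes the MACCS (Molecular ACCess System) keys fingerprint, a fixed set of predefined structural keys. It is a standard feature for structural similarity calculations and modeling.

MolToMorganFP: computes the Morgan fingerprint (also known as a circular fingerprint), with a configurable number of bits and radius. These topological fingerprints are widely used for machine-learning models in cheminformatics.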
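
To show what the radius and bit-length parameters control, here is a simplified pure-Python sketch of the Morgan (ECFP-style) idea on a toy molecular graph: each atom starts from an identifier built from simple atom invariants, each iteration folds in the identifiers of its neighbours, and every identifier seen up to the chosen radius is hashed into a fixed-length bit vector. The invariants (element and degree), the `zlib.crc32` hash and the ethanol graph are assumptions for illustration; RDKit uses richer invariants and treats duplicate environments differently, so the bits will not match RDKit or MolPipeline output.

```python
import zlib


def _h(value: tuple) -> int:
    """Deterministic 32-bit hash (Python's built-in hash() is salted for strings)."""
    return zlib.crc32(repr(value).encode())


def toy_morgan_fp(atoms: list[str], bonds: list[tuple[int, int]],
                  radius: int = 2, n_bits: int = 1024) -> set[int]:
    """Return the on-bit indices of a simplified circular fingerprint."""
    neighbours: dict[int, list[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Iteration 0: identifier from element symbol and heavy-atom degree
    ids = [_h((atoms[i], len(neighbours[i]))) for i in range(len(atoms))]
    seen = set(ids)
    for _ in range(radius):
        # New identifier = own previous id + sorted previous ids of the neighbours
        ids = [_h((ids[i], tuple(sorted(ids[j] for j in neighbours[i]))))
               for i in range(len(atoms))]
        seen.update(ids)
    # Fold every environment identifier into a fixed-length bit vector
    return {identifier % n_bits for identifier in seen}


# Heavy-atom graph of ethanol (C-C-O); hydrogens are implicit
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
fp_r0 = toy_morgan_fp(atoms, bonds, radius=0)
fp_r2 = toy_morgan_fp(atoms, bonds, radius=2)
print(sorted(fp_r0), sorted(fp_r2))
```

A larger radius adds bits for larger atom environments (the radius-0 bits are always kept), while a smaller `n_bits` makes different environments more likely to collide on the same bit.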

MolToNetCharge: computes the net charge of a molecule. Charge information is important for understanding a molecule's chemical properties and reactivity.
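
A minimal sketch, assuming net charge is taken as the sum of per-atom formal charges (MolPipeline may support other charge models; this does not use its API). The zwitterion case shows why net charge differs from simply counting charged atoms.

```python
def net_formal_charge(formal_charges: list[int]) -> int:
    """Net charge of a molecule as the sum of its atoms' formal charges."""
    return sum(formal_charges)


# Zwitterionic glycine, heavy atoms: N(+1), C-alpha, C, O, O(-1)
glycine_zwitterion = [+1, 0, 0, 0, -1]
# Acetate anion, heavy atoms: CH3, C, O, O(-1)
acetate = [0, 0, 0, -1]
print(net_formal_charge(glycine_zwitterion))  # 0, despite two charged atoms
print(net_formal_charge(acetate))             # -1
```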
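
Mol2PathFP: computes a path-based fingerprint, which describes molecular structure through the bond paths connecting atoms; it can be used for similarity analysis and model training.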
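
A simplified sketch of a path-based fingerprint: enumerate simple (non-repeating) bond paths up to a maximum length, turn each path into an element-string key such as `C-C-O`, and hash the key into a fixed-length bit vector. The maximum path length, key format and hash are assumptions for illustration, not the parameters or algorithm of MolPipeline's `Mol2PathFP` or RDKit's path fingerprints; in particular, these keys ignore bond orders and aromaticity.

```python
import zlib


def path_keys(atoms: list[str], bonds: list[tuple[int, int]], max_len: int = 3) -> set[str]:
    """Element-string keys for all simple paths of 1..max_len bonds (e.g. 'C-C-O')."""
    neighbours: dict[int, list[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    keys: set[str] = set()

    def extend(path: list[int]) -> None:
        if len(path) > 1:
            labels = [atoms[i] for i in path]
            # A path and its reverse are the same substructure: keep the smaller ordering
            keys.add("-".join(min(labels, labels[::-1])))
        if len(path) - 1 == max_len:
            return
        for nxt in neighbours[path[-1]]:
            if nxt not in path:  # simple paths only: never revisit an atom
                extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return keys


def toy_path_fp(atoms: list[str], bonds: list[tuple[int, int]],
                max_len: int = 3, n_bits: int = 1024) -> set[int]:
    """Hash each path key into a fixed-length bit vector; returns the on-bit indices."""
    return {zlib.crc32(key.encode()) % n_bits for key in path_keys(atoms, bonds, max_len)}


# Heavy-atom graph of ethanol (C-C-O)
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
print(sorted(path_keys(atoms, bonds)))  # ['C-C', 'C-C-O', 'C-O']
print(sorted(toy_path_fp(atoms, bonds)))
```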
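
MolToInchi and MolToInchiKey: convert a molecule into an InChI (IUPAC International Chemical Identifier) string and an InChIKey, a fixed-length hashed form of the InChI. These standardized encodings are commonly used in chemical databases as unique molecule identifiers.

MolToRDKitPhysChem: computes physico-chemical descriptors such as molecular weight, TPSA (topological polar surface area), and the numbers of hydrogen-bond donors and acceptors. These properties are common baseline features in machine-learning models.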
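
Most of these descriptors need a cheminformatics toolkit, but molecular weight shows the idea in a few lines: sum standard atomic weights over the molecular formula. The formula parser and the small atomic-weight table (IUPAC conventional values) are assumptions for this sketch, not how `MolToRDKitPhysChem` works internally; TPSA and donor/acceptor counts require substructure perception and are not attempted here.

```python
import re

# Conventional standard atomic weights (IUPAC); extend the table as needed
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
                  "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904}


def molecular_weight(formula: str) -> float:
    """Average molecular weight from a simple formula such as 'C9H8O4' (no brackets or charges)."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means one atom of that element
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


print(round(molecular_weight("C9H8O4"), 3))  # aspirin: 180.159
print(round(molecular_weight("H2O"), 3))     # 18.015
```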