qsar/README.md
2024-10-10 17:16:48 +08:00

# qsar
## Getting started
[ChEMBL](https://www.ebi.ac.uk/chembl/)

Reference: *Design, synthesis and activity against drug-resistant bacteria evaluation of C-20, C-23 modified 5-O-mycaminosyltylonolide derivatives*, DOI: https://doi.org/10.1016/j.ejmech.2022.114495

- Analogs of A: 22 activity data points
- Analogs of B: 7 activity data points
- Analogs of C: 47 activity data points
- Search criterion: at least 85% similarity to `Structure2D_A1.mol`
- Search results: `A_85.csv`
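
To make the 85% similarity cutoff concrete, here is a minimal sketch of a thresholded similarity search. It assumes the cutoff is a Tanimoto coefficient on binary fingerprints, which this README does not state. The fingerprints are toy sets of on-bit indices, and the names `tanimoto`, `similar_hits`, and `cand_*` are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similar_hits(query: set, library: dict, threshold: float = 0.85) -> list:
    """Return (name, score) pairs scoring at or above the threshold, best first."""
    hits = []
    for name, bits in library.items():
        score = tanimoto(query, bits)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints (assumption: real ones would come from a cheminformatics toolkit).
query = set(range(20))
library = {
    "cand_1": set(range(19)),          # 19 shared bits, 20 in the union
    "cand_2": set(range(18)) | {100},  # 18 shared bits, 21 in the union
    "cand_3": set(range(10)),          # 10 shared bits, 20 in the union
}

hits = similar_hits(query, library, threshold=0.85)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

Only `cand_1` and `cand_2` pass the 0.85 cutoff here; `cand_3` shares half of the query bits and is dropped.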
## env
[unimol install](https://unimol.readthedocs.io/en/latest/installation.html#install)
```shell
python -m pip install --upgrade pip
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install unimol_tools
pip install huggingface_hub
```
## result
```shell
(analyse) (base) root@DESK4090:/mnt/c/project/qsar/MIC# python qsar_1D.py
[1D-QSAR][Linear Regression] MSE:32.3949 R2:0.6525
Model saved to 1d_qsar_linear_regression_model.pkl
[1D-QSAR][Stochastic Gradient Descent] MSE:230009980374197965960989638656.0000 R2:-2467672699617844819673481216.0000
Model saved to 1d_qsar_stochastic_gradient_descent_model.pkl
[1D-QSAR][K-Nearest Neighbors] MSE:30.2081 R2:0.6759
Model saved to 1d_qsar_k-nearest_neighbors_model.pkl
[1D-QSAR][Decision Tree] MSE:27.7150 R2:0.7027
Model saved to 1d_qsar_decision_tree_model.pkl
[1D-QSAR][Random Forest] MSE:26.5204 R2:0.7155
Model saved to 1d_qsar_random_forest_model.pkl
[1D-QSAR][XGBoost] MSE:27.7147 R2:0.7027
Model saved to 1d_qsar_xgboost_model.pkl
[1D-QSAR][Multi-layer Perceptron] MSE:143.3505 R2:-0.5379
Model saved to 1d_qsar_multi-layer_perceptron_model.pkl
---
[2D-QSAR][Linear Regression] MSE:30.1093 R2:0.6770
Model saved to 2d_qsar_linear_regression_model.pkl
[2D-QSAR][Stochastic Gradient Descent] MSE:33.7336 R2:0.6381
Model saved to 2d_qsar_stochastic_gradient_descent_model.pkl
[2D-QSAR][K-Nearest Neighbors] MSE:48.8179 R2:0.4763
Model saved to 2d_qsar_k-nearest_neighbors_model.pkl
[2D-QSAR][Decision Tree] MSE:30.2360 R2:0.6756
Model saved to 2d_qsar_decision_tree_model.pkl
[2D-QSAR][Random Forest] MSE:28.7916 R2:0.6911
Model saved to 2d_qsar_random_forest_model.pkl
[2D-QSAR][XGBoost] MSE:30.2351 R2:0.6756
Model saved to 2d_qsar_xgboost_model.pkl
[2D-QSAR][Multi-layer Perceptron] MSE:30.1715 R2:0.6763
Model saved to 2d_qsar_multi-layer_perceptron_model.pkl
---
[3D-QSAR][Stochastic Gradient Descent] MSE:64.5768 R2:0.3072
Model saved to 3d_qsar_stochastic_gradient_descent_model.pkl
[3D-QSAR][K-Nearest Neighbors] MSE:38.6921 R2:0.5849
Model saved to 3d_qsar_k-nearest_neighbors_model.pkl
[3D-QSAR][Decision Tree] MSE:30.2360 R2:0.6756
Model saved to 3d_qsar_decision_tree_model.pkl
[3D-QSAR][Random Forest] MSE:30.8310 R2:0.6692
Model saved to 3d_qsar_random_forest_model.pkl
[3D-QSAR][XGBoost] MSE:30.2362 R2:0.6756
Model saved to 3d_qsar_xgboost_model.pkl
[3D-QSAR][Multi-layer Perceptron] MSE:29.9844 R2:0.6783
Model saved to 3d_qsar_multi-layer_perceptron_model.pkl
---
unimol qsar
{'mse': 59.72037918598548, 'mae': 5.179289798987539, 'pearsonr': 0.638764928149331, 'spearmanr': 0.6006870492749102, 'r2': 0.35928715315601223}
```
## [DeePMD-kit](https://docs.deepmodeling.com/projects/deepmd/en/master/index.html)
## [NotebookLM](https://notebooklm.google.com/)
## [PyTorch code](https://nn.labml.ai/zh/diffusion/ddpm/index.html)
## [3D-QSAR tutorial](https://bohrium.dp.tech/notebooks/1032)
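## [MolPipeline](https://github.com/basf/MolPipeline)
DOI: https://doi.org/10.1021/acs.jcim.4c00863

- `MolToBinary`: serializes a molecule to a binary (byte-string) representation. Note that this is a storage format for the molecule itself, not a fingerprint for similarity calculations.
- `MolToConcatenatedVector`: concatenates several feature vectors to produce a richer feature representation.
- `MolToSmiles`: converts a molecule object to a SMILES (Simplified Molecular Input Line Entry System) string. SMILES is a string format for describing molecular structures and is well suited to standardized representation of structure data.
- `MolToMACCSFP`: computes the MACCS keys (structural keys) fingerprint, a standard feature for molecular similarity calculations and modeling.
- `MolToMorganFP`: computes Morgan fingerprints (also called circular fingerprints) with a configurable number of bits and radius. These fingerprints encode the topology of a molecule and are widely used for machine-learning models in cheminformatics.
- `MolToNetCharge`: computes the net charge of a molecule. Charge information is important for understanding a molecule's chemical properties and reactivity.
- `Mol2PathFP`: computes path-based fingerprints, which describe a structure through the bond paths in the molecule and can be used for similarity analysis and model training.
- `MolToInchi` and `MolToInchiKey`: convert a molecule to an InChI (International Chemical Identifier) and an InChIKey. These standardized encodings are commonly used as unique molecule identifiers in chemical databases.
- `MolToRDKitPhysChem`: computes physicochemical properties of a molecule, such as molecular weight, TPSA (topological polar surface area), and the numbers of hydrogen-bond donors and acceptors. These properties are common baseline features for machine-learning models.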