# ChemStructLM
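
ChemStructLM integrates the MolScribe and Vary libraries so that large language models can understand, design, and modify chemical structures, molecules, and proteins. By combining these two libraries with a large language model, the project aims to advance research and applications in chemistry and bioinformatics.
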
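## Features

- Integrates the functionality of the MolScribe and Vary libraries.
- Provides an example environment for using both libraries together.
- Includes example code showing how to combine the two libraries.
- Open source; community contributions are welcome.
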
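## Quick Start

### Prerequisites

Make sure you have a working Conda environment. If you do not, see the official Miniconda or Anaconda websites for installation instructions. micromamba is the recommended package manager.
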
## Installation

Clone the ChemStructLM repository to your local machine:

```shell
git clone https://github.com/hotwa/ChemStructLM.git
cd ChemStructLM
```

Initialize the submodules (MolScribe and Vary):

```shell
git submodule update --init --recursive
```

Add the submodules and check out the pinned versions of MolScribe, Vary-toy, and Cobra:

```shell
# Register the submodules (skip any that are already listed in .gitmodules)
git submodule add https://github.com/thomas0809/MolScribe molscribe
git submodule add https://github.com/Ucas-HaoranWei/Vary-toy vary-toy
git submodule add https://github.com/h-zhao1997/cobra cobra

# For each submodule: print the current commit, then check out the pinned one
cd molscribe
git rev-parse HEAD
git checkout 97acee57d10bd719f4dc1cfd30d09f142b7dc65f
cd ../vary-toy
git rev-parse HEAD
git checkout e94e50f4b10c7b0f2a29e4d8b3804a35024b0565
cd ../cobra
git rev-parse HEAD
git checkout 365a24d360f9c0d1ed4db96acc3a76b12782d138
# Return to the repository root for the next steps
cd ..
```
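
The pinned commits above can also be verified from Python once the checkouts are done. The following is a minimal sketch, not part of the repository: the `PINNED` table simply repeats the hashes from the commands above, and `git_head` and `find_mismatches` are hypothetical helper names. It assumes `git` is on `PATH` and that the script runs from the repository root.

```python
import subprocess
from typing import Callable, Dict

# Pinned commits, copied from the checkout commands above.
PINNED: Dict[str, str] = {
    "molscribe": "97acee57d10bd719f4dc1cfd30d09f142b7dc65f",
    "vary-toy": "e94e50f4b10c7b0f2a29e4d8b3804a35024b0565",
    "cobra": "365a24d360f9c0d1ed4db96acc3a76b12782d138",
}


def git_head(path: str) -> str:
    """Return the commit hash that HEAD points to in the repository at `path`."""
    result = subprocess.run(
        ["git", "-C", path, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def find_mismatches(
    pins: Dict[str, str],
    get_head: Callable[[str], str] = git_head,
) -> Dict[str, str]:
    """Return {path: actual_head} for every submodule not at its pinned commit."""
    mismatches = {}
    for path, expected in pins.items():
        actual = get_head(path)
        if actual != expected:
            mismatches[path] = actual
    return mismatches
```

Calling `find_mismatches(PINNED)` from the repository root returns an empty dict when every submodule is at its pinned commit. The `get_head` parameter exists only so the comparison logic can be exercised without a real checkout.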

Create and activate the Conda environment, then install Vary and MolScribe:

```shell
micromamba env create -f environment.yml
micromamba run -n ChemStructLM python -m pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple/

# install flash-attention
micromamba run -n ChemStructLM python -m pip install flash-attn --no-build-isolation

# install vary
# The Vary sources hard-code the CLIP weights path `/cache/vit-large-patch14/` in
# `vary/demo/run_qwen_vary.py` and `vary/model/vary_qwen_vary.py`.
# Download the weights from https://huggingface.co/openai/clip-vit-large-patch14/tree/main
# and set CLIP_DIR to your local copy (the value below is the author's local path).
CLIP_DIR=/media/lingyuzeng/c617029b-8496-684d-b402-a18f08e75ef13/project/Vary-toy/clip-vit-large-patch14/
# NOTE: the submodule was added as `vary-toy/` above; adjust if your checkout is named `vary/`.
cd vary-toy/Vary-master
sed -i "s|/cache/vit-large-patch14/|${CLIP_DIR}|g" vary/demo/run_qwen_vary.py
sed -i "s|/cache/vit-large-patch14/|${CLIP_DIR}|g" vary/model/vary_qwen_vary.py
sed -i 's/self.seq_length = config.seq_length/self.seq_length = config.max_length/' vary/model/llm/qwen/modeling_qwen.py
micromamba run -n ChemStructLM python -m pip install -e .

# install molscribe
cd ../../molscribe
micromamba run -n ChemStructLM python -m pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple/
sed -i 's/timm==0.4.12/timm==0.6.13/' setup.py
micromamba run -n ChemStructLM python setup.py install
cd ..
micromamba activate ChemStructLM

# test
python vary-toy/Vary-master/vary/demo/run_qwen_vary.py --model-name "$CLIP_DIR" --image-file /mnt/wtrr1/deepinproject/chemstructlm/vary/assets/vary.png
```
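
The path substitution done with `sed` above can also be sketched in Python, which makes it easy to see whether anything was actually replaced (`sed -i` exits successfully even when the pattern never matches). This is only an illustration: `replace_in_file` is a hypothetical helper, not part of Vary or this repository.

```python
from pathlib import Path


def replace_in_file(path: str, old: str, new: str) -> int:
    """Replace every occurrence of `old` with `new` in the file at `path`.

    Returns the number of occurrences replaced; the file is left untouched
    when the count is zero.
    """
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    count = text.count(old)
    if count:
        file.write_text(text.replace(old, new), encoding="utf-8")
    return count
```

For example, `replace_in_file("vary/demo/run_qwen_vary.py", "/cache/vit-large-patch14/", clip_dir)` run from inside `Vary-master`, where `clip_dir` is your local CLIP weights directory. A return value of `0` suggests that the file was already patched or that the working directory is wrong.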

To test the installation in a separate, clean environment:

```shell
micromamba create -n test python=3.10 -y
cd vary-toy/Vary-master
micromamba run -n test python -m pip install -e . -i https://pypi.mirrors.ustc.edu.cn/simple/
cd ../../molscribe
micromamba run -n test python setup.py install
cd ..
micromamba activate test
```
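
After the installation, a quick way to confirm that the packages are visible to the active interpreter is to look them up without importing them. The sketch below uses the standard-library `importlib.util.find_spec`; `missing_modules` is a hypothetical helper. The module name `molscribe` matches the import used in this README, while `vary` is an assumption based on the package directory inside `Vary-master`.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found by the interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level import names to check; "vary" is an assumed package name.
EXPECTED_MODULES = ["torch", "molscribe", "vary"]
```

Calling `missing_modules(EXPECTED_MODULES)` inside the activated environment should return an empty list. It is safer to run this from a directory other than the repository root, since a submodule folder such as `molscribe/` in the current directory may be picked up as a namespace package and hide a failed install.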

MolScribe test:

```python
import torch
from molscribe import MolScribe
# from huggingface_hub import hf_hub_download

# Option 1: download the checkpoint from the Hugging Face Hub
# ckpt_path = hf_hub_download('yujieq/MolScribe', 'swin_base_char_aux_1m.pth')
# Option 2: use a local checkpoint (replace with your own path)
ckpt_path = '/mnt/wtrr1/deepinproject/MolScribe/ckpts/swin_base_char_aux_1m680k.pth'

# Load the model on CPU and run a prediction on the bundled example image
model = MolScribe(ckpt_path, device=torch.device('cpu'))
output = model.predict_image_file('molscribe/assets/example.png', return_atoms_bonds=True, return_confidence=True)
```
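
To inspect the result of the prediction above, a small summary function can help. This sketch assumes `output` is a dictionary with the keys `smiles`, `confidence`, `atoms`, and `bonds`, as described in the MolScribe README for `return_atoms_bonds=True` and `return_confidence=True`; verify the keys against your MolScribe version. `summarize_prediction` is a hypothetical helper.

```python
from typing import Any, Dict


def summarize_prediction(output: Dict[str, Any]) -> str:
    """Build a one-line summary of a MolScribe-style prediction dictionary.

    Missing keys are tolerated, so the function also works when the
    prediction was made without atoms/bonds or confidence.
    """
    smiles = output.get("smiles", "<missing>")
    confidence = output.get("confidence")
    atoms = output.get("atoms", [])
    bonds = output.get("bonds", [])
    conf_text = f"{confidence:.3f}" if confidence is not None else "n/a"
    return (
        f"SMILES: {smiles} | confidence: {conf_text} | "
        f"atoms: {len(atoms)} | bonds: {len(bonds)}"
    )
```

With the `output` from the test above, `print(summarize_prediction(output))` prints the predicted SMILES string together with the atom and bond counts.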

[Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference](https://github.com/h-zhao1997/cobra)

Installation:

```shell
cd cobra
pip install -e .

# install mamba and other packages
pip install packaging ninja
pip install mamba-ssm
pip install causal-conv1d

# Verify Ninja --> should return exit code "0"
ninja --version; echo $?
```
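
The Ninja check above can be scripted as well, for example as part of a setup test. This is a minimal sketch using only the standard library; `tool_exit_code` is a hypothetical helper that returns `None` when the executable is not on `PATH`, instead of raising.

```python
import shutil
import subprocess
from typing import List, Optional


def tool_exit_code(command: List[str]) -> Optional[int]:
    """Run `command` and return its exit code, or None if the executable is not found."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, capture_output=True).returncode
```

`tool_exit_code(["ninja", "--version"])` should return `0` once Ninja is installed correctly. A result of `None` means the `ninja` executable is not visible in the current environment.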

## Directory Structure

```text
ChemStructLM/
│
├── src/                  # Project source code
│   ├── integration/      # Code that combines MolScribe and Vary
│   └── utils/            # Utilities and helper functions
├── tests/                # Test scripts
├── vary/                 # Vary project as a submodule
├── molscribe/            # MolScribe project as a submodule
├── environments/         # Environment configuration files, e.g. Conda environment files
├── README.md             # Project documentation
└── .gitmodules           # Git submodule configuration
```