2024-05-05 19:56:00 +08:00

# ChemStructLM
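ChemStructLM is a project that integrates the MolScribe and Vary libraries so that large language models can understand, design, and modify chemical structures, molecules, and proteins. By bringing these two powerful chemical-information-processing libraries into a large language model, the project aims to advance research and applications in chemistry and bioinformatics.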
## Features
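- Integrates the functionality of the MolScribe and Vary libraries.
- Provides an example environment for using both libraries together.
- Includes example code showing how to combine the two libraries.
- Open source; community contributions are welcome.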
## Quick Start

### Prerequisites
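Make sure you have a working Conda setup. If you don't, see the official Miniconda or Anaconda website for installation instructions. micromamba is the recommended package manager, and the commands below use it.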
## Installation
Clone the ChemStructLM repository to your local machine:
```shell
git clone https://github.com/hotwa/ChemStructLM.git
cd ChemStructLM
```
Initialize the submodules (MolScribe and Vary):
```shell
git submodule update --init --recursive
```
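Pin MolScribe, Vary-toy, and Cobra to specific commits:
```shell
# The `git submodule add` commands are only needed when adding the submodules
# to a repository for the first time; skip them if the submodule directories
# already exist (e.g. after `git submodule update --init`), or git will refuse.
# git submodule add https://github.com/thomas0809/MolScribe molscribe
# git submodule add https://github.com/Ucas-HaoranWei/Vary-toy vary-toy
# git submodule add https://github.com/h-zhao1997/cobra cobra
cd molscribe
git rev-parse HEAD  # print the currently checked-out commit
git checkout 97acee57d10bd719f4dc1cfd30d09f142b7dc65f
cd ../vary-toy
git rev-parse HEAD
git checkout e94e50f4b10c7b0f2a29e4d8b3804a35024b0565
cd ../cobra
git rev-parse HEAD
git checkout 365a24d360f9c0d1ed4db96acc3a76b12782d138
cd ..
```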
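`git rev-parse HEAD` only prints the current commit, so comparing it with the pin is left to the reader. As a minimal sketch, the check can be automated from the repository root in Python. The `PINNED` mapping is copied from the commands above; the helper names `current_commit` and `find_mismatches` are hypothetical and not part of the repository.

```python
import subprocess
from pathlib import Path

# Commits pinned by the checkout commands above (submodule path -> SHA).
PINNED = {
    "molscribe": "97acee57d10bd719f4dc1cfd30d09f142b7dc65f",
    "vary-toy": "e94e50f4b10c7b0f2a29e4d8b3804a35024b0565",
    "cobra": "365a24d360f9c0d1ed4db96acc3a76b12782d138",
}


def current_commit(path: str) -> str:
    """Return the checked-out commit of the git repository at `path`."""
    result = subprocess.run(
        ["git", "-C", path, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def find_mismatches(pinned: dict, actual: dict) -> dict:
    """Map each submodule whose commit differs from its pin to (expected, found)."""
    return {
        name: (sha, actual.get(name))
        for name, sha in pinned.items()
        if actual.get(name) != sha
    }


if __name__ == "__main__":
    # Only query submodule directories that exist under the current directory.
    actual = {name: current_commit(name) for name in PINNED if Path(name).is_dir()}
    for name, (expected, found) in find_mismatches(PINNED, actual).items():
        print(f"{name}: expected {expected}, found {found}")
```

Keeping the comparison in a pure function separate from the `git` call makes it easy to reuse, for example in a CI step that fails when a submodule drifts from its pin.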
Create and activate the Conda environment, then install the dependencies:
```shell
micromamba env create -f environment.yml
micromamba run -n ChemStructLM python -m pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple/
# install flash-attention
micromamba run -n ChemStructLM python -m pip install flash-attn --no-build-isolation
# install Vary-toy
# Download the CLIP weights from https://huggingface.co/openai/clip-vit-large-patch14/tree/main
# and replace the hard-coded `/cache/vit-large-patch14/` path in
# `vary/demo/run_qwen_vary.py` and `vary/model/vary_qwen_vary.py` with your local copy.
CLIP_PATH=/media/lingyuzeng/c617029b-8496-684d-b402-a18f08e75ef13/project/Vary-toy/clip-vit-large-patch14/  # change to your own path
sed -i "s|/cache/vit-large-patch14/|${CLIP_PATH}|g" vary-toy/Vary-master/vary/demo/run_qwen_vary.py
sed -i "s|/cache/vit-large-patch14/|${CLIP_PATH}|g" vary-toy/Vary-master/vary/model/vary_qwen_vary.py
sed -i 's/self.seq_length = config.seq_length/self.seq_length = config.max_length/' vary-toy/Vary-master/vary/model/llm/qwen/modeling_qwen.py
cd vary-toy/Vary-master
micromamba run -n ChemStructLM python -m pip install -e .
# install MolScribe (bump its pinned timm version before installing)
cd ../../molscribe
micromamba run -n ChemStructLM python -m pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple/
sed -i 's/timm==0.4.12/timm==0.6.13/' setup.py
micromamba run -n ChemStructLM python setup.py install
cd ..
micromamba activate ChemStructLM
# test: --model-name expects the Vary-toy model weights directory (placeholder path below)
python vary-toy/Vary-master/vary/demo/run_qwen_vary.py --model-name /path/to/vary-toy-weights/ --image-file /mnt/wtrr1/deepinproject/chemstructlm/vary/assets/vary.png
```
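The `sed` commands above patch hard-coded values in the Vary-toy and MolScribe sources. If you would rather apply the same substitutions from Python, for example to see which files actually changed, the following is a minimal sketch. It assumes the `vary-toy/Vary-master` submodule layout and is run from the repository root; `CLIP_PATH` is a placeholder and the helper names are hypothetical.

```python
from pathlib import Path

VARY_ROOT = Path("vary-toy/Vary-master")
CLIP_PATH = "/path/to/clip-vit-large-patch14/"  # placeholder: your local CLIP weights


def set_clip_path(text: str, clip_path: str = CLIP_PATH) -> str:
    """Replace the hard-coded CLIP cache directory."""
    return text.replace("/cache/vit-large-patch14/", clip_path)


def use_max_length(text: str) -> str:
    """Read the Qwen sequence length from config.max_length instead."""
    return text.replace(
        "self.seq_length = config.seq_length",
        "self.seq_length = config.max_length",
    )


def bump_timm_pin(text: str) -> str:
    """Rewrite MolScribe's timm requirement from 0.4.12 to 0.6.13."""
    return text.replace("timm==0.4.12", "timm==0.6.13")


# File -> transform, mirroring the sed commands one-to-one.
PATCHES = {
    VARY_ROOT / "vary/demo/run_qwen_vary.py": set_clip_path,
    VARY_ROOT / "vary/model/vary_qwen_vary.py": set_clip_path,
    VARY_ROOT / "vary/model/llm/qwen/modeling_qwen.py": use_max_length,
    Path("molscribe/setup.py"): bump_timm_pin,
}


def patch_file(path: Path, transform) -> bool:
    """Apply `transform` to a file in place; return True if its content changed."""
    original = path.read_text(encoding="utf-8")
    patched = transform(original)
    if patched != original:
        path.write_text(patched, encoding="utf-8")
    return patched != original


if __name__ == "__main__":
    for path, transform in PATCHES.items():
        if path.exists():
            print(path, "patched" if patch_file(path, transform) else "unchanged")
```

Each substitution is idempotent: once applied, the search string no longer occurs, so a second run reports `unchanged` for every file.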
To set up a separate `test` environment:
```shell
micromamba create -n test python=3.10 -y
cd vary-toy/Vary-master
micromamba run -n test python -m pip install -e . -i https://pypi.mirrors.ustc.edu.cn/simple/
cd ../../molscribe
micromamba run -n test python setup.py install
cd ..
micromamba activate test
```
Test MolScribe:
```python
import torch
from molscribe import MolScribe

# Alternatively, download the checkpoint from the Hugging Face Hub:
# from huggingface_hub import hf_hub_download
# ckpt_path = hf_hub_download('yujieq/MolScribe', 'swin_base_char_aux_1m.pth')
ckpt_path = '/mnt/wtrr1/deepinproject/MolScribe/ckpts/swin_base_char_aux_1m680k.pth'  # local checkpoint
model = MolScribe(ckpt_path, device=torch.device('cpu'))
# Recognize the molecule in the example image, including atoms, bonds, and confidence scores
output = model.predict_image_file('molscribe/assets/example.png', return_atoms_bonds=True, return_confidence=True)
print(output['smiles'])
```
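`predict_image_file` returns a dictionary; as we understand the MolScribe README, it contains `smiles` and `molfile`, plus `confidence`, `atoms`, and `bonds` when the flags above are set. Before handing a prediction to a language model, it can help to condense it and flag uncertain atoms. The sketch below is hypothetical: the function name, the per-atom keys `atom_symbol` and `confidence`, and the 0.5 threshold are assumptions for illustration.

```python
def summarize_prediction(output: dict, threshold: float = 0.5) -> dict:
    """Condense a MolScribe-style prediction into a compact summary.

    Assumes `output` follows the format returned by
    predict_image_file(..., return_atoms_bonds=True, return_confidence=True);
    the per-atom keys used here are assumptions.
    """
    atoms = output.get("atoms", [])
    bonds = output.get("bonds", [])
    # (index, element) for every atom whose confidence falls below the threshold
    low_confidence = [
        (i, atom["atom_symbol"])
        for i, atom in enumerate(atoms)
        if atom.get("confidence", 1.0) < threshold
    ]
    return {
        "smiles": output["smiles"],
        "confidence": output.get("confidence"),
        "num_atoms": len(atoms),
        "num_bonds": len(bonds),
        "low_confidence_atoms": low_confidence,
    }
```

Usage, after the MolScribe test above: `print(summarize_prediction(output))`.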
[Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference](https://github.com/h-zhao1997/cobra)
Installation:
```shell
cd cobra
pip install -e .
# install mamba and other packages
pip install packaging ninja
pip install mamba-ssm
pip install causal-conv1d
# Verify Ninja --> should return exit code "0"
ninja --version; echo $?
```
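The Cobra steps end by checking `ninja` by hand. A minimal sketch of a combined check in Python is shown below; it assumes the import names `mamba_ssm` and `causal_conv1d` for the `mamba-ssm` and `causal-conv1d` packages, and the function name `check_cobra_deps` is hypothetical.

```python
import importlib.util
import shutil
import subprocess


def check_cobra_deps() -> dict:
    """Report whether ninja and the Mamba-related packages are available."""
    status = {}
    ninja = shutil.which("ninja")
    # Mirrors `ninja --version; echo $?`: True only if the command exits with 0.
    status["ninja"] = ninja is not None and subprocess.run(
        [ninja, "--version"], capture_output=True
    ).returncode == 0
    # find_spec locates a package without importing it (and its CUDA kernels).
    for module in ("mamba_ssm", "causal_conv1d"):
        status[module] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_cobra_deps().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Using `find_spec` only confirms that the packages are installed; it does not prove that their compiled extensions load on your GPU, which an actual `import mamba_ssm` would exercise.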
## Directory Structure
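```text
ChemStructLM/
├── src/             # project source code
│   ├── integration/ # code that combines MolScribe and Vary
│   └── utils/       # tools and helper functions
├── tests/           # test scripts
├── vary/            # Vary project (git submodule)
├── molscribe/       # MolScribe project (git submodule)
├── environments/    # environment configuration files, e.g. Conda environment files
├── README.md        # project README
└── .gitmodules      # git submodule configuration
```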