ChemStructLM
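ChemStructLM is a project that integrates the MolScribe and Vary libraries so that large language models can understand, design, and modify chemical structures, molecules, and proteins. By combining these two chemical information processing libraries with a large language model, the project aims to advance research and applications in chemistry and bioinformatics.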
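Features
- Integrates the functionality of the MolScribe and Vary libraries.
- Provides an example environment for using both libraries together.
- Includes example code showing how to combine the two libraries.
- Open source; community contributions are welcome.
Quick start
Prerequisites
Make sure you have a working Conda environment. If you do not, see the official Miniconda or Anaconda websites for installation instructions. micromamba is the recommended package manager.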
Installation
Clone the ChemStructLM repository to your local machine:
git clone https://github.com/hotwa/ChemStructLM.git
cd ChemStructLM
Initialize the submodules (MolScribe and Vary):
git submodule update --init --recursive
Add the MolScribe, Vary, and Cobra submodules and check out their pinned versions:
git submodule add https://github.com/thomas0809/MolScribe molscribe
git submodule add https://github.com/Ucas-HaoranWei/Vary-toy vary-toy
git submodule add https://github.com/h-zhao1997/cobra cobra
cd molscribe
git rev-parse HEAD
git checkout 97acee57d10bd719f4dc1cfd30d09f142b7dc65f
cd ../vary-toy
git rev-parse HEAD
git checkout e94e50f4b10c7b0f2a29e4d8b3804a35024b0565
cd ../cobra
git rev-parse HEAD
git checkout 365a24d360f9c0d1ed4db96acc3a76b12782d138
# return to the repository root, where environment.yml is read in the next step
cd ..
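The checkout commands above pin each submodule to a specific commit. A minimal sketch of how those pins could be verified afterwards is given here; the names `PINNED_COMMITS`, `git_head`, and `find_mismatches` are illustrative and not part of the repository, and the directory names assume the submodule paths used in the `git submodule add` commands.

```python
import subprocess

# Commit hashes copied from the checkout commands above.
PINNED_COMMITS = {
    "molscribe": "97acee57d10bd719f4dc1cfd30d09f142b7dc65f",
    "vary-toy": "e94e50f4b10c7b0f2a29e4d8b3804a35024b0565",
    "cobra": "365a24d360f9c0d1ed4db96acc3a76b12782d138",
}


def git_head(path):
    """Return the commit hash that HEAD points to in the repository at `path`."""
    result = subprocess.run(
        ["git", "-C", path, "rev-parse", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def find_mismatches(pins, resolve=git_head):
    """Return {path: (expected, actual)} for every submodule not at its pin.

    `resolve` is injectable so the check can be exercised without a git checkout.
    """
    mismatches = {}
    for path, expected in pins.items():
        actual = resolve(path)
        if actual != expected:
            mismatches[path] = (expected, actual)
    return mismatches
```

Calling `find_mismatches(PINNED_COMMITS)` from the repository root should return an empty dictionary when all three submodules sit at the commits listed above.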
Create and activate the Conda environment:
micromamba env create -f environment.yml
micromamba run -n ChemStructLM python -m pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple/
# install flash-attention
micromamba run -n ChemStructLM python -m pip install flash-attn --no-build-isolation
# install vary
cd vary-toy/Vary-master
# Point the hard-coded CLIP path `/cache/vit-large-patch14/` in
# `vary/demo/run_qwen_vary.py` and `vary/model/vary_qwen_vary.py` at a local copy of the weights
# (download from https://huggingface.co/openai/clip-vit-large-patch14/tree/main).
# The target path below is the author's local directory; replace it with your own.
sed -i 's|/cache/vit-large-patch14/|/media/lingyuzeng/c617029b-8496-684d-b402-a18f08e75ef13/project/Vary-toy/clip-vit-large-patch14/|g' vary/demo/run_qwen_vary.py
sed -i 's|/cache/vit-large-patch14/|/media/lingyuzeng/c617029b-8496-684d-b402-a18f08e75ef13/project/Vary-toy/clip-vit-large-patch14/|g' vary/model/vary_qwen_vary.py
# read the sequence length from config.max_length instead of config.seq_length
sed -i 's/self.seq_length = config.seq_length/self.seq_length = config.max_length/' vary/model/llm/qwen/modeling_qwen.py
micromamba run -n ChemStructLM python -m pip install -e .
# install molscribe
cd ../../molscribe
micromamba run -n ChemStructLM python -m pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple/
sed -i 's/timm==0.4.12/timm==0.6.13/' setup.py
micromamba run -n ChemStructLM python setup.py install
cd ..
micromamba activate ChemStructLM
# test (run from the repository root; the model and image paths are the author's local paths)
python vary-toy/Vary-master/vary/demo/run_qwen_vary.py --model-name /media/lingyuzeng/c617029b-8496-684d-b402-a18f08e75ef13/project/Vary-toy/clip-vit-large-patch14/ --image-file /mnt/wtrr1/deepinproject/chemstructlm/vary/assets/vary.png
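The `sed` commands above rewrite the hard-coded `/cache/vit-large-patch14/` path inside the Vary sources. On systems without GNU `sed`, the same substitution could be sketched in Python as follows; `patch_clip_path` is a hypothetical helper, not something shipped with Vary, and the replacement directory is whatever local path holds your downloaded CLIP weights.

```python
from pathlib import Path

# Placeholder path that the Vary sources hard-code (taken from the sed commands above).
OLD_PATH = "/cache/vit-large-patch14/"


def patch_clip_path(source_file, new_path, old_path=OLD_PATH):
    """Replace every occurrence of `old_path` in `source_file`.

    Returns the number of occurrences replaced; the file is left
    untouched when the placeholder is not present.
    """
    path = Path(source_file)
    text = path.read_text(encoding="utf-8")
    count = text.count(old_path)
    if count:
        path.write_text(text.replace(old_path, new_path), encoding="utf-8")
    return count
```

A return value of 0 on a second run is a quick way to confirm that a file has already been patched.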
Test in a fresh environment:
micromamba create -n test python=3.10 -y
cd vary-toy/Vary-master
micromamba run -n test python -m pip install -e . -i https://pypi.mirrors.ustc.edu.cn/simple/
cd ../../molscribe
micromamba run -n test python setup.py install
cd ..
micromamba activate test
MolScribe test:
import torch
from molscribe import MolScribe
# from huggingface_hub import hf_hub_download
# ckpt_path = hf_hub_download('yujieq/MolScribe', 'swin_base_char_aux_1m.pth')
ckpt_path = '/mnt/wtrr1/deepinproject/MolScribe/ckpts/swin_base_char_aux_1m680k.pth'
model = MolScribe(ckpt_path, device=torch.device('cpu'))
output = model.predict_image_file('molscribe/assets/example.png', return_atoms_bonds=True, return_confidence=True)
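The `output` returned above is a dictionary. A rough sketch of how it might be inspected is shown here, assuming the key names described in the MolScribe README (`smiles`, `confidence`, `atoms`, `bonds`, with per-atom `atom_symbol` and `confidence` fields); `summarize_prediction` and the `min_confidence` threshold are illustrative and not part of MolScribe.

```python
def summarize_prediction(output, min_confidence=0.5):
    """Condense a MolScribe-style prediction dictionary into a short summary.

    Key names are an assumption based on the MolScribe README; missing
    keys fall back to empty values instead of raising.
    """
    atoms = output.get("atoms", [])
    bonds = output.get("bonds", [])
    # Symbols of atoms whose per-atom confidence falls below the threshold.
    uncertain = [
        atom.get("atom_symbol")
        for atom in atoms
        if atom.get("confidence", 1.0) < min_confidence
    ]
    return {
        "smiles": output.get("smiles"),
        "confidence": output.get("confidence"),
        "num_atoms": len(atoms),
        "num_bonds": len(bonds),
        "uncertain_atoms": uncertain,
    }
```

Passing the `output` from the snippet above to `summarize_prediction` gives a compact view that is easier to log than the full atom and bond lists.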
Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
Installation:
cd cobra
pip install -e .
# install mamba and other packages
pip install packaging ninja
pip install mamba-ssm
pip install causal-conv1d
# Verify Ninja --> should return exit code "0"
ninja --version; echo $?
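Beyond the `ninja --version` check above, it can help to confirm that the compiled packages are visible to the interpreter. The following is a small sketch under the assumption that `mamba-ssm` and `causal-conv1d` install the import names `mamba_ssm` and `causal_conv1d`; `check_cobra_dependencies` is a hypothetical helper.

```python
import importlib.util
import shutil


def check_cobra_dependencies(
    modules=("mamba_ssm", "causal_conv1d"),
    tools=("ninja",),
):
    """Report which Python modules are importable and which tools are on PATH.

    Returns a dict mapping each name to True (found) or False (missing).
    Nothing is imported, so a broken build does not crash the check.
    """
    status = {}
    for name in modules:
        status[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        status[tool] = shutil.which(tool) is not None
    return status
```

Running `check_cobra_dependencies()` inside the activated environment lists anything that still needs to be installed as `False`.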
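Directory structure
YourProjectName/
│
├── src/                  # project source code
│   ├── integration/      # code that integrates MolScribe and Vary
│   └── utils/            # utilities and helper functions
├── tests/                # test scripts
├── vary/                 # Vary project as a submodule
├── molscribe/            # MolScribe project as a submodule
├── environments/         # environment configuration files, such as Conda environment files
├── README.md             # project README
└── .gitmodules           # git submodule configuration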