
ChemStructLM

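ChemStructLM is a project that integrates the MolScribe and Vary libraries so that large language models can understand, design, and modify chemical structures, molecules, and proteins. By combining these two libraries with a large language model, the project aims to advance research and applications in chemistry and bioinformatics.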

Features

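- Integrates the functionality of the MolScribe and Vary libraries.
- Provides an example environment for using both libraries together.
- Includes example code showing how to combine the two libraries.
- Open source; community contributions are welcome.

Quick start

Prerequisites

Make sure you have a working Conda environment. If you do not, see the official Miniconda or Anaconda website for installation instructions. micromamba is the recommended package manager.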
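
As a quick way to check this prerequisite, the following minimal sketch looks for a Conda-compatible executable on PATH. The helper name `find_package_manager` and the candidate list are illustrative assumptions, not part of this repository; note that a micromamba installed only as a shell function would not be detected this way.

```python
import shutil
from typing import Callable, Optional, Sequence


def find_package_manager(
    candidates: Sequence[str] = ("micromamba", "mamba", "conda"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate executable found on PATH, or None."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    manager = find_package_manager()
    print(manager or "No Conda-compatible package manager found on PATH")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.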

Installation

Clone the ChemStructLM repository to your local machine:

git clone https://github.com/hotwa/ChemStructLM.git
cd ChemStructLM

Initialize the MolScribe and Vary submodules:

git submodule update --init --recursive

Check out the pinned versions of MolScribe, Vary, and Cobra:

git submodule add https://github.com/thomas0809/MolScribe molscribe
git submodule add https://github.com/Ucas-HaoranWei/Vary-toy vary-toy
git submodule add https://github.com/h-zhao1997/cobra cobra
cd molscribe
git rev-parse HEAD
git checkout 97acee57d10bd719f4dc1cfd30d09f142b7dc65f
cd ../vary-toy
git rev-parse HEAD
git checkout e94e50f4b10c7b0f2a29e4d8b3804a35024b0565
cd ..
cd cobra
git rev-parse HEAD
git checkout 365a24d360f9c0d1ed4db96acc3a76b12782d138
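
To confirm that each submodule ended up on the pinned commit, a small script can compare `git rev-parse HEAD` against the hashes used above. This is a sketch under stated assumptions: the directory names follow the `git submodule add` commands above, and the helper names `git_head` and `find_mismatches` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict

# Commit hashes copied from the checkout commands above.
PINNED_COMMITS = {
    "molscribe": "97acee57d10bd719f4dc1cfd30d09f142b7dc65f",
    "vary-toy": "e94e50f4b10c7b0f2a29e4d8b3804a35024b0565",
    "cobra": "365a24d360f9c0d1ed4db96acc3a76b12782d138",
}


def git_head(repo_dir: Path) -> str:
    """Return the commit hash that HEAD points to in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def find_mismatches(
    root,
    pins: Dict[str, str] = PINNED_COMMITS,
    get_head: Callable[[Path], str] = git_head,
) -> Dict[str, str]:
    """Map each submodule whose HEAD differs from its pin to the actual hash."""
    mismatches = {}
    for name, expected in pins.items():
        actual = get_head(Path(root) / name)
        if actual != expected:
            mismatches[name] = actual
    return mismatches
```

Calling `find_mismatches(".")` from the repository root returns an empty dictionary when every submodule matches its pin.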

Create and activate the Conda environment:

micromamba env create -f environment.yml
micromamba run -n ChemStructLM python -m pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple/
# install flash-attention
micromamba run -n ChemStructLM python -m pip install flash-attn --no-build-isolation
# install vary
# The hardcoded CLIP path `/cache/vit-large-patch14/` appears in
# `vary/Vary-master/vary/demo/run_qwen_vary.py` and `vary/Vary-master/vary/model/vary_qwen_vary.py`.
# Download the weights from https://huggingface.co/openai/clip-vit-large-patch14/tree/main
# and replace the target path below with the location of your local copy.
# Run the sed commands from the repository root, before changing directory.
# The submodule is added as `vary-toy` above, so adjust the `vary/` or `vary-toy/` prefix to match your checkout.
sed -i 's|/cache/vit-large-patch14/|/media/lingyuzeng/c617029b-8496-684d-b402-a18f08e75ef13/project/Vary-toy/clip-vit-large-patch14/|g' vary/Vary-master/vary/demo/run_qwen_vary.py
sed -i 's|/cache/vit-large-patch14/|/media/lingyuzeng/c617029b-8496-684d-b402-a18f08e75ef13/project/Vary-toy/clip-vit-large-patch14/|g' vary/Vary-master/vary/model/vary_qwen_vary.py
sed -i 's/self.seq_length = config.seq_length/self.seq_length = config.max_length/' vary-toy/Vary-master/vary/model/llm/qwen/modeling_qwen.py
cd vary/Vary-master
# editable install of the Vary package
micromamba run -n ChemStructLM python -m pip install -e .
# install molscribe
cd ../../molscribe
micromamba run -n ChemStructLM python -m pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple/
sed -i 's/timm==0.4.12/timm==0.6.13/' setup.py
micromamba run -n ChemStructLM python setup.py install
cd ..
micromamba activate ChemStructLM
# test
python vary/Vary-master/vary/demo/run_qwen_vary.py  --model-name /media/lingyuzeng/c617029b-8496-684d-b402-a18f08e75ef13/project/Vary-toy/clip-vit-large-patch14/ --image-file /mnt/wtrr1/deepinproject/chemstructlm/vary/assets/vary.png
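
The `sed -i` path substitution in the steps above relies on GNU sed. On systems where that flag behaves differently, the same edit can be sketched in Python. The helper name `replace_in_file` is hypothetical, and the replacement directory in the commented usage is a placeholder you would set yourself.

```python
from pathlib import Path


def replace_in_file(path, old: str, new: str) -> int:
    """Replace every occurrence of `old` with `new` in a text file.

    Returns the number of occurrences replaced; the file is only
    rewritten when at least one occurrence was found.
    """
    file_path = Path(path)
    text = file_path.read_text(encoding="utf-8")
    count = text.count(old)
    if count:
        file_path.write_text(text.replace(old, new), encoding="utf-8")
    return count


# Example usage (file paths taken from the steps above; CLIP_DIR is a placeholder):
# CLIP_DIR = "/path/to/clip-vit-large-patch14/"
# for target in (
#     "vary/Vary-master/vary/demo/run_qwen_vary.py",
#     "vary/Vary-master/vary/model/vary_qwen_vary.py",
# ):
#     replace_in_file(target, "/cache/vit-large-patch14/", CLIP_DIR)
```

A return value of 0 is a useful signal that the file was already patched or that the hardcoded path differs from what is expected.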

test:

micromamba create -n test python=3.10 -y
cd vary-toy/Vary-master
micromamba run -n test python -m pip install -e . -i https://pypi.mirrors.ustc.edu.cn/simple/
cd ../../molscribe
micromamba run -n test python setup.py install
cd ..
micromamba activate test

MolScribe test:

import torch
from molscribe import MolScribe
# from huggingface_hub import hf_hub_download

# ckpt_path = hf_hub_download('yujieq/MolScribe', 'swin_base_char_aux_1m.pth')
ckpt_path = '/mnt/wtrr1/deepinproject/MolScribe/ckpts/swin_base_char_aux_1m680k.pth'

model = MolScribe(ckpt_path, device=torch.device('cpu'))
output = model.predict_image_file('molscribe/assets/example.png', return_atoms_bonds=True, return_confidence=True)
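
The MolScribe test above stores the prediction in `output` without inspecting it. The sketch below summarizes such a result, assuming the dictionary layout described in the MolScribe README (keys `smiles`, `confidence`, `atoms`, `bonds`); the helper name `summarize_prediction` is hypothetical, and missing keys are tolerated in case the layout differs in your version.

```python
from typing import Any, Dict


def summarize_prediction(output: Dict[str, Any]) -> str:
    """Build a one-line summary of a MolScribe-style prediction dictionary."""
    smiles = output.get("smiles", "<missing>")
    confidence = output.get("confidence")
    atoms = output.get("atoms", [])
    bonds = output.get("bonds", [])
    conf_text = f"{confidence:.3f}" if confidence is not None else "n/a"
    return (
        f"SMILES: {smiles} | confidence: {conf_text} | "
        f"atoms: {len(atoms)} | bonds: {len(bonds)}"
    )


# Example usage with the variable from the test above:
# print(summarize_prediction(output))
```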

Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference

Installation:

cd cobra
pip install -e .

# install mamba and other packages
pip install packaging ninja
pip install mamba-ssm
pip install causal-conv1d

# Verify Ninja --> should return exit code "0"
ninja --version; echo $?
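
After the pip installs above, a short check can confirm that the packages are importable without actually loading them. This is a sketch: `mamba_ssm` and `causal_conv1d` are the import names of the `mamba-ssm` and `causal-conv1d` packages, while `cobra` as an import name is an assumption based on the repository directory, and `missing_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed for the packages installed above.
    missing = missing_modules(["cobra", "mamba_ssm", "causal_conv1d"])
    print("All modules importable" if not missing else f"Missing: {missing}")
```

`find_spec` only locates a module, so this check does not catch build problems that surface when the compiled extensions are first loaded.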

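Directory structure

ChemStructLM/
│
├── src/               # Project source code
│   ├── integration/   # Code that combines MolScribe and Vary
│   └── utils/         # Utilities and helper functions
├── tests/             # Test scripts
├── vary/              # Vary project as a submodule
├── molscribe/         # MolScribe project as a submodule
├── environments/      # Environment configuration files, such as the Conda environment file
├── README.md          # Project README
└── .gitmodules        # Git submodule configuration file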