# ChemStructLM

ChemStructLM is a project that integrates the MolScribe and Vary libraries so that large language models can understand, design, and modify chemical structures, molecules, and proteins. By combining these two libraries with a large language model, the project aims to advance research and applications in chemistry and bioinformatics.

## Features

- Integrates the functionality of the MolScribe and Vary libraries.
- Provides an example environment for using both libraries together.
- Includes example code showing how to combine the two libraries.
- Open source; community contributions are welcome.

## Quick start

### Prerequisites

Make sure you have a working Conda environment. If you do not, see the official Miniconda or Anaconda websites for installation instructions. micromamba is the recommended package manager.

## Installation

Clone the ChemStructLM repository to your local machine:

```shell
git clone https://github.com/hotwa/ChemStructLM.git
cd ChemStructLM
```

Initialize the submodules (MolScribe and Vary):

```shell
git submodule update --init --recursive
```

Check out specific versions of MolScribe, Vary, and Cobra:

```shell
# Register the submodules (only needed if they are not already listed in .gitmodules)
git submodule add https://github.com/thomas0809/MolScribe molscribe
git submodule add https://github.com/Ucas-HaoranWei/Vary-toy vary-toy
git submodule add https://github.com/h-zhao1997/cobra cobra

# Print the current commit of each submodule, then pin it to a known commit
cd molscribe
git rev-parse HEAD
git checkout 97acee57d10bd719f4dc1cfd30d09f142b7dc65f
cd ../vary-toy
git rev-parse HEAD
git checkout e94e50f4b10c7b0f2a29e4d8b3804a35024b0565
cd ..
cd cobra
git rev-parse HEAD
git checkout 365a24d360f9c0d1ed4db96acc3a76b12782d138
# Return to the repository root for the next steps
cd ..
```

Create and activate the Conda environment:

```shell
micromamba env create -f environment.yml
micromamba run -n ChemStructLM python -m pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple/

# Install flash-attention
micromamba run -n ChemStructLM python -m pip install flash-attn --no-build-isolation

# Install Vary
# First replace the hard-coded weights path `/cache/vit-large-patch14/` in
# `vary/Vary-master/vary/demo/run_qwen_vary.py` and `vary/Vary-master/vary/model/vary_qwen_vary.py`.
# Download the weights from https://huggingface.co/openai/clip-vit-large-patch14/tree/main
# and substitute your own local directory for the target path below.
# The sed paths are relative to the repository root, so run them before changing directory.
sed -i 's|/cache/vit-large-patch14/|/media/lingyuzeng/c617029b-8496-684d-b402-a18f08e75ef13/project/Vary-toy/clip-vit-large-patch14/|g' vary/Vary-master/vary/demo/run_qwen_vary.py
sed -i 's|/cache/vit-large-patch14/|/media/lingyuzeng/c617029b-8496-684d-b402-a18f08e75ef13/project/Vary-toy/clip-vit-large-patch14/|g' vary/Vary-master/vary/model/vary_qwen_vary.py
sed -i 's/self.seq_length = config.seq_length/self.seq_length = config.max_length/' vary-toy/Vary-master/vary/model/llm/qwen/modeling_qwen.py
cd vary/Vary-master
micromamba run -n ChemStructLM python -m pip install -e .

# Install MolScribe
cd ../../molscribe
micromamba run -n ChemStructLM python -m pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple/
# Relax the pinned timm version before installing
sed -i 's/timm==0.4.12/timm==0.6.13/' setup.py
micromamba run -n ChemStructLM python setup.py install
cd ..
micromamba activate ChemStructLM

# Test
python vary/Vary-master/vary/demo/run_qwen_vary.py --model-name /media/lingyuzeng/c617029b-8496-684d-b402-a18f08e75ef13/project/Vary-toy/clip-vit-large-patch14/ --image-file /mnt/wtrr1/deepinproject/chemstructlm/vary/assets/vary.png
```

Test in a fresh environment:

```shell
micromamba create -n test python=3.10 -y
cd vary-toy/Vary-master
micromamba run -n test python -m pip install -e . -i https://pypi.mirrors.ustc.edu.cn/simple/
cd ../../molscribe
micromamba run -n test python setup.py install
cd ..
micromamba activate test
```

MolScribe test:

```python
import torch
from molscribe import MolScribe

# Alternatively, download the checkpoint from the Hugging Face Hub:
# from huggingface_hub import hf_hub_download
# ckpt_path = hf_hub_download('yujieq/MolScribe', 'swin_base_char_aux_1m.pth')
ckpt_path = '/mnt/wtrr1/deepinproject/MolScribe/ckpts/swin_base_char_aux_1m680k.pth'

# Load the model on CPU and run prediction on the bundled example image
model = MolScribe(ckpt_path, device=torch.device('cpu'))
output = model.predict_image_file('molscribe/assets/example.png', return_atoms_bonds=True, return_confidence=True)
```

[Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference](https://github.com/h-zhao1997/cobra)

Installation:

```shell
cd cobra
pip install -e .

# Install mamba and other packages
pip install packaging ninja
pip install mamba-ssm
pip install causal-conv1d

# Verify Ninja --> should return exit code "0"
ninja --version; echo $?
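```

## Directory structure

```shell
ChemStructLM/
│
├── src/                # Project source code
│   ├── integration/    # Code that combines MolScribe and Vary
│   └── utils/          # Utilities and helper functions
├── tests/              # Test scripts
├── vary/               # Vary project as a submodule
├── molscribe/          # MolScribe project as a submodule
├── environments/       # Environment configuration files, e.g. Conda environment files
├── README.md           # Project README
└── .gitmodules         # Git submodule configuration
```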
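
The installation steps pin each submodule to a specific commit, so it can be useful to confirm that the submodule directories in the layout above are still at those commits. The following is a minimal sketch, not part of the repository: the helper names `git_head` and `find_mismatches` are hypothetical, and it assumes the directory names used in the `git submodule add` commands (`molscribe`, `vary-toy`, `cobra`) and that `git` is on the `PATH`.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, Tuple

# Commit hashes copied from the checkout step in the installation section.
# The directory names are an assumption based on the `git submodule add` commands.
PINNED_COMMITS: Dict[str, str] = {
    "molscribe": "97acee57d10bd719f4dc1cfd30d09f142b7dc65f",
    "vary-toy": "e94e50f4b10c7b0f2a29e4d8b3804a35024b0565",
    "cobra": "365a24d360f9c0d1ed4db96acc3a76b12782d138",
}


def git_head(path: Path) -> str:
    """Return the commit hash that HEAD points to in the repository at `path`."""
    result = subprocess.run(
        ["git", "-C", str(path), "rev-parse", "HEAD"],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the directory is not a git checkout
    )
    return result.stdout.strip()


def find_mismatches(
    root: str,
    pins: Dict[str, str] = PINNED_COMMITS,
    head_of: Callable[[Path], str] = git_head,
) -> Dict[str, Tuple[str, str]]:
    """Map each submodule whose HEAD differs from its pin to (expected, actual)."""
    mismatches: Dict[str, Tuple[str, str]] = {}
    for name, expected in pins.items():
        actual = head_of(Path(root) / name)
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches


if __name__ == "__main__":
    # Run from the repository root; prints nothing when every pin matches.
    for name, (expected, actual) in find_mismatches(".").items():
        print(f"{name}: expected {expected}, found {actual}")
```

The `head_of` parameter exists only so the comparison logic can be exercised without a real checkout; in normal use the default `git_head` is enough.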