## MolE Broad-Spectrum Antimicrobial Prediction API

Test case: `example_usage.py`

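## Features

1. **High-performance parallel processing** - Multiprocess parallel computation substantially speeds up prediction for large batches of molecules
2. **Multiple interfaces** - Available as a Python API, a command-line tool, and a web service
3. **Modular design** - Easy to integrate into other projects
4. **Flexible configuration** - Supports custom model paths, thresholds, and other parameters
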
## Installation

```bash
pip install -e .
```

## Usage

### 1. Python API

```python
from broad_spectrum_parallel import predict_smiles, MoleculeInput

# Predict for one or more SMILES strings
results = predict_smiles(["CCO", "CCN"], ["ethanol", "ethylamine"])

for result in results:
    print(f"{result.chem_id}: broad_spectrum={result.broad_spectrum}, inhibited_strains={result.ginhib_total}")
```

### 2. Command-line tool

```bash
# Basic usage
predict_antimicrobial input.tsv output.tsv --smiles_input --smiles_colname smiles --chemid_colname chem_id

# Aggregate the prediction results
predict_antimicrobial input.tsv output.tsv --smiles_input --aggregate_scores
```

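
The basic command above reads a tab-separated input file whose SMILES and compound-ID columns are named by `--smiles_colname` and `--chemid_colname`. A minimal sketch of producing such a file with Python's standard `csv` module follows. The helper name `write_input_tsv` is illustrative, and the sketch assumes the tool needs no columns beyond these two.

```python
import csv
from pathlib import Path


def write_input_tsv(path, molecules):
    """Write (chem_id, smiles) pairs to a tab-separated file with a header row.

    Hypothetical helper: the column names match the --chemid_colname and
    --smiles_colname values used in the basic command.
    """
    with Path(path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["chem_id", "smiles"])
        for chem_id, smiles in molecules:
            writer.writerow([chem_id, smiles])
```

Calling `write_input_tsv("input.tsv", [("ethanol", "CCO"), ("ethylamine", "CCN")])` would create a file that can be passed as the first argument of `predict_antimicrobial`.
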
### 3. Web API service

```bash
# Start the service
uvicorn broad_spectrum_parallel.api:app --host 0.0.0.0 --port 8000
```

You can then send POST requests to the `http://localhost:8000/predict` endpoint:

```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"smiles": ["CCO", "CCN"]}'
```

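
The same request can be issued from Python. The sketch below uses only the standard library and assumes, as in the curl example, that the service listens on `localhost:8000` and accepts a JSON body with a `smiles` list. The response schema is not documented here, so the helper simply returns the decoded JSON. Both function names are illustrative.

```python
import json
import urllib.request


def build_predict_request(smiles, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the /predict endpoint."""
    payload = json.dumps({"smiles": list(smiles)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_via_http(smiles, base_url="http://localhost:8000", timeout=60):
    """Send the request and return the decoded JSON response.

    Assumption: the service replies with JSON; its exact fields are not
    specified in this document.
    """
    request = build_predict_request(smiles, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `predict_via_http(["CCO", "CCN"])` returns the parsed response. Splitting request construction from sending keeps the payload easy to inspect without a live server.
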
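## Interpreting the results

The output shows that each compound returns 8 key fields. The prediction metrics fall into three groups; the values shown are those predicted for ethanol (CCO):

1. Antimicrobial potential scores (log scale):

   - `apscore_total`: -11.758 - overall antimicrobial score
   - `apscore_gnegative`: -11.648 - score against Gram-negative bacteria
   - `apscore_gpositive`: -11.848 - score against Gram-positive bacteria

2. Inhibited strain counts:

   - `ginhib_total`: 0 - total number of inhibited strains
   - `ginhib_gnegative`: 0 - number of inhibited Gram-negative strains
   - `ginhib_gpositive`: 0 - number of inhibited Gram-positive strains

3. Broad-spectrum call:

   - `broad_spectrum`: 0 - whether the compound is broad-spectrum (requires inhibiting ≥10 strains)

### Example interpretation

Taking ethanol (CCO) as an example:

- Very low antimicrobial score (-11.758): the predicted antimicrobial activity is very weak
- No strains inhibited (0): at the configured threshold, it does not effectively inhibit any of the tested strains
- Not broad-spectrum (0): it does not meet the minimum criterion for broad-spectrum activity

This result is as expected: although ethanol has a bactericidal effect, it is not considered a viable antimicrobial candidate by drug-discovery standards.
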
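
The broad-spectrum call described in this section reduces to a threshold on the inhibited-strain count. A minimal sketch of that rule, assuming the flag is 1 exactly when `ginhib_total` is at least 10 as stated; the function name and the `min_strains` parameter are illustrative and not part of the package API:

```python
def is_broad_spectrum(ginhib_total, min_strains=10):
    """Return 1 if at least `min_strains` strains are inhibited, else 0.

    Assumption: mirrors the ">= 10 strains" rule stated in this section;
    the package may compute the flag differently internally.
    """
    return int(ginhib_total >= min_strains)
```

For ethanol, `ginhib_total` is 0, so `is_broad_spectrum(0)` gives 0, which matches the reported `broad_spectrum` value.
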
## Supported strain information

The screening data covers 40 strains and 1197 compounds, as the following session shows:

```shell
(mole) root@DESK4090:/srv/project/mole_antimicrobial_potential/broad_spectrum_parallel# micromamba run -n mole python -c "
> import pandas as pd
> import numpy as np
>
> # Load the strain screening data
> maier_screen = pd.read_csv('data/01.prepare_training_data/maier_screening_results.tsv.gz', sep='\t', index_col=0)
> print(f'Total strains: {len(maier_screen.columns)}')
> print(f'Total compounds: {len(maier_screen.index)}')
> print(f'First 10 strains:')
> for i, strain in enumerate(maier_screen.columns[:10]):
>     print(f'{i+1}. {strain}')
>
> # Load the Gram stain information
> gram_info = pd.read_excel('raw_data/maier_microbiome/strain_info_SF2.xlsx',
>                           skiprows=[0, 1, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
>                           index_col='NT data base')
> print(f'\nGram stain information:')
> print(gram_info['Gram stain'].value_counts())
> "
Total strains: 40
Total compounds: 1197
First 10 strains:
1. Akkermansia muciniphila (NT5021)
2. Bacteroides caccae (NT5050)
3. Bacteroides fragilis (ET) (NT5033)
4. Bacteroides fragilis (NT) (NT5003)
5. Bacteroides ovatus (NT5054)
6. Bacteroides thetaiotaomicron (NT5004)
7. Bacteroides uniformis (NT5002)
8. Bacteroides vulgatus (NT5001)
9. Bacteroides xylanisolvens (NT5064)
10. Bifidobacterium adolescentis (NT5022)
/root/micromamba/envs/mole/lib/python3.10/site-packages/openpyxl/worksheet/_reader.py:329: UserWarning: Unknown extension is not supported and will be removed
  warn(msg)

Gram stain information:
Gram stain
positive    22
negative    18
Name: count, dtype: int64
```

## Downloading the weights

mole: https://www.alipan.com/s/DNuDo8iEn89 (extraction code: mh90)

After downloading, place the weights at: `pretrained_model/model_ginconcat_btwin_100k_d8000_l0.0001`

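
Before running predictions, it can help to confirm that the weights ended up where expected. A small sketch, assuming the relative path given in this section is resolved against your current working directory; the constant and helper names are illustrative:

```python
from pathlib import Path

# Location given in this section, relative to the working directory.
WEIGHTS_PATH = Path("pretrained_model/model_ginconcat_btwin_100k_d8000_l0.0001")


def weights_available(path=WEIGHTS_PATH):
    """Return True if something exists at the expected weights location."""
    return Path(path).exists()
```

If `weights_available()` returns `False`, check the download location and the directory you are running from.
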
## Original paper and GitHub repository

https://www.nature.com/articles/s41467-025-58804-4

https://github.com/rolayoalarcon/mole_antimicrobial_potential