# BtToxin_Digger (pixi) reproduction

This repo is a **reproducible runtime environment + example outputs** for
BtToxin_Digger 1.0.10 with **BLAST v5 database compatibility**. It is **not**
an official fork or a new BtToxin_Digger release.

## License / Citation / Disclaimer

- **BtToxin_Digger** is developed by its original authors; cite the upstream
  publication if you use it in research.
- **This repository** only provides an environment wrapper (pixi) and example
  runs for reproducibility; it does not modify BtToxin_Digger source code.
- **Disclaimer**: This is an independent, community-maintained setup and is
  not endorsed by the upstream authors.

This directory reproduces the BtToxin_Digger environment from
`quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0` using pixi so the
`scripts/run_single_fna_pipeline.py` digger step can be run without Docker.

## 1) Environment definition (vs docker image)

- `pixi.toml` keeps `bttoxin_digger=1.0.10` + `perl=5.26.2` (legacy stack) while
  upgrading `blast` to a v5-capable release for BLASTDB v5.
- Changes relative to `quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0`:
  - BLAST+ upgraded from 2.12.0 to 2.16.0 (required to read v5 databases).
  - Explicitly pinned `perl-file-tee==0.07` and `perl-list-util==1.38`.
  - `channel-priority = "disabled"` to allow mixing bioconda/conda-forge and
    the legacy label for perl compatibility.

Create the environment:

```
cd /home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro
pixi install
```

## 2) Database wiring (BLAST v4 vs v5)

The external BTTCMP database under `external_dbs/bt_toxin` ships with a BLAST
v5 index (built by newer BLAST+). If you run with BLAST 2.7, you must rebuild
v4 databases; with BLAST >= 2.10, you can use the v5 database directly.
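
One way to decide which case applies is to compare the installed BLAST+ version against the 2.10 threshold. The sketch below is illustrative only: `blast_reads_v5` is a hypothetical helper, and it assumes a version string of the form `major.minor.patch+`, as printed on the first line of `blastp -version`.

```bash
# Hypothetical helper: succeed if a BLAST+ version string (e.g. "2.16.0+")
# is at least 2.10.0, the threshold this README uses for v5 databases.
blast_reads_v5() {
  local ver major rest minor
  ver="${1%+}"          # drop the trailing "+"
  major="${ver%%.*}"    # text before the first dot
  rest="${ver#*.}"      # text after the first dot
  minor="${rest%%.*}"   # second version component
  [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 10 ]; }
}

# Classify a few example version strings.
for v in 2.7.1+ 2.12.0+ 2.16.0+; do
  if blast_reads_v5 "$v"; then
    echo "$v: can use the v5 database directly"
  else
    echo "$v: rebuild a v4 database"
  fi
done
```

In the pixi environment, the version string could be taken from something like `"$ENV_BIN/blastp" -version | awk 'NR==1 {print $2}'`, with `ENV_BIN` pointing at the environment's `bin` directory.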

### Recommended: use the shared `external_dbs` (no copy)

Keep a single source of truth and link it into the pixi environment:

```
ENV_BIN=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/.pixi/envs/default/bin
ln -sfn /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin \
  "$ENV_BIN/BTTCMP_db/bt_toxin"
```

This avoids duplicating a large database inside the repo.
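
One caveat with `ln -sfn`: if `BTTCMP_db/bt_toxin` already exists as a real directory rather than a symlink, `ln` creates the new link inside that directory instead of replacing it. Whether the packaged environment ships such a directory is an assumption here, not something this README confirms. A minimal sketch that moves an existing directory aside first (the `link_db` name and the `.bundled` suffix are hypothetical):

```bash
# Hypothetical helper: make DEST a symlink to SRC. If DEST is a real
# directory, rename it to DEST.bundled first so that ln replaces the path
# instead of creating the link inside the directory.
link_db() {
  local src="$1" dest="$2"
  if [ -d "$dest" ] && [ ! -L "$dest" ]; then
    if [ -e "$dest.bundled" ]; then
      echo "refusing to overwrite existing backup: $dest.bundled" >&2
      return 1
    fi
    mv "$dest" "$dest.bundled"
  fi
  ln -sfn "$src" "$dest"
}

# Demo in a scratch directory, so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/external_dbs/bt_toxin/db" "$demo/env/BTTCMP_db/bt_toxin"
link_db "$demo/external_dbs/bt_toxin" "$demo/env/BTTCMP_db/bt_toxin"
ls -l "$demo/env/BTTCMP_db"
```

Keeping the renamed copy makes it easy to restore the packaged database later.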

### Optional: freeze a snapshot inside this repo

If you want this repo to be self-contained, copy a snapshot and point the
environment at it (consider Git LFS if you intend to push it):

```
ENV_BIN=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/.pixi/envs/default/bin
SNAPSHOT=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/external_dbs_snapshot
mkdir -p "$SNAPSHOT"
cp -a /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin "$SNAPSHOT/"
ln -sfn "$SNAPSHOT/bt_toxin" "$ENV_BIN/BTTCMP_db/bt_toxin"
```

If you run a legacy BLAST such as 2.7, which cannot read v5 databases, rebuild
`bt_toxin` as a v4 database from the external FASTA. Run this step with the
legacy `makeblastdb`: a `makeblastdb` from BLAST+ 2.10 or later writes v5 by
default and needs `-blastdb_version 4` to produce a v4 database.

```
ENV_BIN=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/.pixi/envs/default/bin
V4_DB=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/bt_toxin_v4

mkdir -p "$V4_DB"
cp -a /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin/db "$V4_DB/"
ln -sfn /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin/seq "$V4_DB/seq"

"$ENV_BIN/makeblastdb" \
  -in /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin/seq/bt_toxin20251104.fas \
  -dbtype prot \
  -out "$V4_DB/db/bt_toxin" \
  -parse_seqids

ln -sfn "$V4_DB" "$ENV_BIN/BTTCMP_db/bt_toxin"
```

For BLAST v5 (current pixi.toml), point back to the external DB:

```
ln -sfn /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin \
  "$ENV_BIN/BTTCMP_db/bt_toxin"
```
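
To see which format a database prefix currently holds without running a search, the file extensions give a useful hint. The sketch below is a heuristic under stated assumptions: it expects a single-volume database and relies on v5 databases carrying an extra LMDB file (`.pdb` for protein, `.ndb` for nucleotide) next to the usual `.pin`/`.nin` index. `guess_blastdb_version` is a hypothetical name.

```bash
# Hypothetical helper: guess the BLASTDB format from the files at PREFIX.
guess_blastdb_version() {
  local prefix="$1"
  if [ -e "$prefix.pdb" ] || [ -e "$prefix.ndb" ]; then
    echo 5          # LMDB file present: v5 layout
  elif [ -e "$prefix.pin" ] || [ -e "$prefix.nin" ]; then
    echo 4          # index present but no LMDB file: v4 layout
  else
    echo unknown
    return 1
  fi
}

# Demo with empty placeholder files in a scratch directory.
demo=$(mktemp -d)
touch "$demo/v4.pin" "$demo/v4.phr" "$demo/v4.psq"
touch "$demo/v5.pin" "$demo/v5.phr" "$demo/v5.psq" "$demo/v5.pdb"
guess_blastdb_version "$demo/v4"
guess_blastdb_version "$demo/v5"
```

With BLAST+ available, `blastdbcmd -db "$ENV_BIN/BTTCMP_db/bt_toxin/db/bt_toxin" -info` prints the database metadata and is the more authoritative check.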

Rebuild the negative-set (back) database bundled with BtToxin_Digger:

```
"$ENV_BIN/makeblastdb" \
  -in "$ENV_BIN/BTTCMP_db/back/seq/negative_set-20210607" \
  -dbtype prot \
  -out "$ENV_BIN/BTTCMP_db/back/db/back" \
  -parse_seqids
```

## 3) Run BtToxin_Digger (assembled genome)

`run_digger_pixi.sh` sets `RATTLER_CACHE_DIR` inside this directory so pixi can
write its cache in the workspace (the default `~/.cache` path is blocked by the
sandbox).
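
The cache redirection can be sketched as follows. `RATTLER_CACHE_DIR` is the variable named above; the `.rattler_cache` directory name and the `use_workspace_cache` helper are assumptions for illustration, since the wrapper's actual contents are not shown in this README.

```bash
# Hypothetical helper: point the rattler/pixi package cache at a directory
# inside the given workspace and make sure that directory exists.
use_workspace_cache() {
  local workspace="$1"
  export RATTLER_CACHE_DIR="$workspace/.rattler_cache"
  mkdir -p "$RATTLER_CACHE_DIR"
}

# Demo against a scratch workspace.
demo=$(mktemp -d)
use_workspace_cache "$demo"
echo "cache directory: $RATTLER_CACHE_DIR"
```

Any `pixi` command started from the same shell afterwards inherits the exported variable.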

Example for a single `.fna` (use a clean working directory). The working
directory sits two levels below the repo root, so the wrapper and the inputs
are reached through `../../`:

```
mkdir -p /home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/work/C15_pixi_run_v5
cd /home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/work/C15_pixi_run_v5

bash ../../run_digger_pixi.sh ../../examples/inputs .fna 4
```

To bind `external_dbs/bt_toxin` explicitly, pass its path as the fourth argument:

```
bash ../../run_digger_pixi.sh ../../examples/inputs .fna 4 /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin
```

Outputs land under `Results/` in the working directory.

### Parameters (`run_digger_pixi.sh`)

- `input_dir`: input directory containing the `.fna` files
- `scaf_suffix`: input file suffix (for example `.fna`)
- `threads`: number of threads (default 4)
- `bttoxin_db_dir`: path to an external bt_toxin database (optional)

### Consistency with `scripts/run_single_fna_pipeline.py`

The pixi script calls BtToxin_Digger with the same arguments as the docker
invocation in `scripts/run_single_fna_pipeline.py`. The core arguments are:

- `--SeqPath <dir>`: input directory
- `--SequenceType nucl`: nucleotide input
- `--Scaf_suffix .fna`: file suffix
- `--threads 4`: number of threads
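
Put together, the shared invocation can be sketched as a small dry-run helper that prints the command line instead of executing it. This is an illustration under assumptions rather than the wrapper's real code: `digger_cmd` is a hypothetical helper, and only the four flags listed above are taken from this README.

```bash
# Hypothetical helper: print the BtToxin_Digger command line for one input
# directory. threads falls back to 4 when omitted.
digger_cmd() {
  local input_dir="${1:?usage: digger_cmd <input_dir> <scaf_suffix> [threads]}"
  local scaf_suffix="${2:?usage: digger_cmd <input_dir> <scaf_suffix> [threads]}"
  local threads="${3:-4}"
  echo BtToxin_Digger \
    --SeqPath "$input_dir" \
    --SequenceType nucl \
    --Scaf_suffix "$scaf_suffix" \
    --threads "$threads"
}

# Dry run: show what would be executed for the example inputs.
digger_cmd ../examples/inputs .fna
```

Printing first makes it easy to compare the pixi and docker invocations side by side before running either.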

Differences:

- The docker version automatically binds `external_dbs/bt_toxin` (if present)
  and collects the output under `runs/<out_root>/digger`; the pixi version
  writes `Results/` to the current working directory by default.
- `scripts/run_single_fna_pipeline.py` goes on to run Shotter + report;
  the pixi script runs only BtToxin_Digger itself.

## 4) Outputs and comparison (examples)

Inputs copied into this workspace:

- `runs/bttoxin_digger_v5_repro/examples/inputs/C15.fna`
- `runs/bttoxin_digger_v5_repro/examples/inputs/HAN055.fna`

- Example pixi runs:
  - `runs/bttoxin_digger_v5_repro/examples/C15_pixi_v5`
  - `runs/bttoxin_digger_v5_repro/examples/HAN055_pixi_v5_clean`
- Example docker runs:
  - `runs/bttoxin_digger_v5_repro/examples/C15_docker/digger`
  - `runs/bttoxin_digger_v5_repro/examples/HAN055_docker/digger`

See `runs/bttoxin_digger_v5_repro/examples/COMPARE_REPORT.md` for the comparison summary.

Diff files:

- `runs/bttoxin_digger_v5_repro/examples/diffs/C15_docker_vs_pixi_v5.diff`
- `runs/bttoxin_digger_v5_repro/examples/diffs/HAN055_docker_vs_pixi_v5_clean.diff`
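
Diffs of this kind can be produced with a recursive `diff`. How the files above were actually generated is not stated, so the following is only a sketch; `compare_runs` is a hypothetical helper.

```bash
# Hypothetical helper: write a unified recursive diff of two output
# directories to OUT and report whether they match.
compare_runs() {
  local baseline="$1" candidate="$2" out="$3" status
  diff -ru "$baseline" "$candidate" > "$out"
  status=$?
  if [ "$status" -eq 0 ]; then
    echo identical
  elif [ "$status" -eq 1 ]; then
    echo different
  else
    return "$status"   # 2 or more: diff itself failed (e.g. missing directory)
  fi
}

# Demo with two tiny scratch "runs".
demo=$(mktemp -d)
mkdir -p "$demo/docker" "$demo/pixi"
echo "toxin_A" > "$demo/docker/hits.txt"
echo "toxin_A" > "$demo/pixi/hits.txt"
compare_runs "$demo/docker" "$demo/pixi" "$demo/same.diff"
echo "toxin_B" >> "$demo/pixi/hits.txt"
compare_runs "$demo/docker" "$demo/pixi" "$demo/changed.diff"
```

Treating exit status 1 as a normal outcome matters here, because `diff` uses it to signal "files differ" rather than an error.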

## 5) External DB update (v5)

When `external_dbs/bt_toxin` is updated from the BtToxin_Digger repo, the BLAST
database is v5, which requires BLAST >= 2.10.0. That is why this pixi
environment upgrades BLAST to 2.16.0.

After updating `external_dbs/bt_toxin`, ensure the pixi environment still points
to that directory (see Section 2). With BLAST 2.16.0, no re-index is needed
because the upstream repo already ships v5 indices. If you downgrade BLAST to
2.7, rebuild a v4 DB (Section 2).

### Update steps

```bash
mkdir -p external_dbs
rm -rf external_dbs/bt_toxin tmp_bttoxin_repo

git clone --filter=blob:none --no-checkout https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo

git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master

# Copy the directory into your project's external_dbs
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin

# Clean up the temporary repo
rm -rf tmp_bttoxin_repo
```

### Verify the database binding

```bash
# Check that the database files are complete
ls -lh external_dbs/bt_toxin/db/

# Verify that the container can access the bound database
docker run --rm \
  -v "$(pwd)/external_dbs/bt_toxin:/usr/local/bin/BTTCMP_db/bt_toxin:ro" \
  quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0 \
  bash -lc 'ls -lh /usr/local/bin/BTTCMP_db/bt_toxin/db | head'
```

The output should list `.pin`, `.psq`, `.phr` and related files, with timestamps
and sizes matching the host, which confirms that the bind mount works.
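
The same completeness check can be scripted on the host. This sketch only tests for the three extensions named above; `check_bt_toxin_db` is a hypothetical helper, and a fuller check could also compare sizes or checksums.

```bash
# Hypothetical helper: succeed only if DIR/db contains at least one file
# for each of the .pin, .psq and .phr extensions.
check_bt_toxin_db() {
  local db_dir="$1/db" ext missing=0
  for ext in pin psq phr; do
    if ! ls "$db_dir"/*."$ext" > /dev/null 2>&1; then
      echo "missing *.$ext under $db_dir" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Demo with placeholder files in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/bt_toxin/db"
touch "$demo/bt_toxin/db/bt_toxin.pin" "$demo/bt_toxin/db/bt_toxin.psq"
check_bt_toxin_db "$demo/bt_toxin" || echo "database incomplete"
touch "$demo/bt_toxin/db/bt_toxin.phr"
check_bt_toxin_db "$demo/bt_toxin" && echo "database looks complete"
```

Running it as `check_bt_toxin_db external_dbs/bt_toxin` before a pipeline run catches an interrupted copy early.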

### Run the pipeline with the external database

The script detects the `external_dbs/bt_toxin` directory and binds it
automatically if it exists:

```bash
# Use external_dbs/bt_toxin automatically (recommended)
uv run python scripts/run_single_fna_pipeline.py --fna tests/test_data/HAN055.fna

# Or specify the database path manually
uv run python scripts/run_single_fna_pipeline.py \
  --fna tests/test_data/HAN055.fna \
  --bttoxin_db_dir /path/to/custom/bt_toxin
```

### Notes

- `db/` is required: at run time BLAST reads only the index files under `db/`.
- `seq/` is optional: it is kept only for archival or for regenerating the index.
- The bind mount is read-only (`ro`), which prevents the container from
  accidentally modifying the host database.
- No re-indexing is needed: the GitHub repository already includes prebuilt
  BLAST indices.

## 6) Repository layout

```
runs/bttoxin_digger_v5_repro/
├─ .pixi/                      # pixi environment cache
├─ pixi.toml                   # environment definition (bttoxin_digger + blast)
├─ pixi.lock                   # resolved environment
├─ run_digger_pixi.sh          # wrapper to run BtToxin_Digger in this env
├─ README.md
└─ examples/
   ├─ inputs/                  # copied test inputs (C15.fna, HAN055.fna)
   ├─ C15_pixi_v5/             # pixi run output (example)
   ├─ HAN055_pixi_v5_clean/    # pixi run output (example)
   ├─ C15_docker/              # docker output copy (baseline)
   ├─ HAN055_docker/           # docker output copy (baseline)
   ├─ diffs/                   # docker vs pixi diffs
   └─ COMPARE_REPORT.md
```