bttoxin-pipeline/bttoxin_digger_v5_repro/README.md
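zly fe353fc0bc chore: initial commit - simplified architecture + polling rework
- Remove Motia Streams real-time communication and switch to 3-second polling
- Simplify the frontend code and remove redundant components
- Simplify the backend architecture in preparation for a FastAPI refactor
- Update the pixi.toml environment configuration
- Keep bttoxin_digger_v5_repro as reference documentation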

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-13 16:50:09 +08:00

# BtToxin_Digger (pixi) reproduction
This repo is a **reproducible runtime environment + example outputs** for
BtToxin_Digger 1.0.10 with **BLAST v5 database compatibility**. It is **not**
an official fork or a new BtToxin_Digger release.
## License / Citation / Disclaimer
- **BtToxin_Digger** is developed by its original authors; cite the upstream
publication if you use it in research.
- **This repository** only provides an environment wrapper (pixi) and example
runs for reproducibility; it does not modify BtToxin_Digger source code.
- **Disclaimer**: This is an independent, community-maintained setup and is
not endorsed by the upstream authors.
This directory reproduces the BtToxin_Digger environment from
`quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0` using pixi so the
`scripts/run_single_fna_pipeline.py` digger step can be run without Docker.
## 1) Environment definition (vs docker image)
- `pixi.toml` keeps `bttoxin_digger=1.0.10` + `perl=5.26.2` (legacy stack) while
  upgrading `blast` to a v5-capable release for BLASTDB v5.
- Changes relative to `quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0`:
  - BLAST+ upgraded from 2.12.0 to 2.16.0 (required to read v5 databases).
  - Explicitly pinned `perl-file-tee==0.07` and `perl-list-util==1.38`.
  - `channel-priority = "disabled"` to allow mixing bioconda/conda-forge and
    the legacy label for perl compatibility.
Create the environment:
```
cd /home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro
pixi install
```
## 2) Database wiring (BLAST v4 vs v5)
The external BTTCMP database under `external_dbs/bt_toxin` ships with a BLAST
v5 index (built by newer BLAST+). If you run with BLAST 2.7, you must rebuild
v4 databases; with BLAST >= 2.10, you can use the v5 database directly.
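Whether the shipped v5 index is usable comes down to a version comparison. The following is a minimal sketch of that check, assuming `blastp -version` prints its usual first line (`blastp: 2.16.0+`). The function names are illustrative and not part of this repo.

```bash
# Extract the version number from `blastp -version` output read on stdin.
blast_version_from_output() {
    sed -n 's/^blastp: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Return 0 if the given BLAST+ version is >= 2.10 (the threshold stated above).
blast_reads_v5() {
    ver=${1%+}            # tolerate a trailing "+"
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    [ "$major" -gt 2 ] 2>/dev/null && return 0
    [ "$major" -eq 2 ] 2>/dev/null && [ "$minor" -ge 10 ] 2>/dev/null
}
```

For example, piping `"$ENV_BIN/blastp" -version` into `blast_version_from_output` yields the string to hand to `blast_reads_v5`.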
### Recommended: use the shared `external_dbs` (no copy)
Keep a single source of truth and link it into the pixi environment:
```
ENV_BIN=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/.pixi/envs/default/bin
ln -sfn /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin \
"$ENV_BIN/BTTCMP_db/bt_toxin"
```
This avoids duplicating a large database inside the repo.
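One pitfall is worth checking after linking: `ln -sfn` replaces an existing symlink, but if `BTTCMP_db/bt_toxin` is still a real directory (for instance a copy shipped with the package), the link is created inside that directory instead. A small sketch of a verification helper follows; `db_link_ok` is a hypothetical name, not something this repo provides.

```bash
# Return 0 only if LINK is a symlink that resolves to the same directory as TARGET.
db_link_ok() {
    link=$1
    target=$2
    [ -L "$link" ] || return 1        # a real directory here means the link went elsewhere
    [ -d "$target" ] || return 1      # the shared database must exist
    [ "$(cd "$link" 2>/dev/null && pwd -P)" = "$(cd "$target" && pwd -P)" ]
}
```

Calling it with `"$ENV_BIN/BTTCMP_db/bt_toxin"` and the `external_dbs/bt_toxin` path gives a quick yes/no before running the pipeline.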
### Optional: freeze a snapshot inside this repo
If you want this repo to be self-contained, copy a snapshot and point the
environment at it (note: consider Git LFS if you intend to push it):
```
SNAPSHOT=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/external_dbs_snapshot
mkdir -p "$SNAPSHOT"
cp -a /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin "$SNAPSHOT/"
ln -sfn "$SNAPSHOT/bt_toxin" "$ENV_BIN/BTTCMP_db/bt_toxin"
```
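A frozen snapshot is only useful if it matches the source at copy time. The sketch below compares two directories by checksum using the POSIX `cksum` utility. The helper names are illustrative, and a stronger hash could be substituted if preferred.

```bash
# Print "checksum size path" for every regular file under DIR, sorted by path.
db_manifest() {
    (cd "$1" && find . -type f -exec cksum {} + | LC_ALL=C sort -k 3)
}

# Return 0 if both directories exist and hold identical files.
same_db_content() {
    [ -d "$1" ] && [ -d "$2" ] && [ "$(db_manifest "$1")" = "$(db_manifest "$2")" ]
}
```

Saving the output of `db_manifest "$SNAPSHOT/bt_toxin"` next to the snapshot also records exactly which database version was frozen.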
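If you run BLAST 2.7, rebuild `bt_toxin` as a v4 database from the external FASTA (this assumes the `makeblastdb` in the environment is the 2.7 build, which writes v4 indices):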
```
ENV_BIN=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/.pixi/envs/default/bin
V4_DB=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/bt_toxin_v4
mkdir -p "$V4_DB"
cp -a /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin/db "$V4_DB/"
ln -sfn /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin/seq "$V4_DB/seq"
"$ENV_BIN/makeblastdb" \
-in /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin/seq/bt_toxin20251104.fas \
-dbtype prot \
-out "$V4_DB/db/bt_toxin" \
-parse_seqids
ln -sfn "$V4_DB" "$ENV_BIN/BTTCMP_db/bt_toxin"
```
For BLAST v5 (current pixi.toml), point back to the external DB:
```
ln -sfn /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin \
"$ENV_BIN/BTTCMP_db/bt_toxin"
```
Rebuild the negative-set (back) database bundled with BtToxin_Digger:
```
"$ENV_BIN/makeblastdb" \
-in "$ENV_BIN/BTTCMP_db/back/seq/negative_set-20210607" \
-dbtype prot \
-out "$ENV_BIN/BTTCMP_db/back/db/back" \
-parse_seqids
```
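Both rebuilds above use the same `makeblastdb` flags, so they can be wrapped in one helper. This is a sketch only: `build_prot_db`, `MAKEBLASTDB`, and `DB_VERSION` are names introduced here for illustration. Recent BLAST+ releases accept `-blastdb_version` to choose the output format; older releases such as 2.7 may not recognise it, so leave `DB_VERSION` unset there.

```bash
# Build a protein BLAST database from FASTA into OUT_PREFIX.
# MAKEBLASTDB may point at a specific binary (or a stub, for a dry run).
# DB_VERSION, if set, is passed as -blastdb_version (assumed supported by the binary).
build_prot_db() {
    fasta=$1
    out_prefix=$2
    set -- -in "$fasta" -dbtype prot -out "$out_prefix" -parse_seqids
    if [ -n "${DB_VERSION:-}" ]; then
        set -- "$@" -blastdb_version "$DB_VERSION"
    fi
    "${MAKEBLASTDB:-makeblastdb}" "$@"
}
```

For the negative set this would be called as `MAKEBLASTDB="$ENV_BIN/makeblastdb" build_prot_db "$ENV_BIN/BTTCMP_db/back/seq/negative_set-20210607" "$ENV_BIN/BTTCMP_db/back/db/back"`.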
## 3) Run BtToxin_Digger (assembled genome)
`run_digger_pixi.sh` sets `RATTLER_CACHE_DIR` inside this directory so pixi can
write its cache in the workspace (the default `~/.cache` path is blocked by the
sandbox).
Example for a single `.fna` (use a clean working directory):
```
mkdir -p /home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/work/C15_pixi_run_v5
cd /home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/work/C15_pixi_run_v5
bash ../run_digger_pixi.sh ../examples/inputs .fna 4
```
If you want to bind `external_dbs/bt_toxin` explicitly:
```
bash ../run_digger_pixi.sh ../examples/inputs .fna 4 /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin
```
Outputs land under `Results/` in the working directory.
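### Parameters (pixi `run_digger_pixi.sh`)
- `input_dir`: input directory (containing the `.fna` files)
- `scaf_suffix`: input file suffix (for example `.fna`)
- `threads`: number of threads (default 4)
- `bttoxin_db_dir`: path to an external bt_toxin database (optional)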
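The positional parameters above could be handled along these lines. This is a hypothetical sketch, not the actual contents of `run_digger_pixi.sh`. It treats the first two arguments as required, which is an assumption, and applies the documented default of 4 threads.

```bash
# Parse <input_dir> <scaf_suffix> [threads] [bttoxin_db_dir] into shell variables.
parse_digger_args() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_digger_pixi.sh <input_dir> <scaf_suffix> [threads] [bttoxin_db_dir]" >&2
        return 2
    fi
    input_dir=$1
    scaf_suffix=$2
    threads=${3:-4}          # documented default
    bttoxin_db_dir=${4:-}    # optional; empty keeps whatever the environment links to
    case $threads in
        ''|*[!0-9]*) echo "threads must be an integer: $threads" >&2; return 2 ;;
    esac
}

# Count input files with the given suffix so an empty directory fails early.
count_inputs() {
    find "$1" -maxdepth 1 -type f -name "*$2" | wc -l | tr -d ' '
}
```

Checking `count_inputs "$input_dir" "$scaf_suffix"` before launching avoids a long run that silently processes nothing.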
### Consistency with scripts/run_single_fna_pipeline.py
The pixi script calls BtToxin_Digger with the same arguments as the docker
invocation in `scripts/run_single_fna_pipeline.py`. The core parameters are:
- `--SeqPath <dir>`: input directory
- `--SequenceType nucl`: nucleotide input
- `--Scaf_suffix .fna`: file suffix
- `--threads 4`: number of threads
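Put together, the four flags listed above form the following invocation. This is a sketch: `run_digger` and the `DIGGER` override are names introduced here for illustration, and the real wrapper may assemble the command differently.

```bash
# Run BtToxin_Digger on an assembled-genome directory with the documented core flags.
# DIGGER may point at a specific executable; it defaults to BtToxin_Digger on PATH.
run_digger() {
    "${DIGGER:-BtToxin_Digger}" \
        --SeqPath "$1" \
        --SequenceType nucl \
        --Scaf_suffix "$2" \
        --threads "${3:-4}"
}
```

Keeping the flags in one place makes it easier to confirm that the pixi and docker paths really do pass identical arguments.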
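Differences:
- The docker version automatically binds `external_dbs/bt_toxin` (if present) and
  collects its output under `runs/<out_root>/digger`; the pixi version writes
  `Results/` in the current working directory by default.
- `scripts/run_single_fna_pipeline.py` goes on to run Shotter + report; the
  pixi script runs only BtToxin_Digger itself.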
## 4) Outputs and comparison (examples)
- Inputs copied into this workspace:
  - `runs/bttoxin_digger_v5_repro/examples/inputs/C15.fna`
  - `runs/bttoxin_digger_v5_repro/examples/inputs/HAN055.fna`
- Example pixi runs:
  - `runs/bttoxin_digger_v5_repro/examples/C15_pixi_v5`
  - `runs/bttoxin_digger_v5_repro/examples/HAN055_pixi_v5_clean`
- Example docker runs:
  - `runs/bttoxin_digger_v5_repro/examples/C15_docker/digger`
  - `runs/bttoxin_digger_v5_repro/examples/HAN055_docker/digger`
See `runs/bttoxin_digger_v5_repro/examples/COMPARE_REPORT.md` for the comparison summary.
Diff files:
- `runs/bttoxin_digger_v5_repro/examples/diffs/C15_docker_vs_pixi_v5.diff`
- `runs/bttoxin_digger_v5_repro/examples/diffs/HAN055_docker_vs_pixi_v5_clean.diff`
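Diff files of this kind can be regenerated with plain `diff`. The sketch below is one way to do it and is not necessarily how the files above were produced. The helper names are illustrative.

```bash
# Write a recursive unified diff of two run directories to OUT.
# Exit status follows diff: 0 identical, 1 differences found, 2 error.
compare_runs() {
    diff -ru "$1" "$2" > "$3"
}

# Count files that differ or exist on one side only.
changed_entries() {
    diff -rq "$1" "$2" | wc -l | tr -d ' '
}
```

For instance, `compare_runs examples/C15_docker/digger examples/C15_pixi_v5 C15.diff` followed by `changed_entries` on the same pair gives both the full diff and a one-number summary.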
## 5) External DB update (v5)
When `external_dbs/bt_toxin` is updated from the BtToxin_Digger repo, the BLAST
database is v5, which requires BLAST >= 2.10.0. That is why this pixi
environment upgrades BLAST to 2.16.0.
After updating `external_dbs/bt_toxin`, ensure the pixi environment still points
to that directory (see Section 2). With BLAST 2.16.0, no re-index is needed
because the upstream repo already ships v5 indices. If you downgrade BLAST to
2.7, rebuild a v4 DB (Section 2).
### Update steps
```bash
mkdir -p external_dbs
rm -rf external_dbs/bt_toxin tmp_bttoxin_repo
git clone --filter=blob:none --no-checkout https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo
git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master
# Copy the directory into your project's external_dbs
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
# Clean up the temporary repo
rm -rf tmp_bttoxin_repo
```
### Verify the database binding
```bash
# Check that the database files are complete
ls -lh external_dbs/bt_toxin/db/
# Verify that the container can access the bound database
docker run --rm \
-v "$(pwd)/external_dbs/bt_toxin:/usr/local/bin/BTTCMP_db/bt_toxin:ro" \
quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0 \
bash -lc 'ls -lh /usr/local/bin/BTTCMP_db/bt_toxin/db | head'
```
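The output should list files such as `.pin`, `.psq`, and `.phr`, with timestamps and sizes matching those on the host, which confirms that the bind mount works.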
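The same file check can be scripted on the host side. In this sketch the database name `bt_toxin` is taken from the `makeblastdb -out` path in Section 2. The v5 test is a heuristic that assumes v5 protein databases carry extra files such as `.pdb` or `.pto`, and both helper names are illustrative.

```bash
# Return 0 if DIR holds non-empty core protein index files for NAME.
has_core_index() {
    for ext in phr pin psq; do
        [ -s "$1/$2.$ext" ] || return 1
    done
}

# Heuristic (assumption): v5 protein databases ship NAME.pdb and/or NAME.pto.
looks_like_v5() {
    [ -e "$1/$2.pdb" ] || [ -e "$1/$2.pto" ]
}
```

Running `has_core_index external_dbs/bt_toxin/db bt_toxin` after an update catches an incomplete copy before any container is started.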
### Run the pipeline with the external database
The script automatically detects the `external_dbs/bt_toxin` directory and binds it if it exists:
```bash
# Use external_dbs/bt_toxin automatically (recommended)
uv run python scripts/run_single_fna_pipeline.py --fna tests/test_data/HAN055.fna
# Or specify the database path manually
uv run python scripts/run_single_fna_pipeline.py \
--fna tests/test_data/HAN055.fna \
--bttoxin_db_dir /path/to/custom/bt_toxin
```
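The selection rule described here (explicit path first, then `external_dbs/bt_toxin` if present, otherwise the bundled database) can be sketched in shell as follows. This is an illustration of the rule, not the code in `scripts/run_single_fna_pipeline.py`. The helper names are hypothetical, and the container path is the one used in the `docker run` example in this section.

```bash
# Print the database directory to bind, or nothing to fall back to the bundled one.
# $1: explicit override (may be empty); $2: project root (default: current directory).
resolve_db_dir() {
    override=${1:-}
    project_root=${2:-.}
    if [ -n "$override" ]; then
        candidate=$override
    else
        candidate=$project_root/external_dbs/bt_toxin
    fi
    if [ -d "$candidate/db" ]; then
        printf '%s\n' "$candidate"
    elif [ -n "$override" ]; then
        echo "no db/ directory under $override" >&2
        return 1
    fi
}

# Format the read-only bind-mount argument for `docker run -v`.
db_mount_arg() {
    printf '%s:/usr/local/bin/BTTCMP_db/bt_toxin:ro\n' "$1"
}
```

An explicit path that lacks `db/` is treated as an error rather than silently ignored, since a typo in `--bttoxin_db_dir` would otherwise fall back to a different database.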
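### Notes
- The `db/` directory is required: at run time BLAST reads only the index files under `db/`.
- The `seq/` directory is optional: it is kept only for the record or for regenerating the indices.
- The bind mount is read-only (`ro`), which prevents the container from accidentally modifying the host database.
- No re-indexing is needed: the GitHub repository already includes prebuilt BLAST indices.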
## 6) Repository layout
```
runs/bttoxin_digger_v5_repro/
├─ .pixi/ # pixi environment cache
├─ pixi.toml # environment definition (bttoxin_digger + blast)
├─ pixi.lock # resolved environment
├─ run_digger_pixi.sh # wrapper to run BtToxin_Digger in this env
├─ README.md
└─ examples/
   ├─ inputs/               # copied test inputs (C15.fna, HAN055.fna)
   ├─ C15_pixi_v5/          # pixi run output (example)
   ├─ HAN055_pixi_v5_clean/ # pixi run output (example)
   ├─ C15_docker/           # docker output copy (baseline)
   ├─ HAN055_docker/        # docker output copy (baseline)
   ├─ diffs/                # docker vs pixi diffs
   └─ COMPARE_REPORT.md
```