feat(digger): containerize BtToxin_Digger with v5 database integration
- Added Dockerfile and docker-compose.yml for BtToxin_Digger
- Integrated external v5 BLAST database into the container image
- Updated main docker-compose.yml to include the digger service
- Updated documentation with database update instructions
@@ -1,255 +1,141 @@
# BtToxin_Digger (pixi) reproduction
# BtToxin_Digger (pixi) reproduction & Docker Image

This repo is a **reproducible runtime environment + example outputs** for
BtToxin_Digger 1.0.10 with **BLAST v5 database compatibility**. It is **not**
an official fork or a new BtToxin_Digger release.
This repo is a **reproducible runtime environment** for BtToxin_Digger 1.0.10, packaged as a Docker image based on `ghcr.io/prefix-dev/pixi`.

It includes:
1. **BtToxin_Digger 1.0.10** (installed via Pixi)
2. **BLAST+ 2.16.0** (compatible with v5 databases)
3. **Pre-bundled BtToxin Database** (baked into the image)

## License / Citation / Disclaimer

- **BtToxin_Digger** is developed by its original authors; cite the upstream
  publication if you use it in research.
- **This repository** only provides an environment wrapper (pixi) and example
  runs for reproducibility; it does not modify BtToxin_Digger source code.
- **Disclaimer**: This is an independent, community-maintained setup and is
  not endorsed by the upstream authors.
- **BtToxin_Digger** is developed by its original authors; cite the upstream publication if you use it in research.
- **This repository** only provides an environment wrapper (pixi/docker); it does not modify BtToxin_Digger source code.

This directory reproduces the BtToxin_Digger environment from
`quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0` using pixi so the
`scripts/run_single_fna_pipeline.py` digger step can be run without Docker.
## 1. Quick Start with Docker

## 1) Environment definition (vs docker image)
The easiest way to run this is to use the included `docker-compose.yml` or the global project configuration.

- `pixi.toml` keeps `bttoxin_digger=1.0.10` + `perl=5.26.2` (legacy stack) while
  upgrading `blast` to a v5-capable release for BLASTDB v5.
- Changes relative to `quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0`:
  - BLAST+ upgraded from 2.12.0 to 2.16.0 (required to read v5 databases).
  - Explicitly pinned `perl-file-tee==0.07` and `perl-list-util==1.38`.
  - `channel-priority = "disabled"` to allow mixing bioconda/conda-forge and
    the legacy label for perl compatibility.
Create the environment:
### Build the Image

```bash
# In this directory
docker compose build
```
cd /home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro

### Run Analysis

Place your input `.fna` files in `examples/inputs` (or mount your own directory), then run:

```bash
# Run help
docker compose run --rm digger-repro pixi run BtToxin_Digger --help

# Run analysis on every .fna file in the mounted input directory
# Note: Input path must match the internal mount point (/app/jobs)
docker compose run --rm digger-repro pixi run BtToxin_Digger \
  --SeqPath /app/jobs \
  --Scaf_suffix .fna \
  --threads 4
```

### Directory Mounting

- `/app/jobs`: Mount your input sequence files here.
- `/app/data`: Mount your desired output directory here (if using absolute paths in arguments).

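The two mount points above can be illustrated with a small dry-run helper. This is only a sketch: it prints a `docker compose run` command with `-v` bind mounts instead of executing it, the function name `digger_mount_cmd` and the host paths are hypothetical, and the real `docker-compose.yml` may already declare its own volumes, in which case the extra `-v` flags are unnecessary.

```bash
# Sketch: print (do not run) a command that bind-mounts two host
# directories onto the mount points named in this section.
# digger_mount_cmd is a hypothetical helper, not part of the repository.
digger_mount_cmd() {
  local in_dir=$1 out_dir=$2
  # Paths containing spaces would need extra quoting before reuse.
  printf 'docker compose run --rm -v %s:/app/jobs -v %s:/app/data digger-repro pixi run BtToxin_Digger --SeqPath /app/jobs --Scaf_suffix .fna --threads 4\n' \
    "$in_dir" "$out_dir"
}
```

Calling `digger_mount_cmd "$(pwd)/examples/inputs" "$(pwd)/out"` prints the command so it can be reviewed before it is run.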
## 2. Docker Image Construction

The image is built using `docker/Dockerfile`.

### Base Image
The image is based on `ghcr.io/prefix-dev/pixi:latest` to ensure a consistent conda-compatible environment.

### Database Integration
The external database (`external_dbs/bt_toxin`) is **copied into the image** at build time.
Target location: `/app/.pixi/envs/default/bin/BTTCMP_db/bt_toxin`

This replaces the default database shipped with the bioconda package, ensuring:
1. Latest toxin definitions are used.
2. BLAST v5 indices are compatible with the installed BLAST+ 2.16.0.

### Environment Definition (`pixi.toml`)
- `bttoxin_digger = "==1.0.10"`
- `perl = "==5.26.2"` (Legacy requirement)
- `blast = "==2.16.0"` (Upgraded for v5 DB support)
- `channel-priority = "disabled"`

## 3. Development / Manual Usage

If you want to run without Docker using local Pixi:

```bash
# Install environment
pixi install

# Link the database (required manually if not using Docker)
# The Dockerfile does this automatically by copying files.
ENV_BIN=.pixi/envs/default/bin
rm -rf "$ENV_BIN/BTTCMP_db/bt_toxin"
ln -sfn "$(pwd)/external_dbs/bt_toxin" "$ENV_BIN/BTTCMP_db/bt_toxin"

# Run
pixi run BtToxin_Digger --help
```

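The manual linking step can be wrapped in a function that also confirms the link resolves. This is a minimal sketch, assuming the `BTTCMP_db/bt_toxin` layout described in this README; `link_bt_toxin_db` is a hypothetical name, not something the repository provides.

```bash
# Sketch: symlink a database directory into a pixi environment and verify it.
# link_bt_toxin_db is a hypothetical helper; the paths are caller-supplied.
link_bt_toxin_db() {
  local src=$1 env_bin=$2
  local target="$env_bin/BTTCMP_db/bt_toxin"
  # A relative symlink target would resolve against the link's directory,
  # so make the source absolute first.
  case "$src" in /*) ;; *) src="$PWD/$src" ;; esac
  [ -d "$src/db" ] || { echo "missing $src/db" >&2; return 1; }
  mkdir -p "$env_bin/BTTCMP_db"
  rm -rf "$target"          # removes the old link or the bundled copy
  ln -sfn "$src" "$target"
  [ -d "$target/db" ]       # succeeds only if the link resolves
}
```

For example, `link_bt_toxin_db external_dbs/bt_toxin .pixi/envs/default/bin` mirrors the commands above and returns non-zero if the source has no `db/` directory.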
## 2) Database wiring (BLAST v4 vs v5)

The external BTTCMP database under `external_dbs/bt_toxin` ships with a BLAST
v5 index (built by newer BLAST+). If you run with BLAST 2.7, you must rebuild
v4 databases; with BLAST >= 2.10, you can use the v5 database directly.

### Recommended: use the shared `external_dbs` (no copy)

Keep a single source of truth and link it into the pixi environment:
## 4. Repository Layout

```
ENV_BIN=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/.pixi/envs/default/bin
ln -sfn /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin \
  "$ENV_BIN/BTTCMP_db/bt_toxin"
.
├── docker/
│   └── Dockerfile          # Docker build definition
├── docker-compose.yml      # Local test orchestration
├── external_dbs/           # Database source (copied into image)
│   └── bt_toxin/           # The actual database files
├── pixi.toml               # Environment dependencies
├── pixi.lock               # Exact version lock
└── examples/               # Test inputs and outputs
```

This avoids duplicating a large database inside the repo.
## 5. Updating the Database (Important for Future Updates)

### Optional: freeze a snapshot inside this repo
The database consists of two parts in `external_dbs/bt_toxin`:
1. **`seq/` Directory**: Contains the raw FASTA sequence files (e.g., `bt_toxin20251104.fas`).
2. **`db/` Directory**: Contains the BLAST indices (`.phr`, `.pin`, `.psq`) generated from the sequences.

If you want this repo to be self-contained, copy a snapshot and point the
environment at it (note: consider Git LFS if you intend to push it):
**Relationship**: The files in `db/` are **generated from** the FASTA files in `seq/` using `makeblastdb`. The filename of the source FASTA (e.g., `bt_toxin20251104.fas`) is embedded in the metadata of the `db/` files.

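Before running the tool, it can help to check that `db/` actually holds the index files listed above. The sketch below assumes a single-volume protein database whose base name is `bt_toxin` (the `-out` name used in this README); `check_blast_index` is a hypothetical helper and it checks only the three extensions the README mentions.

```bash
# Sketch: confirm that the .phr/.pin/.psq index files exist and are non-empty.
# check_blast_index is a hypothetical helper; "bt_toxin" is the assumed base name.
check_blast_index() {
  local db_dir=$1 base=${2:-bt_toxin} ext missing=0
  for ext in phr pin psq; do
    if [ ! -s "$db_dir/$base.$ext" ]; then
      echo "missing or empty: $db_dir/$base.$ext" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

`check_blast_index external_dbs/bt_toxin/db` returns 0 when all three files are present and reports each missing file on stderr otherwise.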
```
SNAPSHOT=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/external_dbs_snapshot
mkdir -p "$SNAPSHOT"
cp -a /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin "$SNAPSHOT/"
ln -sfn "$SNAPSHOT/bt_toxin" "$ENV_BIN/BTTCMP_db/bt_toxin"
```
### How to Update (e.g., for 2026/2027 data)

Rebuild `bt_toxin` using the external FASTA:
If a new database version is released (e.g., from https://github.com/liaochenlanruo/BtToxin_Digger), follow these steps:

```
ENV_BIN=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/.pixi/envs/default/bin
V4_DB=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/bt_toxin_v4
1. **Download New Sequences**:
Place the new FASTA file (e.g., `bt_toxin2026xxxx.fas`) into `external_dbs/bt_toxin/seq/`.

mkdir -p "$V4_DB"
cp -a /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin/db "$V4_DB/"
ln -sfn /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin/seq "$V4_DB/seq"
2. **Generate New Indices (Critical Step)**:
You must regenerate the indices in `external_dbs/bt_toxin/db/`. You can use a temporary container or local BLAST+ to do this.

"$ENV_BIN/makeblastdb" \
  -in /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin/seq/bt_toxin20251104.fas \
  -dbtype prot \
  -out "$V4_DB/db/bt_toxin" \
  -parse_seqids
```bash
# Example using the local pixi environment (if installed)
# Or use a container with blast installed
makeblastdb \
  -in external_dbs/bt_toxin/seq/bt_toxin2026xxxx.fas \
  -dbtype prot \
  -out external_dbs/bt_toxin/db/bt_toxin \
  -parse_seqids
```

ln -sfn "$V4_DB" "$ENV_BIN/BTTCMP_db/bt_toxin"
```
*Note: The `-out` parameter must end with `bt_toxin` to match what the tool expects.*

For BLAST v5 (current pixi.toml), point back to the external DB:
3. **Rebuild Docker Image**:
The Dockerfile copies `external_dbs/bt_toxin` into the image. You must rebuild it to include the changes.

```
ln -sfn /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin \
  "$ENV_BIN/BTTCMP_db/bt_toxin"
```
```bash
docker compose build --no-cache
```

Rebuild the negative-set (back) database bundled with BtToxin_Digger:

```
"$ENV_BIN/makeblastdb" \
  -in "$ENV_BIN/BTTCMP_db/back/seq/negative_set-20210607" \
  -dbtype prot \
  -out "$ENV_BIN/BTTCMP_db/back/db/back" \
  -parse_seqids
```

## 3) Run BtToxin_Digger (assembled genome)

`run_digger_pixi.sh` sets `RATTLER_CACHE_DIR` inside this directory so pixi can
write its cache in the workspace (the default `~/.cache` path is blocked by the
sandbox).

Example for a single `.fna` (use a clean working directory):

```
mkdir -p /home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/work/C15_pixi_run_v5
cd /home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/work/C15_pixi_run_v5

bash ../run_digger_pixi.sh ../examples/inputs .fna 4
```

If you want to bind `external_dbs/bt_toxin` explicitly:

```
bash ../run_digger_pixi.sh ../examples/inputs .fna 4 /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin
```

Outputs land under `Results/` in the working directory.

### Parameters (pixi `run_digger_pixi.sh`)

- `input_dir`: input directory (containing the `.fna` files)
- `scaf_suffix`: input file suffix (for example `.fna`)
- `threads`: number of threads (default 4)
- `bttoxin_db_dir`: path to an external bt_toxin database (optional)

### Consistency with scripts/run_single_fna_pipeline.py

The pixi script calls BtToxin_Digger with the same arguments as the docker
invocation in `scripts/run_single_fna_pipeline.py`. The core arguments correspond as follows:

- `--SeqPath <dir>`: input directory
- `--SequenceType nucl`: nucleotide input
- `--Scaf_suffix .fna`: file suffix
- `--threads 4`: number of threads

Differences:

- The docker version automatically binds `external_dbs/bt_toxin` (if present) and collects the output under
  `runs/<out_root>/digger`; the pixi version writes `Results/` in the current working directory by default.
- `scripts/run_single_fna_pipeline.py` also goes on to run Shotter + report;
  the pixi script runs only BtToxin_Digger itself.

## 4) Outputs and comparison (examples)

Inputs copied into this workspace:

- `runs/bttoxin_digger_v5_repro/examples/inputs/C15.fna`
- `runs/bttoxin_digger_v5_repro/examples/inputs/HAN055.fna`

- Example pixi runs:
  - `runs/bttoxin_digger_v5_repro/examples/C15_pixi_v5`
  - `runs/bttoxin_digger_v5_repro/examples/HAN055_pixi_v5_clean`
- Example docker runs:
  - `runs/bttoxin_digger_v5_repro/examples/C15_docker/digger`
  - `runs/bttoxin_digger_v5_repro/examples/HAN055_docker/digger`

See `runs/bttoxin_digger_v5_repro/examples/COMPARE_REPORT.md` for the comparison summary.

Diff files:

- `runs/bttoxin_digger_v5_repro/examples/diffs/C15_docker_vs_pixi_v5.diff`
- `runs/bttoxin_digger_v5_repro/examples/diffs/HAN055_docker_vs_pixi_v5_clean.diff`

## 5) External DB update (v5)

When `external_dbs/bt_toxin` is updated from the BtToxin_Digger repo, the BLAST
database is v5, which requires BLAST >= 2.10.0. That is why this pixi
environment upgrades BLAST to 2.16.0.

After updating `external_dbs/bt_toxin`, ensure the pixi environment still points
to that directory (see Section 2). With BLAST 2.16.0, no re-index is needed
because the upstream repo already ships v5 indices. If you downgrade BLAST to
2.7, rebuild a v4 DB (Section 2).

### Update steps

```bash
mkdir -p external_dbs
rm -rf external_dbs/bt_toxin tmp_bttoxin_repo

git clone --filter=blob:none --no-checkout https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo

git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master

# Copy the directory into your project's external_dbs
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin

# Clean up the temporary repo
rm -rf tmp_bttoxin_repo
```

### Verify the database binding

```bash
# Check that the database files are complete
ls -lh external_dbs/bt_toxin/db/

# Verify that the container can access the bound database
docker run --rm \
  -v "$(pwd)/external_dbs/bt_toxin:/usr/local/bin/BTTCMP_db/bt_toxin:ro" \
  quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0 \
  bash -lc 'ls -lh /usr/local/bin/BTTCMP_db/bt_toxin/db | head'
```

The output should list files such as `.pin`, `.psq`, and `.phr`, with timestamps and sizes matching those on the host, which indicates that the binding succeeded.

### Run the pipeline with the external database

The script automatically detects the `external_dbs/bt_toxin` directory and binds it if it exists:

```bash
# Use external_dbs/bt_toxin automatically (recommended)
uv run python scripts/run_single_fna_pipeline.py --fna tests/test_data/HAN055.fna

# Or specify the database path manually
uv run python scripts/run_single_fna_pipeline.py \
  --fna tests/test_data/HAN055.fna \
  --bttoxin_db_dir /path/to/custom/bt_toxin
```

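### Notes

- The `db/` directory is required: at run time BLAST reads only the index files under `db/`
- The `seq/` directory is optional: it is kept only for archival purposes or for regenerating the indices
- The bind mount is read-only (`ro`): this prevents the container from accidentally modifying the host database
- No re-indexing is needed: the GitHub repository already includes prebuilt BLAST indices
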
## 6) Repository layout

```
runs/bttoxin_digger_v5_repro/
├─ .pixi/                     # pixi environment cache
├─ pixi.toml                  # environment definition (bttoxin_digger + blast)
├─ pixi.lock                  # resolved environment
├─ run_digger_pixi.sh         # wrapper to run BtToxin_Digger in this env
├─ README.md
└─ examples/
   ├─ inputs/                 # copied test inputs (C15.fna, HAN055.fna)
   ├─ C15_pixi_v5/            # pixi run output (example)
   ├─ HAN055_pixi_v5_clean/   # pixi run output (example)
   ├─ C15_docker/             # docker output copy (baseline)
   ├─ HAN055_docker/          # docker output copy (baseline)
   ├─ diffs/                  # docker vs pixi diffs
   └─ COMPARE_REPORT.md
```
4. **Verify**:
Check the database version inside the new container:
```bash
docker compose run --rm digger-repro pixi run blastdbcmd -db /app/.pixi/envs/default/bin/BTTCMP_db/bt_toxin/db/bt_toxin -info
```

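The update steps can be previewed with a dry-run helper that picks the newest FASTA and prints the commands from steps 2 and 3 without executing them. This is a sketch under stated assumptions: FASTA files are named `bt_toxinYYYYMMDD.fas` as in the example above, and `latest_fasta` and `print_update_cmds` are hypothetical names, not part of the repository.

```bash
# Sketch: choose the newest FASTA in seq/ and print the rebuild commands.
# Both functions are hypothetical helpers; nothing is executed against BLAST or Docker.
latest_fasta() {
  local seq_dir=$1 f newest=""
  # Glob results are sorted, so the last match carries the greatest
  # YYYYMMDD suffix (assumed naming: bt_toxinYYYYMMDD.fas).
  for f in "$seq_dir"/bt_toxin*.fas; do
    [ -e "$f" ] && newest=$f
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

print_update_cmds() {
  local root=$1 fasta
  fasta=$(latest_fasta "$root/seq") || { echo "no FASTA in $root/seq" >&2; return 1; }
  # -out ends with bt_toxin, as the note in step 2 requires.
  printf 'makeblastdb -in %s -dbtype prot -out %s/db/bt_toxin -parse_seqids\n' "$fasta" "$root"
  printf 'docker compose build --no-cache\n'
}
```

Running `print_update_cmds external_dbs/bt_toxin` shows the two commands for review; they still have to be run by hand, followed by the verification in step 4.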