feat(digger): containerize BtToxin_Digger with v5 database integration

- Added Dockerfile and docker-compose.yml for BtToxin_Digger
- Integrated external v5 BLAST database into the container image
- Updated main docker-compose.yml to include the digger service
- Updated documentation with database update instructions
zly
2026-01-17 12:14:39 +08:00
parent 6f2365981d
commit 700bdb8307
33 changed files with 232973 additions and 75716 deletions


@@ -1,255 +1,141 @@
# BtToxin_Digger (pixi) reproduction
# BtToxin_Digger (pixi) reproduction & Docker Image
This repo is a **reproducible runtime environment + example outputs** for
BtToxin_Digger 1.0.10 with **BLAST v5 database compatibility**. It is **not**
an official fork or a new BtToxin_Digger release.
This repo is a **reproducible runtime environment** for BtToxin_Digger 1.0.10, packaged as a Docker image based on `ghcr.io/prefix-dev/pixi`.
It includes:
1. **BtToxin_Digger 1.0.10** (installed via Pixi)
2. **BLAST+ 2.16.0** (compatible with v5 databases)
3. **Pre-bundled BtToxin Database** (baked into the image)
## License / Citation / Disclaimer
- **BtToxin_Digger** is developed by its original authors; cite the upstream
publication if you use it in research.
- **This repository** only provides an environment wrapper (pixi) and example
runs for reproducibility; it does not modify BtToxin_Digger source code.
- **Disclaimer**: This is an independent, community-maintained setup and is
not endorsed by the upstream authors.
- **BtToxin_Digger** is developed by its original authors; cite the upstream publication if you use it in research.
- **This repository** only provides an environment wrapper (pixi/docker); it does not modify BtToxin_Digger source code.
This directory reproduces the BtToxin_Digger environment from
`quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0` using pixi so the
`scripts/run_single_fna_pipeline.py` digger step can be run without Docker.
## 1. Quick Start with Docker
## 1) Environment definition (vs docker image)
The easiest way to run this is using the included `docker-compose.yml` or the global project configuration.
- `pixi.toml` keeps `bttoxin_digger=1.0.10` + `perl=5.26.2` (legacy stack) while
upgrading `blast` to a v5-capable release for BLASTDB v5.
- Changes relative to `quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0`:
- BLAST+ upgraded from 2.12.0 to 2.16.0 (required to read v5 databases).
- Explicitly pinned `perl-file-tee==0.07` and `perl-list-util==1.38`.
- `channel-priority = "disabled"` to allow mixing bioconda/conda-forge and
the legacy label for perl compatibility.
Create the environment:
### Build the Image
```bash
# In this directory
docker compose build
```
cd /home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro
### Run Analysis
Place your input `.fna` files in `examples/inputs` (or mount your own directory), then run:
```bash
# Run help
docker compose run --rm digger-repro pixi run BtToxin_Digger --help
# Run analysis on a specific file
# Note: Input path must match the internal mount point (/app/jobs)
docker compose run --rm digger-repro pixi run BtToxin_Digger \
--SeqPath /app/jobs \
--Scaf_suffix .fna \
--threads 4
```
### Directory Mounting
- `/app/jobs`: Mount your input sequence files here.
- `/app/data`: Mount your output directory here if you pass absolute output paths in the arguments. The bundled `docker-compose.yml` does not mount it by default.
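The analysis command shown above can also be assembled from a script, for instance when driving several runs. This is a minimal sketch, assuming the `digger-repro` service name and the `/app/jobs` mount point from `docker-compose.yml`; the helper name `build_digger_command` is hypothetical and not part of this repository.

```python
import shlex


def build_digger_command(seq_path="/app/jobs", suffix=".fna", threads=4):
    """Return the argv list for one BtToxin_Digger run via docker compose.

    Hypothetical helper: it only mirrors the flags shown in this README.
    """
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    return [
        "docker", "compose", "run", "--rm", "digger-repro",
        "pixi", "run", "BtToxin_Digger",
        "--SeqPath", seq_path,
        "--Scaf_suffix", suffix,
        "--threads", str(threads),
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line; pass the list to
    # subprocess.run(...) instead to actually execute it.
    print(shlex.join(build_digger_command()))
```

Returning a list rather than a single string avoids shell quoting problems if the input directory ever contains spaces.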
## 2. Docker Image Construction
The image is built using `docker/Dockerfile`.
### Base Image
Uses `ghcr.io/prefix-dev/pixi:latest` to ensure a consistent conda-compatible environment.
### Database Integration
The external database (`external_dbs/bt_toxin`) is **copied into the image** during build time.
Target location: `/app/.pixi/envs/default/bin/BTTCMP_db/bt_toxin`
This replaces the default database shipped with the bioconda package, ensuring:
1. Latest toxin definitions are used.
2. BLAST v5 indices are compatible with the installed BLAST+ 2.16.0.
### Environment Definition (`pixi.toml`)
- `bttoxin_digger = "==1.0.10"`
- `perl = "==5.26.2"` (Legacy requirement)
- `blast = "==2.16.0"` (Upgraded for v5 DB support)
- `channel-priority = "disabled"`
## 3. Development / Manual Usage
If you want to run without Docker using local Pixi:
```bash
# Install environment
pixi install
# Link the database (required manually if not using Docker)
# The Dockerfile does this automatically by copying files.
ENV_BIN=.pixi/envs/default/bin
rm -rf "$ENV_BIN/BTTCMP_db/bt_toxin"
ln -sfn "$(pwd)/external_dbs/bt_toxin" "$ENV_BIN/BTTCMP_db/bt_toxin"
# Run
pixi run BtToxin_Digger --help
```
## 2) Database wiring (BLAST v4 vs v5)
The external BTTCMP database under `external_dbs/bt_toxin` ships with a BLAST
v5 index (built by newer BLAST+). If you run with BLAST 2.7, you must rebuild
v4 databases; with BLAST >= 2.10, you can use the v5 database directly.
### Recommended: use the shared `external_dbs` (no copy)
Keep a single source of truth and link it into the pixi environment:
## 4. Repository Layout
```
ENV_BIN=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/.pixi/envs/default/bin
ln -sfn /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin \
"$ENV_BIN/BTTCMP_db/bt_toxin"
.
├── docker/
│ └── Dockerfile # Docker build definition
├── docker-compose.yml # Local test orchestration
├── external_dbs/ # Database source (copied into image)
│ └── bt_toxin/ # The actual database files
├── pixi.toml # Environment dependencies
├── pixi.lock # Exact version lock
└── examples/ # Test inputs and outputs
```
This avoids duplicating a large database inside the repo.
## 5. Updating the Database (Important for Future Updates)
### Optional: freeze a snapshot inside this repo
The database consists of two parts in `external_dbs/bt_toxin`:
1. **`seq/` Directory**: Contains the raw FASTA sequence files (e.g., `bt_toxin20251104.fas`).
2. **`db/` Directory**: Contains the BLAST indices (`.phr`, `.pin`, `.psq`) generated from the sequences.
If you want this repo to be self-contained, copy a snapshot and point the
environment at it (note: consider Git LFS if you intend to push it):
**Relationship**: The files in `db/` are **generated from** the FASTA files in `seq/` using `makeblastdb`. The filename of the source FASTA (e.g., `bt_toxin20251104.fas`) is embedded in the metadata of the `db/` files.
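A quick way to catch a `db/` directory that was not regenerated is to check for the core index files named above. This is a minimal sketch, assuming the `bt_toxin` base name used throughout this README; `missing_index_files` is a hypothetical helper, and a v5 database carries additional files that it does not check.

```python
from pathlib import Path

# Core protein index extensions named in this README.
CORE_EXTENSIONS = (".phr", ".pin", ".psq")


def missing_index_files(db_dir, base_name="bt_toxin"):
    """Return the core index file names that are absent from `db_dir`."""
    db_dir = Path(db_dir)
    return [
        f"{base_name}{ext}"
        for ext in CORE_EXTENSIONS
        if not (db_dir / f"{base_name}{ext}").is_file()
    ]
```

Calling `missing_index_files("external_dbs/bt_toxin/db")` should return an empty list for a complete database.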
```
SNAPSHOT=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/external_dbs_snapshot
mkdir -p "$SNAPSHOT"
cp -a /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin "$SNAPSHOT/"
ln -sfn "$SNAPSHOT/bt_toxin" "$ENV_BIN/BTTCMP_db/bt_toxin"
```
### How to Update (e.g., for 2026/2027 data)
Rebuild `bt_toxin` using the external FASTA:
If a new database version is released (e.g., from https://github.com/liaochenlanruo/BtToxin_Digger), follow these steps:
```
ENV_BIN=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/.pixi/envs/default/bin
V4_DB=/home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/bt_toxin_v4
1. **Download New Sequences**:
Place the new FASTA file (e.g., `bt_toxin2026xxxx.fas`) into `external_dbs/bt_toxin/seq/`.
mkdir -p "$V4_DB"
cp -a /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin/db "$V4_DB/"
ln -sfn /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin/seq "$V4_DB/seq"
2. **Generate New Indices (Critical Step)**:
You must regenerate the indices in `external_dbs/bt_toxin/db/`. You can use a temporary container or local BLAST+ to do this.
"$ENV_BIN/makeblastdb" \
-in /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin/seq/bt_toxin20251104.fas \
```bash
# Example using the local pixi environment (if installed)
# Or use a container with blast installed
makeblastdb \
-in external_dbs/bt_toxin/seq/bt_toxin2026xxxx.fas \
-dbtype prot \
-out "$V4_DB/db/bt_toxin" \
-parse_seqids
ln -sfn "$V4_DB" "$ENV_BIN/BTTCMP_db/bt_toxin"
```
For BLAST v5 (current pixi.toml), point back to the external DB:
```
ln -sfn /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin \
"$ENV_BIN/BTTCMP_db/bt_toxin"
```
Rebuild the negative-set (back) database bundled with BtToxin_Digger:
```
"$ENV_BIN/makeblastdb" \
-in "$ENV_BIN/BTTCMP_db/back/seq/negative_set-20210607" \
-dbtype prot \
-out "$ENV_BIN/BTTCMP_db/back/db/back" \
-out external_dbs/bt_toxin/db/bt_toxin \
-parse_seqids
```
## 3) Run BtToxin_Digger (assembled genome)
*Note: The `-out` parameter must end with `bt_toxin` to match what the tool expects.*
`run_digger_pixi.sh` sets `RATTLER_CACHE_DIR` inside this directory so pixi can
write its cache in the workspace (the default `~/.cache` path is blocked by the
sandbox).
Example for a single `.fna` (use a clean working directory):
```
mkdir -p /home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/work/C15_pixi_run_v5
cd /home/zly/project/bttoxin-pipeline/runs/bttoxin_digger_v5_repro/work/C15_pixi_run_v5
bash ../run_digger_pixi.sh ../examples/inputs .fna 4
```
If you want to bind `external_dbs/bt_toxin` explicitly:
```
bash ../run_digger_pixi.sh ../examples/inputs .fna 4 /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin
```
Outputs land under `Results/` in the working directory.
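### Parameters (pixi `run_digger_pixi.sh`)
- `input_dir`: input directory (containing the `.fna` files)
- `scaf_suffix`: input file suffix (for example `.fna`)
- `threads`: number of threads (default 4)
- `bttoxin_db_dir`: path to an external bt_toxin database (optional)
### Consistency with scripts/run_single_fna_pipeline.py
The pixi script calls BtToxin_Digger with the same arguments as the docker
invocation in `scripts/run_single_fna_pipeline.py`. The core arguments map as follows:
- `--SeqPath <dir>`: input directory
- `--SequenceType nucl`: nucleotide input
- `--Scaf_suffix .fna`: file suffix
- `--threads 4`: number of threads
Differences:
- The docker version automatically binds `external_dbs/bt_toxin` (if present) and collects the output under
`runs/<out_root>/digger`; the pixi version writes `Results/` to the current working directory by default.
- `scripts/run_single_fna_pipeline.py` goes on to run Shotter + report;
the pixi script runs only BtToxin_Digger itself.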
## 4) Outputs and comparison (examples)
Inputs copied into this workspace:
- `runs/bttoxin_digger_v5_repro/examples/inputs/C15.fna`
- `runs/bttoxin_digger_v5_repro/examples/inputs/HAN055.fna`
- Example pixi runs:
- `runs/bttoxin_digger_v5_repro/examples/C15_pixi_v5`
- `runs/bttoxin_digger_v5_repro/examples/HAN055_pixi_v5_clean`
- Example docker runs:
- `runs/bttoxin_digger_v5_repro/examples/C15_docker/digger`
- `runs/bttoxin_digger_v5_repro/examples/HAN055_docker/digger`
See `runs/bttoxin_digger_v5_repro/examples/COMPARE_REPORT.md` for the comparison summary.
Diff files:
- `runs/bttoxin_digger_v5_repro/examples/diffs/C15_docker_vs_pixi_v5.diff`
- `runs/bttoxin_digger_v5_repro/examples/diffs/HAN055_docker_vs_pixi_v5_clean.diff`
## 5) External DB update (v5)
When `external_dbs/bt_toxin` is updated from the BtToxin_Digger repo, the BLAST
database is v5, which requires BLAST >= 2.10.0. That is why this pixi
environment upgrades BLAST to 2.16.0.
After updating `external_dbs/bt_toxin`, ensure the pixi environment still points
to that directory (see Section 2). With BLAST 2.16.0, no re-index is needed
because the upstream repo already ships v5 indices. If you downgrade BLAST to
2.7, rebuild a v4 DB (Section 2).
### Update steps
3. **Rebuild Docker Image**:
The Dockerfile copies `external_dbs/bt_toxin` into the image. You must rebuild it to include the changes.
```bash
mkdir -p external_dbs
rm -rf external_dbs/bt_toxin tmp_bttoxin_repo
git clone --filter=blob:none --no-checkout https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo
git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master
# Copy the directory into your project's external_dbs
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
# Clean up the temporary repo
rm -rf tmp_bttoxin_repo
docker compose build --no-cache
```
### Verify the database binding
4. **Verify**:
Check the database version inside the new container:
```bash
# Check that the database files are complete
ls -lh external_dbs/bt_toxin/db/
# Verify that the container can access the bound database
docker run --rm \
-v "$(pwd)/external_dbs/bt_toxin:/usr/local/bin/BTTCMP_db/bt_toxin:ro" \
quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0 \
bash -lc 'ls -lh /usr/local/bin/BTTCMP_db/bt_toxin/db | head'
```
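The output should list the `.pin/.psq/.phr` files with timestamps and sizes matching the host, which confirms that the binding works.
### Running the pipeline with the external database
The script automatically detects the `external_dbs/bt_toxin` directory and binds it if present:
```bash
# Automatically use external_dbs/bt_toxin (recommended)
uv run python scripts/run_single_fna_pipeline.py --fna tests/test_data/HAN055.fna
# Or specify the database path manually
uv run python scripts/run_single_fna_pipeline.py \
--fna tests/test_data/HAN055.fna \
--bttoxin_db_dir /path/to/custom/bt_toxin
```
### Notes
- The `db/` directory is required: at runtime BLAST reads only the index files under `db/`
- The `seq/` directory is optional: it is kept only for archival or for regenerating the indices
- The bind mount is read-only (`ro`): this prevents the container from accidentally modifying the host database
- No re-indexing is needed: the GitHub repository already contains prebuilt BLAST indices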
## 6) Repository layout
```
runs/bttoxin_digger_v5_repro/
├─ .pixi/ # pixi environment cache
├─ pixi.toml # environment definition (bttoxin_digger + blast)
├─ pixi.lock # resolved environment
├─ run_digger_pixi.sh # wrapper to run BtToxin_Digger in this env
├─ README.md
└─ examples/
   ├─ inputs/ # copied test inputs (C15.fna, HAN055.fna)
   ├─ C15_pixi_v5/ # pixi run output (example)
   ├─ HAN055_pixi_v5_clean/ # pixi run output (example)
   ├─ C15_docker/ # docker output copy (baseline)
   ├─ HAN055_docker/ # docker output copy (baseline)
   ├─ diffs/ # docker vs pixi diffs
   └─ COMPARE_REPORT.md
docker compose run --rm digger-repro pixi run blastdbcmd -db /app/.pixi/envs/default/bin/BTTCMP_db/bt_toxin/db/bt_toxin -info
```
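Step 2 of the update procedure above can be wrapped so that the `-out` rule from the note is enforced before `makeblastdb` runs. This is a sketch under stated assumptions: it mirrors only the flags shown in this README, and `build_makeblastdb_command` is a hypothetical helper rather than part of this repository.

```python
from pathlib import Path


def build_makeblastdb_command(fasta, out_prefix="external_dbs/bt_toxin/db/bt_toxin"):
    """Return the makeblastdb argv for regenerating the bt_toxin indices.

    Hypothetical helper: it refuses an output prefix whose last path
    component is not `bt_toxin`, because the tool expects that name.
    """
    if Path(out_prefix).name != "bt_toxin":
        raise ValueError(f"-out must end with 'bt_toxin', got {out_prefix!r}")
    return [
        "makeblastdb",
        "-in", str(fasta),
        "-dbtype", "prot",
        "-out", str(out_prefix),
        "-parse_seqids",
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` from an environment where BLAST+ is on the `PATH`, for example inside `pixi run`.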


@@ -1,195 +1,107 @@
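# BtToxin_Digger (pixi) reproduction environment
# BtToxin_Digger (pixi) reproduction & Docker image
This repo is a **reproducible runtime environment + example outputs** for BtToxin_Digger 1.0.10 with **BLAST v5 database compatibility**. It is **not** an official fork or a new release.
This repo provides a **reproducible runtime environment** for running BtToxin_Digger 1.0.10, packaged as a Docker image based on `ghcr.io/prefix-dev/pixi`.
It includes:
1. **BtToxin_Digger 1.0.10** (installed via Pixi)
2. **BLAST+ 2.16.0** (compatible with v5 databases)
3. **Pre-bundled BtToxin database** (baked into the image)
## License / Citation / Disclaimer
- **BtToxin_Digger** is developed by its original authors; cite the upstream publication if you use it in research
- **This repository** only provides an environment wrapper (pixi) and example runs for reproducibility; it does not modify BtToxin_Digger source code
- **Disclaimer**: This is an independent, community-maintained setup and is not endorsed by the upstream authors
- **BtToxin_Digger** is developed by its original authors; cite the upstream publication if you use it in research
- **This repository** only provides an environment wrapper (pixi/docker); it does not modify BtToxin_Digger source code
This directory uses pixi to reproduce the environment of `quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0`, so that the digger step of `scripts/run_single_fna_pipeline.py` can run without Docker.
## 1. Quick start (with Docker)
## 1) Environment definition (vs Docker image)
The easiest way to run this is with the included `docker-compose.yml` or the global project configuration.
- `pixi.toml` keeps `bttoxin_digger=1.0.10` + `perl=5.26.2` (legacy stack) while upgrading `blast` to a v5-capable release for BLASTDB v5 compatibility
- Changes relative to `quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0`:
- BLAST+ upgraded from 2.12.0 to 2.16.0 (required to read v5 databases)
- Explicitly pinned `perl-file-tee==0.07` and `perl-list-util==1.38`
- `channel-priority = "disabled"` to allow mixing bioconda/conda-forge and the legacy label for perl compatibility
Create the environment:
### Build the image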
```bash
cd bttoxin_digger_v5_repro
# In this directory
docker compose build
```
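### Run analysis
Place your input `.fna` files in `examples/inputs` (or mount your own directory), then run:
```bash
# Show help
docker compose run --rm digger-repro pixi run BtToxin_Digger --help
# Run analysis on a specific file
# Note: the input path must match the internal mount point (/app/jobs)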
docker compose run --rm digger-repro pixi run BtToxin_Digger \
--SeqPath /app/jobs \
--Scaf_suffix .fna \
--threads 4
```
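### Directory mounting
- `/app/jobs`: Mount your input sequence directory here.
- `/app/data`: Mount your desired output directory here (if using absolute paths in arguments).
## 2. How the Docker image is built
The image is built using `docker/Dockerfile`.
### Base image
Uses `ghcr.io/prefix-dev/pixi:latest` to ensure a consistent conda-compatible environment.
### Database integration
The external database (`external_dbs/bt_toxin`) is **copied into the image** at build time.
Target location: `/app/.pixi/envs/default/bin/BTTCMP_db/bt_toxin`
This replaces the default database shipped with the bioconda package, ensuring:
1. The latest toxin definitions are used.
2. BLAST v5 indices are compatible with the installed BLAST+ 2.16.0.
### Environment definition (`pixi.toml`)
- `bttoxin_digger = "==1.0.10"`
- `perl = "==5.26.2"` (legacy compatibility requirement)
- `blast = "==2.16.0"` (upgraded for v5 database support)
- `channel-priority = "disabled"`
## 3. Development / manual usage
If you want to run without Docker using local Pixi: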
```bash
# Install the environment
pixi install
```
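## 2) Database wiring (BLAST v4 vs v5)
The external BTTCMP database under `external_dbs/bt_toxin` uses a BLAST v5 index (built by newer BLAST+). If you run with BLAST 2.7, you must rebuild v4 databases; with BLAST >= 2.10, you can use the v5 database directly.
### Recommended: use the shared `external_dbs` (no copy)
Keep a single source of truth and link it into the pixi environment: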
```bash
ENV_BIN=bttoxin_digger_v5_repro/.pixi/envs/default/bin
# Link the database (required manually if not using Docker)
# The Dockerfile does this automatically by copying files.
ENV_BIN=.pixi/envs/default/bin
rm -rf "$ENV_BIN/BTTCMP_db/bt_toxin"
ln -sfn "$(pwd)/external_dbs/bt_toxin" "$ENV_BIN/BTTCMP_db/bt_toxin"
# Run
pixi run BtToxin_Digger --help
```
This avoids duplicating a large database inside the repo.
## 4. Repository layout
### Optional: freeze a snapshot inside this repo
```
.
├── docker/
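│ └── Dockerfile # Docker build definition
├── docker-compose.yml # Local test orchestration
├── external_dbs/ # Database source (copied into the image at build time)
│ └── bt_toxin/ # The actual database files
├── pixi.toml # Environment dependencies
├── pixi.lock # Exact version lock
└── examples/ # Test inputs and outputs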
```
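If you want this repo to be fully self-contained, copy a snapshot and point the environment at it (note: consider Git LFS if you intend to push it to Git):
## 5. Updating the database
To update the database used by the container:
1. Update the files in `external_dbs/bt_toxin/`.
2. Rebuild the Docker image: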
```bash
SNAPSHOT=bttoxin_digger_v5_repro/external_dbs_snapshot
mkdir -p "$SNAPSHOT"
cp -a external_dbs/bt_toxin "$SNAPSHOT/"
ln -sfn "$SNAPSHOT/bt_toxin" "$ENV_BIN/BTTCMP_db/bt_toxin"
docker compose build --no-cache
```
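## 3) Run BtToxin_Digger (assembled genome)
`run_digger_pixi.sh` sets `RATTLER_CACHE_DIR` inside this directory so pixi can write its cache in the workspace (the default `~/.cache` path may be blocked by the sandbox).
Example for a single `.fna` file (use a clean working directory):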
```bash
mkdir -p work/C15_pixi_run
cd work/C15_pixi_run
bash ../../run_digger_pixi.sh ../../examples/inputs .fna 4
```
If you want to bind `external_dbs/bt_toxin` explicitly:
```bash
bash ../../run_digger_pixi.sh ../../examples/inputs .fna 4 /path/to/external_dbs/bt_toxin
```
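Outputs land under `Results/` in the working directory.
### Parameters
| Parameter | Description |
|------|------|
| `input_dir` | Input directory (containing the `.fna` files) |
| `scaf_suffix` | Input file suffix (e.g. `.fna`) |
| `threads` | Number of threads (default 4) |
| `bttoxin_db_dir` | Path to an external bt_toxin database (optional) |
### Consistency with scripts/run_single_fna_pipeline.py
The pixi script calls BtToxin_Digger with the same arguments as the Docker invocation in `scripts/run_single_fna_pipeline.py`:
| Parameter | Description |
|------|------|
| `--SeqPath <dir>` | Input directory |
| `--SequenceType nucl` | Nucleotide input |
| `--Scaf_suffix .fna` | File suffix |
| `--threads 4` | Number of threads |
**Differences:**
- The Docker version automatically binds `external_dbs/bt_toxin` (if present) and collects the output under `runs/<out_root>/digger`
- The pixi version writes `Results/` to the current working directory by default
- `scripts/run_single_fna_pipeline.py` goes on to run Shotter + report; the pixi script runs only BtToxin_Digger itself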
## 4) Outputs and comparison (examples)
### Input files
- `examples/inputs/C15.fna`
- `examples/inputs/HAN055.fna`
### Example run results
| Type | C15 | HAN055 |
|------|-----|--------|
| pixi run | `examples/C15_pixi_v5` | `examples/HAN055_pixi_v5_clean` |
| Docker run | `examples/C15_docker/digger` | `examples/HAN055_docker/digger` |
### Comparison report
- Summary report: `examples/COMPARE_REPORT.md`
- Diff files:
- `examples/diffs/C15_docker_vs_pixi_v5.diff`
- `examples/diffs/HAN055_docker_vs_pixi_v5_clean.diff`
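## 5) External DB update (v5)
When `external_dbs/bt_toxin` is updated from the BtToxin_Digger repo, the BLAST database is in v5 format, which requires BLAST >= 2.10.0. That is why this pixi environment upgrades BLAST to 2.16.0.
After updating `external_dbs/bt_toxin`, make sure the pixi environment still points to that directory (see Section 2). With BLAST 2.16.0 no re-indexing is needed, because the upstream repo already ships prebuilt v5 indices. If you downgrade to BLAST 2.7, rebuild a v4 database (see Section 2).
### Update steps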
```bash
mkdir -p external_dbs
rm -rf external_dbs/bt_toxin tmp_bttoxin_repo
git clone --filter=blob:none --no-checkout https://github.com/liaochenlanruo/BtToxin_Digger.git tmp_bttoxin_repo
cd tmp_bttoxin_repo
git sparse-checkout init --cone
git sparse-checkout set BTTCMP_db/bt_toxin
git checkout master
cd ..
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
rm -rf tmp_bttoxin_repo
```
### Verify the database binding
```bash
# Check that the database files are complete
ls -lh external_dbs/bt_toxin/db/
# Verify that the container can access the bound database
docker run --rm \
-v "$(pwd)/external_dbs/bt_toxin:/usr/local/bin/BTTCMP_db/bt_toxin:ro" \
quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0 \
bash -lc 'ls -lh /usr/local/bin/BTTCMP_db/bt_toxin/db | head'
```
The output should list the `.pin/.psq/.phr` files with timestamps and sizes matching the host, which confirms that the binding works.
## 6) Directory layout
```
bttoxin_digger_v5_repro/
├─ .pixi/ # pixi environment cache
├─ pixi.toml # environment definition (bttoxin_digger + blast)
├─ pixi.lock # locked environment dependencies
├─ run_digger_pixi.sh # wrapper script to run BtToxin_Digger in this environment
├─ README.md # English documentation
├─ README_CN.md # Chinese documentation
└─ examples/
   ├─ inputs/ # test input files (C15.fna, HAN055.fna)
   ├─ C15_pixi_v5/ # pixi run output (example)
   ├─ HAN055_pixi_v5_clean/ # pixi run output (example)
   ├─ C15_docker/ # Docker output copy (baseline)
   ├─ HAN055_docker/ # Docker output copy (baseline)
   ├─ diffs/ # Docker vs pixi diff files
   └─ COMPARE_REPORT.md # comparison report
```
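## 7) FAQ
### Q: Why is this reproduction environment needed?
A: The main project's `scripts/run_single_fna_pipeline.py` runs BtToxin_Digger via Docker by default. This environment provides a Docker-free alternative that manages dependencies with pixi. It is suited to:
- Environments where Docker is unavailable (such as some HPC clusters)
- Cases where you need to debug or change BtToxin_Digger run parameters
- Verifying that the pixi and Docker environments produce consistent output
### Q: Do pixi runs match the Docker results?
A: Yes. `examples/COMPARE_REPORT.md` and the `examples/diffs/` directory contain the detailed comparison, showing that the two approaches produce consistent output.
### Q: How do I switch back to running with Docker?
A: Use the main project's `scripts/run_single_fna_pipeline.py` directly; it uses Docker by default.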


@@ -0,0 +1,12 @@
services:
  digger-repro:
    build:
      context: .
      dockerfile: docker/Dockerfile
    image: bttoxin-digger:v5-repro
    volumes:
      - ./examples/inputs:/app/jobs
      - ./examples:/app/examples
      # Mount the current directory to verify outputs easily if needed
      # But the DB is now baked in, so no need to mount external_dbs
    command: pixi run BtToxin_Digger --help


@@ -0,0 +1,28 @@
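# BtToxin Digger v5 container image
# Based on a pixi-managed conda environment
FROM ghcr.io/prefix-dev/pixi:latest
WORKDIR /app
# Copy the pixi configuration
COPY pixi.toml .
COPY pixi.lock .
# Install dependencies
RUN pixi install
# Copy the external database over the default one
# Note: this must run after pixi install, and the existing directory is removed first to guarantee a full replacement
# This step assumes the build context contains the external_dbs directory
RUN rm -rf /app/.pixi/envs/default/bin/BTTCMP_db/bt_toxin
COPY external_dbs/bt_toxin /app/.pixi/envs/default/bin/BTTCMP_db/bt_toxin
# Create working directories
RUN mkdir -p /app/jobs /app/data
# Expose a commonly used port
EXPOSE 9000
# Default command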
CMD ["pixi", "run", "BtToxin_Digger", "--help"]

File diff suppressed because it is too large



@@ -0,0 +1,22 @@
{
"version": "1.2",
"dbname": "bt_toxin",
"dbtype": "Protein",
"db-version": 5,
"description": "bt_toxin20251104.fas",
"number-of-letters": 996368,
"number-of-sequences": 1199,
"last-updated": "2025-11-04T15:35:00",
"number-of-volumes": 1,
"bytes-total": 1149077,
"bytes-to-cache": 1007264,
"files": [
"bt_toxin.pdb",
"bt_toxin.phr",
"bt_toxin.pin",
"bt_toxin.pot",
"bt_toxin.psq",
"bt_toxin.ptf",
"bt_toxin.pto"
]
}
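A metadata document like the one above can be cross-checked against the files actually present in `db/`. This is a minimal sketch: it relies only on the `db-version` and `files` keys visible in this diff, and the helper name `check_blast_metadata` is hypothetical.

```python
import json
from pathlib import Path


def check_blast_metadata(metadata_text, db_dir):
    """Return (db_version, missing_files) for a BLAST JSON metadata document.

    `missing_files` lists entries of the "files" array not found in `db_dir`.
    """
    meta = json.loads(metadata_text)
    db_dir = Path(db_dir)
    missing = [name for name in meta["files"] if not (db_dir / name).is_file()]
    return meta["db-version"], missing
```

A result of `(5, [])` would indicate a v5 database whose listed files are all present.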



@@ -0,0 +1,36 @@
def get_unique_headers(file_path):
    """Read the lines starting with '>' and return the set of texts after the '>'."""
    headers = set()
    with open(file_path, 'r') as f:
        for line in f:
            line = line.strip()
            if line.startswith('>'):
                # Keep everything after '>' (including any spaces and other characters)
                header = line[1:]
                headers.add(header)
    return headers


# Input file paths
file1 = 'bt_toxin20251104.fas'
file2 = 'all_app_cry_cyt_gpp_mcf_mpf_mpp_mtx_pra_prb_spp_tpp_txp_vip_vpa_vpb_xpp_fasta_sequences.txt'
output_file = 'unique_headers.txt'

# Collect the header sets of both files
headers1 = get_unique_headers(file1)
headers2 = get_unique_headers(file2)

# Compute the headers unique to each file
unique_to_file1 = headers1 - headers2
unique_to_file2 = headers2 - headers1

# Write the output file
with open(output_file, 'w') as out_f:
    out_f.write(f"### Unique headers in {file1} ###\n")
    for header in sorted(unique_to_file1):
        out_f.write(f">{header}\n")
    out_f.write(f"\n### Unique headers in {file2} ###\n")
    for header in sorted(unique_to_file2):
        out_f.write(f">{header}\n")

print(f"Done. Results saved to {output_file}")


@@ -2,7 +2,7 @@ version: '3.8'
services:
postgres:
image: postgres:15-alpine
image: docker.m.daocloud.io/library/postgres:15-alpine
container_name: bttoxin_postgres
environment:
POSTGRES_USER: bttoxin
@@ -11,7 +11,7 @@ services:
volumes:
- postgres_data:/var/lib/postgresql/data
ports:
- "5432:5432"
- "5434:5432"
healthcheck:
test: ["CMD-SHELL", "pg_isready -U bttoxin"]
interval: 10s
@@ -20,13 +20,13 @@ services:
restart: unless-stopped
redis:
image: redis:7-alpine
image: docker.m.daocloud.io/library/redis:7-alpine
container_name: bttoxin_redis
command: redis-server --appendonly yes
volumes:
- redis_data:/data
ports:
- "6379:6379"
- "6380:6379"
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
@@ -43,13 +43,17 @@ services:
volumes:
- ../backend:/app
- ../data:/data
- ../Data:/Data
ports:
- "8000:8000"
- "8002:8000"
environment:
- DATABASE_URL=postgresql://bttoxin:bttoxin_password@postgres:5432/bttoxin_db
- REDIS_URL=redis://redis:6379/0
- CELERY_BROKER_URL=redis://redis:6379/0
- CELERY_RESULT_BACKEND=redis://redis:6379/0
- WORKSPACE_BASE_PATH=/data/jobs
- DEBUG=True
- PYTHONPATH=/app
depends_on:
postgres:
condition: service_healthy
@@ -70,8 +74,11 @@ services:
environment:
- DATABASE_URL=postgresql://bttoxin:bttoxin_password@postgres:5432/bttoxin_db
- REDIS_URL=redis://redis:6379/0
- CELERY_BROKER_URL=redis://redis:6379/0
- CELERY_RESULT_BACKEND=redis://redis:6379/0
- WORKSPACE_BASE_PATH=/data/jobs
- C_FORCE_ROOT=true
- PYTHONPATH=/app
depends_on:
- postgres
- redis
@@ -87,6 +94,9 @@ services:
- ../backend:/app
environment:
- REDIS_URL=redis://redis:6379/0
- CELERY_BROKER_URL=redis://redis:6379/0
- CELERY_RESULT_BACKEND=redis://redis:6379/0
- PYTHONPATH=/app
depends_on:
- redis
restart: unless-stopped
@@ -98,9 +108,12 @@ services:
container_name: bttoxin_flower
command: celery -A app.core.celery_app flower --port=5555
ports:
- "5555:5555"
- "5556:5555"
environment:
- REDIS_URL=redis://redis:6379/0
- CELERY_BROKER_URL=redis://redis:6379/0
- CELERY_RESULT_BACKEND=redis://redis:6379/0
- PYTHONPATH=/app
depends_on:
- redis
restart: unless-stopped
@@ -111,11 +124,24 @@ services:
dockerfile: docker/frontend/Dockerfile
container_name: bttoxin_frontend
ports:
- "3000:80"
- "3002:80"
depends_on:
- api
restart: unless-stopped
# Digger task worker - uses the pixi environment from bttoxin_digger_v5_repro
digger:
build:
context: ../bttoxin_digger_v5_repro
dockerfile: docker/Dockerfile
container_name: bttoxin_digger
volumes:
- ../data:/data
- ../Data:/Data
# Keep the container running so it can be invoked via docker exec, and to avoid an endless restart loop
command: tail -f /dev/null
restart: unless-stopped
volumes:
postgres_data:
redis_data: