Move digger reproduction env to tools/reproduction/
- Moved bttoxin_digger_v5_repro to tools/reproduction/bttoxin_digger
- Updated docker-compose.yml to point to the new location
- This declutters the root directory while preserving the reproduction environment
206
tools/reproduction/bttoxin_digger/AGENTS.md
Normal file
@@ -0,0 +1,206 @@
# BtToxin_Digger v5 Reproduction - AGENTS.md

This document is a quick reference for AI assistants and developers working on the bttoxin_digger_v5_repro project.

## Project Overview

BtToxin_Digger v5 Reproduction is a reproducible runtime environment for BtToxin_Digger 1.0.10, built on Pixi and Docker, that bundles the latest toxin database as BLAST v5 indices.

**Key features:**
- Uses `ghcr.io/prefix-dev/pixi:latest` as the base image
- Preinstalls BtToxin_Digger 1.0.10 and BLAST+ 2.16.0
- Bundles the latest BtToxin database (20251104)
- Supports containerized deployment with Docker

## Directory Structure

```
bttoxin_digger/
├── docker/
│   └── Dockerfile                    # Docker image build file
├── docker-compose.yml                # Compose configuration for local testing
├── external_dbs/                     # External database source (copied into the image at build time)
│   └── bt_toxin/
│       ├── db/                       # BLAST index files (.phr, .pin, .psq)
│       │   ├── bt_toxin.pdb          # Database metadata
│       │   ├── bt_toxin.phr          # Protein sequence headers
│       │   ├── bt_toxin.pin          # Protein sequence index
│       │   ├── bt_toxin.psq          # Protein sequence data
│       │   └── old/                  # Backup of previous indices
│       └── seq/                      # FASTA sequence source files
│           ├── bt_toxin20251104.fas  # Latest toxin sequences (November 2025)
│           ├── bt_toxin20221208.fas
│           └── updateDB.py           # Database helper script
├── pixi.toml                         # Pixi environment dependencies
├── pixi.lock                         # Dependency version lock file
├── examples/                         # Test cases
│   └── inputs/                       # Input files
│       └── C15.fna                   # Test genome file
└── README.md                         # Project documentation
```

## Quick Start

### Build the Docker Image

```bash
cd tools/reproduction/bttoxin_digger
docker compose build
```

### Run an Analysis (Local Testing)

```bash
# Show help
docker compose run --rm digger-repro pixi run BtToxin_Digger --help

# Run an analysis
docker compose run --rm digger-repro pixi run BtToxin_Digger \
    --SeqPath /app/jobs \
    --Scaf_suffix .fna \
    --threads 4
```

### Use the Digger Container from the Main Service

The main project's `docker/docker-compose.yml` defines a long-running `digger` service:

```bash
# Check the service status
docker ps | grep bttoxin_digger

# Run an analysis job
docker exec bttoxin_digger pixi run BtToxin_Digger \
    --SeqPath /data \
    --Scaf_suffix .fna \
    --threads 4
```

## Database Management

### Database Layout

- **`seq/` directory**: holds the raw FASTA sequence files (e.g., `bt_toxin20251104.fas`)
- **`db/` directory**: holds the binary index files generated by BLAST
- **Relationship**: the files in `db/` are generated by `makeblastdb` from the FASTA files in `seq/`

### Verify the Database Version

```bash
docker exec bttoxin_digger pixi run blastdbcmd \
    -db /app/.pixi/envs/default/bin/BTTCMP_db/bt_toxin/db/bt_toxin \
    -info
```

**Expected output:**
```text
Database: bt_toxin20251104.fas
Date: Nov 4, 2025 3:35 PM
BLASTDB Version: 5
```
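The version check above can also be scripted. The following is a minimal sketch, not part of the project: it assumes the `blastdbcmd -info` text contains the `Database:` and `BLASTDB Version:` lines shown in the expected output, and the helper names `parse_blastdb_info` and `database_is_current` are hypothetical.

```python
def parse_blastdb_info(text):
    """Collect the 'Database', 'Date' and 'BLASTDB Version' fields from -info output."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        for key in ("Database", "Date", "BLASTDB Version"):
            prefix = key + ":"
            if line.startswith(prefix):
                fields[key] = line[len(prefix):].strip()
    return fields


def database_is_current(text, expected_fasta, expected_version=5):
    """Return True if the index was built from expected_fasta in the expected format."""
    info = parse_blastdb_info(text)
    return (info.get("Database") == expected_fasta
            and info.get("BLASTDB Version") == str(expected_version))


# Sample text copied from the expected output above (assumed format).
sample = (
    "Database: bt_toxin20251104.fas\n"
    "Date: Nov 4, 2025 3:35 PM\n"
    "BLASTDB Version: 5\n"
)
print(database_is_current(sample, "bt_toxin20251104.fas"))
```

In practice the text would come from capturing the output of the `docker exec ... blastdbcmd ... -info` command above, for example with `subprocess.run(..., capture_output=True, text=True)`.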
### Update the Database (Future Releases)

When the upstream repository publishes a new release (for example, in 2026 or 2027):

```bash
# 1. Download the new sequence file into the seq/ directory
cp /path/to/new/bt_toxin20xxxxxx.fas external_dbs/bt_toxin/seq/

# 2. Regenerate the indices with makeblastdb
makeblastdb \
    -in external_dbs/bt_toxin/seq/bt_toxin20xxxxxx.fas \
    -dbtype prot \
    -out external_dbs/bt_toxin/db/bt_toxin \
    -parse_seqids

# 3. Rebuild the Docker image
docker compose build --no-cache

# 4. Restart the service
docker compose -f ../../../docker/docker-compose.yml up -d digger
```

## Common Tasks

### Show Digger Help

```bash
docker exec bttoxin_digger pixi run BtToxin_Digger --help
```

### Rebuild the Image (No Cache)

```bash
cd tools/reproduction/bttoxin_digger
docker compose build --no-cache
```

### View Container Logs

```bash
docker logs bttoxin_digger
```

### Open an Interactive Shell in the Container

```bash
docker exec -it bttoxin_digger pixi run bash
```

## Troubleshooting

### Issue: Wrong Database Version

**Symptom**: Analysis results show that an old database version was used.

**Troubleshooting steps:**
1. Check that the files in `external_dbs/bt_toxin/seq/` are the latest
2. Confirm that the index files in `external_dbs/bt_toxin/db/` match the seq file
3. Verify that the image has been rebuilt: `docker images | grep digger`
4. Restart the service: `docker compose -f ../../../docker/docker-compose.yml up -d digger`
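The first troubleshooting step (finding the newest sequence file) can be sketched in a few lines. This is an illustration only: it assumes the `bt_toxin<YYYYMMDD>.fas` naming seen in `seq/`, and `newest_fasta` is a hypothetical helper, not something shipped with the project.

```python
import re

# Assumed naming scheme, taken from the file names in seq/: bt_toxin<YYYYMMDD>.fas
FASTA_NAME = re.compile(r"^bt_toxin(\d{8})\.fas$")


def newest_fasta(names):
    """Return the file name with the latest date stamp, or None if nothing matches."""
    dated = []
    for name in names:
        match = FASTA_NAME.match(name)
        if match:
            # YYYYMMDD strings sort chronologically when compared as text.
            dated.append((match.group(1), name))
    return max(dated)[1] if dated else None


print(newest_fasta(["bt_toxin20221208.fas", "bt_toxin20251104.fas", "updateDB.py"]))
```

The returned name can then be compared with the `Database:` line reported by `blastdbcmd -info` to confirm that the index in `db/` was built from the newest file.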
### Issue: Perl Warnings

**Symptom**: `Possible precedence issue with control flow operator`

**Explanation**: This is a compatibility issue between Perl 5.26 and legacy code. It does not affect execution and can be ignored.

### Issue: Container Fails to Start

**Troubleshooting steps:**
1. Check that Docker is running
2. Inspect the detailed logs: `docker compose logs`
3. Confirm that the port is not already in use

## Development Notes

### Modifying the Dockerfile

To change the environment dependencies:
1. Edit `pixi.toml` (add or remove dependencies)
2. Run `pixi lock` to update `pixi.lock`
3. Rebuild the image

### Local Development Mode (Without Docker)

```bash
# Install the Pixi environment
cd tools/reproduction/bttoxin_digger
pixi install

# Link the external database
ENV_BIN=.pixi/envs/default/bin
rm -rf "$ENV_BIN/BTTCMP_db/bt_toxin"
ln -sfn "$(pwd)/external_dbs/bt_toxin" "$ENV_BIN/BTTCMP_db/bt_toxin"

# Run
pixi run BtToxin_Digger --help
```
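The `rm -rf` before `ln -sfn` matters: if the packaged database is still a real directory, `ln` would create the link inside it instead of replacing it. The sketch below shows the same replace-then-link logic in Python. It is illustrative only; `link_database` is a hypothetical helper and it assumes a POSIX system where symlinks are permitted.

```python
import shutil
from pathlib import Path


def link_database(db_source, link_path):
    """Replace link_path with a symlink to db_source (same effect as rm -rf + ln -sfn)."""
    source = Path(db_source).resolve()
    link = Path(link_path)
    if not source.is_dir():
        raise FileNotFoundError(f"database directory not found: {source}")
    # Check for a symlink first: is_dir() follows links and would hide it.
    if link.is_symlink() or link.is_file():
        link.unlink()        # remove only the link (or stray file), never its target
    elif link.is_dir():
        shutil.rmtree(link)  # remove the packaged default database directory
    link.symlink_to(source, target_is_directory=True)
    return link
```

A call such as `link_database("external_dbs/bt_toxin", ".pixi/envs/default/bin/BTTCMP_db/bt_toxin")` would correspond to the shell commands above and can be repeated safely.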
## Related Files

- `README.md`: full project documentation (English)
- `README_CN.md`: full project documentation (Chinese)
- `docker/Dockerfile`: Docker image build configuration
- `docker-compose.yml`: Compose configuration for local testing
141
tools/reproduction/bttoxin_digger/README.md
Normal file
@@ -0,0 +1,141 @@
# BtToxin_Digger (pixi) reproduction & Docker Image

This repo is a **reproducible runtime environment** for BtToxin_Digger 1.0.10, packaged as a Docker image based on `ghcr.io/prefix-dev/pixi`.

It includes:
1. **BtToxin_Digger 1.0.10** (installed via Pixi)
2. **BLAST+ 2.16.0** (compatible with v5 databases)
3. **Pre-bundled BtToxin Database** (baked into the image)

## License / Citation / Disclaimer

- **BtToxin_Digger** is developed by its original authors; cite the upstream publication if you use it in research.
- **This repository** only provides an environment wrapper (pixi/docker); it does not modify BtToxin_Digger source code.

## 1. Quick Start with Docker

The easiest way to run this is using the included `docker-compose.yml` or the global project configuration.

### Build the Image

```bash
# In this directory
docker compose build
```

### Run Analysis

Place your input `.fna` files in `examples/inputs` (or mount your own directory), then run:

```bash
# Show help
docker compose run --rm digger-repro pixi run BtToxin_Digger --help

# Run analysis on the .fna files in the mounted input directory
# Note: Input path must match the internal mount point (/app/jobs)
docker compose run --rm digger-repro pixi run BtToxin_Digger \
    --SeqPath /app/jobs \
    --Scaf_suffix .fna \
    --threads 4
```

### Directory Mounting

- `/app/jobs`: Mount your input sequence files here.
- `/app/data`: Mount your desired output directory here (if using absolute paths in arguments).

## 2. Docker Image Construction

The image is built using `docker/Dockerfile`.

### Base Image
Uses `ghcr.io/prefix-dev/pixi:latest` to ensure a consistent conda-compatible environment.

### Database Integration
The external database (`external_dbs/bt_toxin`) is **copied into the image** during build time.
Target location: `/app/.pixi/envs/default/bin/BTTCMP_db/bt_toxin`

This replaces the default database shipped with the bioconda package, ensuring:
1. Latest toxin definitions are used.
2. BLAST v5 indices are compatible with the installed BLAST+ 2.16.0.

### Environment Definition (`pixi.toml`)
- `bttoxin_digger = "==1.0.10"`
- `perl = "==5.26.2"` (Legacy requirement)
- `blast = "==2.16.0"` (Upgraded for v5 DB support)
- `channel-priority = "disabled"`

## 3. Development / Manual Usage

If you want to run without Docker using local Pixi:

```bash
# Install environment
pixi install

# Link the database (required manually if not using Docker)
# The Dockerfile does this automatically by copying files.
ENV_BIN=.pixi/envs/default/bin
rm -rf "$ENV_BIN/BTTCMP_db/bt_toxin"
ln -sfn "$(pwd)/external_dbs/bt_toxin" "$ENV_BIN/BTTCMP_db/bt_toxin"

# Run
pixi run BtToxin_Digger --help
```

## 4. Repository Layout

```
.
├── docker/
│   └── Dockerfile          # Docker build definition
├── docker-compose.yml      # Local test orchestration
├── external_dbs/           # Database source (copied into image)
│   └── bt_toxin/           # The actual database files
├── pixi.toml               # Environment dependencies
├── pixi.lock               # Exact version lock
└── examples/               # Test inputs and outputs
```

## 5. Updating the Database (Important for Future Updates)

The database consists of two parts in `external_dbs/bt_toxin`:
1. **`seq/` Directory**: Contains the raw FASTA sequence files (e.g., `bt_toxin20251104.fas`).
2. **`db/` Directory**: Contains the BLAST indices (`.phr`, `.pin`, `.psq`) generated from the sequences.

**Relationship**: The files in `db/` are **generated from** the FASTA files in `seq/` using `makeblastdb`. The filename of the source FASTA (e.g., `bt_toxin20251104.fas`) is embedded in the metadata of the `db/` files.
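Because the source FASTA name is embedded in the index metadata, the relationship can be checked without running BLAST. The sketch below assumes the index directory contains a JSON metadata file with `description` and `db-version` keys; the helper names are hypothetical and the metadata text is supplied by the caller.

```python
import json


def index_source(metadata_text):
    """Return (source FASTA name, BLAST database version) from metadata JSON text."""
    meta = json.loads(metadata_text)
    return meta["description"], meta["db-version"]


def index_built_from(metadata_text, fasta_name):
    """Return True if the index metadata names fasta_name as its source."""
    description, _version = index_source(metadata_text)
    return description == fasta_name


# Illustrative metadata with only the keys this sketch relies on.
example = '{"dbname": "bt_toxin", "db-version": 5, "description": "bt_toxin20251104.fas"}'
print(index_source(example))
```

If the reported description does not match the newest file in `seq/`, the indices are stale and need to be regenerated.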
### How to Update (e.g., for 2026/2027 data)

If a new database version is released (e.g., from https://github.com/liaochenlanruo/BtToxin_Digger), follow these steps:

1. **Download New Sequences**:
   Place the new FASTA file (e.g., `bt_toxin2026xxxx.fas`) into `external_dbs/bt_toxin/seq/`.

2. **Generate New Indices (Critical Step)**:
   You must regenerate the indices in `external_dbs/bt_toxin/db/`. You can use a temporary container or local BLAST+ to do this.

   ```bash
   # Example using the local pixi environment (if installed)
   # Or use a container with blast installed
   makeblastdb \
     -in external_dbs/bt_toxin/seq/bt_toxin2026xxxx.fas \
     -dbtype prot \
     -out external_dbs/bt_toxin/db/bt_toxin \
     -parse_seqids
   ```

   *Note: The `-out` parameter must end with `bt_toxin` to match what the tool expects.*

3. **Rebuild Docker Image**:
   The Dockerfile copies `external_dbs/bt_toxin` into the image. You must rebuild it to include the changes.

   ```bash
   docker compose build --no-cache
   ```

4. **Verify**:
   Check the database version inside the new container:
   ```bash
   docker compose run --rm digger-repro pixi run blastdbcmd \
     -db /app/.pixi/envs/default/bin/BTTCMP_db/bt_toxin/db/bt_toxin \
     -info
   ```
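Step 2 is the easiest one to get wrong, since the `-out` prefix must end with `bt_toxin`. A small wrapper can fix that prefix in one place. This is a sketch under the assumptions above: it only builds the argument list from the flags shown in step 2, and `makeblastdb_command` is a hypothetical helper name.

```python
import shlex
from pathlib import Path


def makeblastdb_command(fasta_path, db_dir="external_dbs/bt_toxin/db"):
    """Build the makeblastdb argument list used in step 2, with a fixed output prefix."""
    out_prefix = Path(db_dir) / "bt_toxin"  # the tool expects this prefix
    return [
        "makeblastdb",
        "-in", str(fasta_path),
        "-dbtype", "prot",
        "-out", str(out_prefix),
        "-parse_seqids",
    ]


cmd = makeblastdb_command("external_dbs/bt_toxin/seq/bt_toxin2026xxxx.fas")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the command wherever BLAST+ is on the `PATH`; printing it first makes it easy to review before anything is overwritten.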
107
tools/reproduction/bttoxin_digger/README_CN.md
Normal file
@@ -0,0 +1,107 @@
# BtToxin_Digger (pixi) Reproduction & Docker Image

This repository provides a **reproducible runtime environment** for running BtToxin_Digger 1.0.10, packaged as a Docker image based on `ghcr.io/prefix-dev/pixi`.

It includes:
1. **BtToxin_Digger 1.0.10** (installed via Pixi)
2. **BLAST+ 2.16.0** (compatible with v5 databases)
3. **Pre-bundled BtToxin database** (built into the image)

## License / Citation / Disclaimer

- **BtToxin_Digger** is developed by its original authors; if you use it in research, cite the upstream publication.
- **This repository** only provides an environment wrapper (pixi/docker); it does not modify the BtToxin_Digger source code.

## 1. Quick Start (with Docker)

The easiest way to run it is to use the included `docker-compose.yml` or the global project configuration.

### Build the Image

```bash
# In this directory
docker compose build
```

### Run an Analysis

Place your input `.fna` files in `examples/inputs` (or mount your own directory), then run:

```bash
# Show help
docker compose run --rm digger-repro pixi run BtToxin_Digger --help

# Run analysis on the .fna files in the mounted input directory
# Note: the input path must match the internal mount point (/app/jobs)
docker compose run --rm digger-repro pixi run BtToxin_Digger \
    --SeqPath /app/jobs \
    --Scaf_suffix .fna \
    --threads 4
```

### Directory Mounts

- `/app/jobs`: mount the directory containing your input sequence files here.
- `/app/data`: mount your desired output directory here (if you use absolute paths in the arguments).

## 2. How the Docker Image Is Built

The image is built from `docker/Dockerfile`.

### Base Image
Uses `ghcr.io/prefix-dev/pixi:latest` to ensure a consistent conda-compatible environment.

### Database Integration
The external database (`external_dbs/bt_toxin`) is **copied into the image** at build time.
Target location: `/app/.pixi/envs/default/bin/BTTCMP_db/bt_toxin`

This replaces the default database shipped with the bioconda package, ensuring that:
1. The latest toxin definitions are used.
2. The BLAST v5 indices are compatible with the installed BLAST+ 2.16.0.

### Environment Definition (`pixi.toml`)
- `bttoxin_digger = "==1.0.10"`
- `perl = "==5.26.2"` (legacy requirement)
- `blast = "==2.16.0"` (upgraded for v5 database support)
- `channel-priority = "disabled"`

## 3. Development / Manual Usage

If you want to run with a local Pixi installation instead of Docker:

```bash
# Install the environment
pixi install

# Link the database (must be done manually when not using Docker)
# The Dockerfile does this automatically by copying the files.
ENV_BIN=.pixi/envs/default/bin
rm -rf "$ENV_BIN/BTTCMP_db/bt_toxin"
ln -sfn "$(pwd)/external_dbs/bt_toxin" "$ENV_BIN/BTTCMP_db/bt_toxin"

# Run
pixi run BtToxin_Digger --help
```

## 4. Repository Layout

```
.
├── docker/
│   └── Dockerfile          # Docker build definition
├── docker-compose.yml      # Local test orchestration
├── external_dbs/           # Database source (copied into the image at build time)
│   └── bt_toxin/           # The actual database files
├── pixi.toml               # Environment dependencies
├── pixi.lock               # Exact version lock
└── examples/               # Test inputs and outputs
```

## 5. Updating the Database

To update the database used by the container:
1. Update the files in `external_dbs/bt_toxin/`.
2. Rebuild the Docker image:
   ```bash
   docker compose build --no-cache
   ```
12
tools/reproduction/bttoxin_digger/docker-compose.yml
Normal file
@@ -0,0 +1,12 @@
services:
  digger-repro:
    build:
      context: .
      dockerfile: docker/Dockerfile
    image: bttoxin-digger:v5-repro
    volumes:
      - ./examples/inputs:/app/jobs
      - ./examples:/app/examples
      # Mount the current directory to verify outputs easily if needed
      # But the DB is now baked in, so no need to mount external_dbs
    command: pixi run BtToxin_Digger --help
28
tools/reproduction/bttoxin_digger/docker/Dockerfile
Normal file
@@ -0,0 +1,28 @@
# BtToxin Digger v5 container image
# Based on a conda environment managed by pixi

FROM ghcr.io/prefix-dev/pixi:latest

WORKDIR /app

# Copy the pixi configuration
COPY pixi.toml .
COPY pixi.lock .

# Install dependencies
RUN pixi install

# Replace the default database with the external one
# Note: this must run after pixi install, and the original directory must be removed first to ensure a complete replacement
# This step assumes the build context contains the external_dbs directory
RUN rm -rf /app/.pixi/envs/default/bin/BTTCMP_db/bt_toxin
COPY external_dbs/bt_toxin /app/.pixi/envs/default/bin/BTTCMP_db/bt_toxin

# Create working directories
RUN mkdir -p /app/jobs /app/data

# Expose a commonly used port
EXPOSE 9000

# Default command
CMD ["pixi", "run", "BtToxin_Digger", "--help"]
41
tools/reproduction/bttoxin_digger/examples/COMPARE_REPORT.md
Normal file
@@ -0,0 +1,41 @@
# BtToxin_Digger pixi vs docker comparison (examples)

## Inputs

- Pixi C15: `runs/bttoxin_digger_v5_repro/examples/C15_pixi_v5/Results/Toxins/All_Toxins.txt`
- Docker C15: `runs/bttoxin_digger_v5_repro/examples/C15_docker/digger/Results/Toxins/All_Toxins.txt`
- Pixi HAN055: `runs/bttoxin_digger_v5_repro/examples/HAN055_pixi_v5_clean/Results/Toxins/All_Toxins.txt`
- Docker HAN055: `runs/bttoxin_digger_v5_repro/examples/HAN055_docker/digger/Results/Toxins/All_Toxins.txt`

Diff files:

- `runs/bttoxin_digger_v5_repro/examples/diffs/C15_docker_vs_pixi_v5.diff`
- `runs/bttoxin_digger_v5_repro/examples/diffs/HAN055_docker_vs_pixi_v5_clean.diff`

## C15 (docker vs pixi v5)

- Row counts: 24 vs 24
- Hit_id set: identical
- Differences: 6 rows differ (4 BLAST E-values, 2 SVM flags)

```
Protein_id           Field            docker_value  pixi_value
NZ_CP021436.1_06204  Evalue of blast  1e-35         2e-35
NZ_CP021436.1_06727  Evalue of blast  9e-33         1e-32
NZ_CP021436.1_12182  Evalue of blast  7e-60         8e-60
NZ_CP021438.1_00072  SVM              NO            YES
NZ_CP021438.1_00163  Evalue of blast  3e-52         4e-52
NZ_CP021439.1_00043  SVM              YES           NO
```

## HAN055 (docker vs pixi v5 clean)

- Row counts: 24 vs 24
- Hit_id set: identical
- Differences: two rows differ only in the SVM flag

```
Protein_id        Hit_id    docker_SVM  pixi_SVM
CP001910.1_00155  Cry2Aa22  YES         NO
CP001910.1_00475  Cry1Aa3   NO          YES
```
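A field-by-field comparison like the ones above can be reproduced with a short script. This is a sketch, not the method used to produce the report: it assumes `All_Toxins.txt` is tab-separated with a header row, that `Protein_id` is unique per row, and the function names are hypothetical.

```python
import csv
import io


def load_rows(text, key="Protein_id"):
    """Index the rows of a tab-separated table by the key column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[key]: row for row in reader}


def diff_tables(left_text, right_text, key="Protein_id"):
    """Return (key, field, left_value, right_value) for every differing field."""
    left = load_rows(left_text, key)
    right = load_rows(right_text, key)
    diffs = []
    # Only rows present on both sides are compared field by field.
    for protein_id in sorted(left.keys() & right.keys()):
        for field, left_value in left[protein_id].items():
            right_value = right[protein_id].get(field)
            if left_value != right_value:
                diffs.append((protein_id, field, left_value, right_value))
    return diffs
```

Reading the two `All_Toxins.txt` files with `Path(...).read_text()` and passing the contents to `diff_tables` yields one tuple per differing field. Rows that exist on only one side are not reported by this sketch and would need a separate set comparison of the keys.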
66411
tools/reproduction/bttoxin_digger/examples/inputs/97-27.fna
Normal file
File diff suppressed because it is too large
76433
tools/reproduction/bttoxin_digger/examples/inputs/C15.fna
Normal file
File diff suppressed because it is too large
3 binary files not shown.
@@ -0,0 +1,22 @@
{
  "version": "1.2",
  "dbname": "bt_toxin",
  "dbtype": "Protein",
  "db-version": 5,
  "description": "bt_toxin20251104.fas",
  "number-of-letters": 996368,
  "number-of-sequences": 1199,
  "last-updated": "2025-11-04T15:35:00",
  "number-of-volumes": 1,
  "bytes-total": 1149077,
  "bytes-to-cache": 1007264,
  "files": [
    "bt_toxin.pdb",
    "bt_toxin.phr",
    "bt_toxin.pin",
    "bt_toxin.pot",
    "bt_toxin.psq",
    "bt_toxin.ptf",
    "bt_toxin.pto"
  ]
}
10 binary files not shown.
11 file diffs suppressed because they are too large.
@@ -0,0 +1,36 @@
def get_unique_headers(file_path):
    """Read the lines that start with '>' and return the set of texts following the '>'."""
    headers = set()
    with open(file_path, 'r') as f:
        for line in f:
            line = line.strip()
            if line.startswith('>'):
                # Keep everything after '>' (including any spaces and other characters)
                header = line[1:]
                headers.add(header)
    return headers

# Input file paths
file1 = 'bt_toxin20251104.fas'
file2 = 'all_app_cry_cyt_gpp_mcf_mpf_mpp_mtx_pra_prb_spp_tpp_txp_vip_vpa_vpb_xpp_fasta_sequences.txt'
output_file = 'unique_headers.txt'

# Collect the header sets of both files
headers1 = get_unique_headers(file1)
headers2 = get_unique_headers(file2)

# Compute the headers unique to each file
unique_to_file1 = headers1 - headers2
unique_to_file2 = headers2 - headers1

# Write the output file
with open(output_file, 'w') as out_f:
    out_f.write(f"### Unique headers in {file1} ###\n")
    for header in sorted(unique_to_file1):
        out_f.write(f">{header}\n")

    out_f.write(f"\n### Unique headers in {file2} ###\n")
    for header in sorted(unique_to_file2):
        out_f.write(f">{header}\n")

print(f"Done. Results saved to {output_file}")
4952
tools/reproduction/bttoxin_digger/pixi.lock
Normal file
File diff suppressed because it is too large
13
tools/reproduction/bttoxin_digger/pixi.toml
Normal file
@@ -0,0 +1,13 @@
[workspace]
name = "bttoxin_digger_v5_repro"
channels = ["bioconda", "conda-forge", "bioconda/label/cf201901"]
platforms = ["linux-64"]
version = "0.1.0"
channel-priority = "disabled"

[dependencies]
bttoxin_digger = "==1.0.10"
perl = "==5.26.2"
perl-file-tee = "==0.07"
perl-list-util = "==1.38"
blast = "==2.16.0"
30
tools/reproduction/bttoxin_digger/run_digger_pixi.sh
Executable file
@@ -0,0 +1,30 @@
#!/usr/bin/env bash
set -euo pipefail

if [[ $# -lt 2 ]]; then
  echo "Usage: $0 <input_dir> <scaf_suffix> [threads] [bttoxin_db_dir]"
  echo "Example: $0 /home/zly/project/bttoxin-pipeline/tests/test_data .fna 4"
  echo "Example with external DB: $0 /home/zly/project/bttoxin-pipeline/tests/test_data .fna 4 /home/zly/project/bttoxin-pipeline/external_dbs/bt_toxin"
  exit 1
fi

input_dir="$1"
scaf_suffix="$2"
threads="${3:-4}"
bttoxin_db_dir="${4:-}"

cache_dir="$(pwd)/.rattler-cache"
mkdir -p "$cache_dir"
if [[ -n "$bttoxin_db_dir" ]]; then
  if [[ -d "$bttoxin_db_dir" ]]; then
    env_bin="$(RATTLER_CACHE_DIR="$cache_dir" pixi run which BtToxin_Digger | xargs dirname)"
    db_link="$env_bin/BTTCMP_db/bt_toxin"
    # A real directory must be removed first; otherwise ln would create the link inside it.
    if [[ -d "$db_link" && ! -L "$db_link" ]]; then
      rm -rf "$db_link"
    fi
    ln -sfn "$bttoxin_db_dir" "$db_link"
  else
    echo "[warn] bttoxin_db_dir not found: $bttoxin_db_dir" >&2
  fi
fi
RATTLER_CACHE_DIR="$cache_dir" pixi run BtToxin_Digger \
  --SeqPath "$input_dir" \
  --SequenceType nucl \
  --Scaf_suffix "$scaf_suffix" \
  --threads "$threads"