docs: update README and add pixi-based tests
- Add property-based tests for PixiRunner
- Add HAN055.fna test data file
- Update README with pixi installation and usage guide
- Update .gitignore for pixi and test artifacts
- Update CLI to remove Docker-related arguments
19
.gitignore
vendored
@@ -5,6 +5,7 @@ __pycache__/
|
||||
.Python
|
||||
venv/
|
||||
*.egg-info/
|
||||
.venv/
|
||||
|
||||
# Node
|
||||
node_modules/
|
||||
@@ -28,7 +29,21 @@ tests/test_data/genomes/*.fna
|
||||
.idea/
|
||||
*.swp
|
||||
|
||||
.venv/
|
||||
# Test/Build artifacts
|
||||
.pytest_cache/
|
||||
.hypothesis/
|
||||
htmlcov/
|
||||
.coverage
|
||||
|
||||
# Pipeline outputs
|
||||
runs/
|
||||
tests/output
|
||||
tests/output/
|
||||
tests/logs/
|
||||
|
||||
# Pixi (keep .pixi but ignore lock in subprojects)
|
||||
.pixi/envs/
|
||||
|
||||
# uv lock (optional, can be committed for reproducibility)
|
||||
# uv.lock
|
||||
|
||||
.kiro
|
||||
448
README.md
@@ -1,169 +1,199 @@
|
||||
# BtToxin Pipeline
|
||||
|
||||
Automated Bacillus thuringiensis toxin mining system with CI/CD integration.
|
||||
Automated Bacillus thuringiensis toxin mining system using pixi-managed environments.
|
||||
|
||||
## Quick Start (single-machine deployment)
|
||||
|
||||
### uv .venv
|
||||
|
||||
```bash
|
||||
uv venv --managed-python -p 3.12 --seed .venv
|
||||
uv pip install -r backend/requirements.txt
|
||||
```
|
||||
## Quick Start
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- Docker / Podman
|
||||
- Python 3.10+
|
||||
- Node.js 18+
|
||||
- [pixi](https://pixi.sh) - Modern package manager for conda environments
|
||||
- Linux x86_64 (linux-64 platform)
|
||||
|
||||
### Installation
|
||||
|
||||
1. Install pixi (if not already installed):
|
||||
|
||||
```bash
|
||||
# Linux/macOS
|
||||
curl -fsSL https://pixi.sh/install.sh | bash
|
||||
|
||||
# Or via Homebrew
|
||||
brew install pixi
|
||||
```
|
||||
|
||||
2. Clone and setup the project:
|
||||
|
||||
### Development Setup
|
||||
```bash
|
||||
# 1. Clone and setup
|
||||
git clone <your-repo>
|
||||
cd bttoxin-pipeline
|
||||
|
||||
# 2. Initialize and start with the Makefile (single machine)
make setup
make start

# 3. Initialize the database (create tables)
make db-init

# 4. Access the services
# API: http://localhost:8000/docs
# Flower: http://localhost:5555
# Frontend: http://localhost:3000

# (Optional) Local development
# Backend: uvicorn app.main:app --reload
# Frontend: npm run dev
|
||||
# Install all environments (digger + pipeline)
|
||||
pixi install
|
||||
```
|
||||
|
||||
## Architecture
|
||||
This creates two isolated environments:
|
||||
- `digger`: BtToxin_Digger with bioconda dependencies (perl, blast, etc.)
|
||||
- `pipeline`: Python analysis tools (pandas, matplotlib, seaborn)
|
||||
|
||||
Nginx (Reverse Proxy)
|
||||
├── Frontend (Vue 3 Static)
|
||||
└── Backend (FastAPI + Swagger)
|
||||
├── PostgreSQL (SQLModel via SQLAlchemy)
|
||||
├── Redis (Broker/Result)
|
||||
├── Celery (Worker/Beat + Flower)
|
||||
└── Docker Engine (BtToxin_Digger)
|
||||
### Running the Pipeline
|
||||
|
||||
## Documentation
|
||||
#### Full Pipeline (Recommended)
|
||||
|
||||
- API docs: open `http://localhost:8000/docs` in a browser
- Single-machine orchestration: `docker/docker-compose.yml` (the single source of truth)
- Example environment variables: `backend/.env.example`
- Common commands: `make help`

### Notes on macOS + Podman

- On macOS, Podman runs inside a virtual machine, so write access may be restricted when host directories are bind-mounted into a container.
- The run logic special-cases macOS: it copies the input to `/tmp/input` inside the container, runs BtToxin_Digger in `/tmp`, and then copies `Results/` and the key outputs back to the mounted `/workspace` (the host output directory).
- If write problems persist:
  - In the Podman Desktop VM shared directories, add the project path and enable write access.
  - If needed, enable rootful mode and restart: `podman machine stop && podman machine set --rootful && podman machine start`
  - Manually verify the mount: `podman run --rm -v $(pwd)/tests/output:/workspace:rw alpine sh -lc 'echo ok > /workspace/test.txt && ls -l /workspace'`

### Local offline container test (optional)

Minimal test using `scripts/test_bttoxin_digger.py`:
|
||||
Run the complete analysis pipeline with a single command:
|
||||
|
||||
```bash
|
||||
uv run python scripts/test_bttoxin_digger.py
|
||||
pixi run pipeline --fna tests/test_data/HAN055.fna
|
||||
```
|
||||
|
||||
Requirement: `97-27.fna` and `C15.fna` must exist under `tests/test_data`; after a successful test, 6 key files appear in `tests/output/Results/Toxins`.
|
||||
This executes three stages:
|
||||
1. **Digger**: BtToxin_Digger toxin mining
|
||||
2. **Shotter**: Toxin scoring and target prediction
|
||||
3. **Plot**: Heatmap generation and report creation
|
||||
|
||||
#### Input file format
|
||||
#### CLI Options
|
||||
|
||||
A .fna file is a FASTA-format nucleotide sequence file containing the complete genome sequence of a bacterium:
|
||||
```bash
|
||||
pixi run pipeline --fna <file> [options]
|
||||
|
||||
- **97-27.fna**: Complete genome sequence of Bacillus thuringiensis strain 97-27
- **C15.fna**: Complete genome sequence of Bacillus thuringiensis strain C15

Example of the file format:
|
||||
```
>NZ_CP010088.1 Bacillus thuringiensis strain 97-27 chromosome, complete genome
|
||||
TAATGTAACACCAGTAAATATTTCATTCATATATTCTTTTAACTGTATTTTATATTCTTTCTACTCTACAATTTCTTTTA
|
||||
ACTGCCAATATGCATCTTCTAGCCAAGGGTGTAAAACTTTCAACGTGTCTTTTCTATCCCACAAATATGAAATATATGCA
|
||||
...
|
||||
Options:
|
||||
--fna PATH Input .fna file (required)
|
||||
--out_root PATH Output directory (default: runs/<stem>_run)
|
||||
--toxicity_csv PATH Toxicity data CSV (default: Data/toxicity-data.csv)
|
||||
--min_identity FLOAT Minimum identity threshold 0-1 (default: 0.0)
|
||||
--min_coverage FLOAT Minimum coverage threshold 0-1 (default: 0.0)
|
||||
--disallow_unknown_families Exclude unknown toxin families
|
||||
--require_index_hit Keep only hits with known specificity
|
||||
--lang {zh,en} Report language (default: zh)
|
||||
--bttoxin_db_dir PATH Custom bt_toxin database directory
|
||||
--threads INT Number of threads (default: 4)
|
||||
```
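The options above can also be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_pipeline_command` is a hypothetical helper that builds the `pixi run pipeline` argument list from the flags documented above. It only constructs the list; running it is left to the caller.

```python
from pathlib import Path
from typing import List, Optional


def build_pipeline_command(
    fna: Path,
    out_root: Optional[Path] = None,
    min_identity: float = 0.0,
    min_coverage: float = 0.0,
    disallow_unknown_families: bool = False,
    require_index_hit: bool = False,
    lang: str = "zh",
    threads: int = 4,
) -> List[str]:
    """Hypothetical helper: assemble a `pixi run pipeline` command line."""
    # Thresholds are documented as fractions in the range 0-1.
    for name, value in (("min_identity", min_identity), ("min_coverage", min_coverage)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if lang not in ("zh", "en"):
        raise ValueError(f"unsupported lang: {lang!r}")

    cmd = ["pixi", "run", "pipeline", "--fna", str(fna)]
    if out_root is not None:
        cmd += ["--out_root", str(out_root)]
    cmd += ["--min_identity", str(min_identity), "--min_coverage", str(min_coverage)]
    if disallow_unknown_families:
        cmd.append("--disallow_unknown_families")
    if require_index_hit:
        cmd.append("--require_index_hit")
    cmd += ["--lang", lang, "--threads", str(threads)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.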
|
||||
|
||||
#### Interpreting the mining results
|
||||
#### Examples
|
||||
|
||||
After the BtToxin_Digger analysis finishes, it produces the following key result files:
|
||||
```bash
|
||||
# Basic run with default settings
|
||||
pixi run pipeline --fna tests/test_data/C15.fna
|
||||
|
||||
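**1. Per-strain toxin list file (`.list`)**
- Contains the detailed classification of each toxin protein predicted in each strain
- Toxin types include: Cry, Cyt, Vip, Others, App, Gpp, Mcf, Mpf, Mpp, Mtx, Pra, Prb, Spp, Tpp, Vpa, Vpb, Xpp
- Each toxin shows: protein ID, length, rank (Rank1-4), BLAST result, best hit, coverage, similarity, and SVM and HMM predictions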
|
||||
# Strict filtering for high-confidence results
|
||||
pixi run pipeline --fna tests/test_data/HAN055.fna \
|
||||
--min_identity 0.50 --min_coverage 0.60 \
|
||||
--disallow_unknown_families --require_index_hit
|
||||
|
||||
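**2. GenBank-format file (`.gbk`)**
- Contains detailed annotations for the predicted toxin genes
- Records gene locations, protein descriptions, BLAST alignment details, and prediction results
- Can be used for downstream functional analysis and visualization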
|
||||
# English report with custom output directory
|
||||
pixi run pipeline --fna tests/test_data/HAN055.fna \
|
||||
--out_root runs/HAN055_strict --lang en
|
||||
|
||||
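**3. Summary table (`Bt_all_genes.table`)**
- Summary table of toxin genes across all strains
- Shows the number and similarity of each type of toxin gene in each strain

**4. Complete toxin list (`All_Toxins.txt`)**
- Contains full information for every predicted toxin gene
- Fields include: strain, protein ID, protein length, strand, gene location, SVM prediction, BLAST result, HMM result, hit ID, alignment length, identity, E-value, and others

**Example test results**:
- Strain 97-27: 12 predicted toxin genes, including InhA1/2, Bmp1, Spp1Aa1, Zwa5A/6, and others
- Strain C15: several predicted Cry toxin genes (Cry21Aa2, Cry21Aa3, Cry21Ca2, Cry5Ba1) plus other accessory toxins
- Toxins are ranked Rank1-4; Rank1 is the highest confidence and Rank4 the lowest
- Similarity ranges from 27.62% to 100%, indicating how closely each hit resembles known toxins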
|
||||
|
||||
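### Single-directory layout (stable writes across platforms)

- Before a run, the program copies the input files into an `input_files/` subdirectory of the host output directory; the container mounts only that output directory (read-write) as `/workspace`.
- At run time the tool's `--SeqPath` points to `/workspace/input_files` and the working directory is fixed at `/workspace`; all results and intermediate files land under `tests/output/` on the host.

Directory example: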
|
||||
|
||||
```
|
||||
tests/output/
├── input_files/          # Copies of the input files
│   ├── 97-27.fna
│   └── C15.fna
├── Results/              # BtToxin_Digger output
│   └── Toxins/
│       ├── 97-27.list
│       ├── 97-27.gbk
│       └── ...
├── StatsFiles/           # Statistics files (if any)
├── All_Toxins.txt
└── BtToxin_Digger.log
|
||||
# Use custom database
|
||||
pixi run pipeline --fna tests/test_data/HAN055.fna \
|
||||
--bttoxin_db_dir /path/to/custom/bt_toxin
|
||||
```
|
||||
|
||||
## Updating bttoxin_db
|
||||
### Individual Stage Commands
|
||||
|
||||
The database bundled in the BtToxin_Digger container is dated (August 2021); using the latest database from the official GitHub repository is recommended.
|
||||
Run stages separately when needed:
|
||||
|
||||
### Database directory structure
|
||||
#### Digger Only
|
||||
|
||||
```
|
||||
external_dbs/bt_toxin/
|
||||
├── db/ # BLAST index files (required at runtime)
|
||||
│ ├── bt_toxin.phr
|
||||
│ ├── bt_toxin.pin
|
||||
│ ├── bt_toxin.psq
|
||||
│ ├── bt_toxin.pdb
|
||||
│ ├── bt_toxin.pjs
|
||||
│ ├── bt_toxin.pot
|
||||
│ ├── bt_toxin.ptf
|
||||
│ ├── bt_toxin.pto
|
||||
│ └── old/
|
||||
└── seq/ # Source sequence files (for archiving/updating)
|
||||
├── bt_toxin20251104.fas
|
||||
└── ...
|
||||
```bash
|
||||
pixi run digger-only --fna <file> [options]
|
||||
|
||||
Options:
|
||||
--fna PATH Input .fna file (required)
|
||||
--out_dir PATH Output directory (default: runs/<stem>_digger_only)
|
||||
--bttoxin_db_dir PATH Custom database directory
|
||||
--threads INT Number of threads (default: 4)
|
||||
--sequence_type Sequence type: nucl/orfs/prot/reads (default: nucl)
|
||||
```
|
||||
|
||||
### Update steps
|
||||
Example:
|
||||
```bash
|
||||
pixi run digger-only --fna tests/test_data/C15.fna --threads 8
|
||||
```
|
||||
|
||||
#### Shotter (Scoring)
|
||||
|
||||
```bash
|
||||
pixi run shotter [options]
|
||||
|
||||
Options:
|
||||
--toxicity_csv PATH Toxicity data CSV
|
||||
--all_toxins PATH All_Toxins.txt from Digger
|
||||
--output_dir PATH Output directory
|
||||
--min_identity FLOAT Minimum identity threshold
|
||||
--min_coverage FLOAT Minimum coverage threshold
|
||||
--allow_unknown_families / --disallow_unknown_families
|
||||
--require_index_hit Keep only indexed hits
|
||||
```
|
||||
|
||||
Example:
|
||||
```bash
|
||||
pixi run shotter \
|
||||
--all_toxins runs/C15_run/digger/Results/Toxins/All_Toxins.txt \
|
||||
--output_dir runs/C15_run/shotter
|
||||
```
|
||||
|
||||
#### Plot (Visualization)
|
||||
|
||||
```bash
|
||||
pixi run plot [options]
|
||||
|
||||
Options:
|
||||
--strain_scores PATH strain_target_scores.tsv from Shotter
|
||||
--toxin_support PATH toxin_support.tsv (optional)
|
||||
--species_scores PATH strain_target_species_scores.tsv (optional)
|
||||
--out_dir PATH Output directory
|
||||
--cmap STRING Colormap (default: viridis)
|
||||
--per_hit_strain NAME Generate per-hit heatmap for specific strain
|
||||
--merge_unresolved Merge other/unknown into unresolved
|
||||
--report_mode {summary,paper} Report style (default: paper)
|
||||
--lang {zh,en} Report language (default: zh)
|
||||
```
|
||||
|
||||
Example:
|
||||
```bash
|
||||
pixi run plot \
|
||||
--strain_scores runs/C15_run/shotter/strain_target_scores.tsv \
|
||||
--toxin_support runs/C15_run/shotter/toxin_support.tsv \
|
||||
--out_dir runs/C15_run/shotter \
|
||||
--per_hit_strain C15 --lang en
|
||||
```
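When stages are run separately, the Shotter and Plot commands share paths derived from one run directory. The sketch below is illustrative only (`stage_commands` is a hypothetical name, not a function in this repository). It derives both command lines from a run root, using the file names and flags shown in the examples above.

```python
from pathlib import Path
from typing import List


def stage_commands(run_root: Path, lang: str = "zh") -> List[List[str]]:
    """Hypothetical helper: build the Shotter and Plot commands for one run."""
    shotter_dir = run_root / "shotter"
    # Digger writes All_Toxins.txt under digger/Results/Toxins/.
    all_toxins = run_root / "digger" / "Results" / "Toxins" / "All_Toxins.txt"

    shotter_cmd = [
        "pixi", "run", "shotter",
        "--all_toxins", str(all_toxins),
        "--output_dir", str(shotter_dir),
    ]
    # Plot reads the TSV files that Shotter writes into the same directory.
    plot_cmd = [
        "pixi", "run", "plot",
        "--strain_scores", str(shotter_dir / "strain_target_scores.tsv"),
        "--toxin_support", str(shotter_dir / "toxin_support.tsv"),
        "--out_dir", str(shotter_dir),
        "--lang", lang,
    ]
    return [shotter_cmd, plot_cmd]
```

Running the two commands in order (for example with `subprocess.run(cmd, check=True)`) stops at the first failing stage, so Plot never runs against missing Shotter output.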
|
||||
|
||||
## Output Structure
|
||||
|
||||
After running the pipeline:
|
||||
|
||||
```
|
||||
runs/<strain>_run/
|
||||
├── stage/ # Staged input file
|
||||
│ └── <strain>.fna
|
||||
├── digger/ # BtToxin_Digger outputs
|
||||
│ ├── Results/
|
||||
│ │ └── Toxins/
|
||||
│ │ ├── All_Toxins.txt
|
||||
│ │ ├── <strain>.list
|
||||
│ │ ├── <strain>.gbk
|
||||
│ │ └── Bt_all_genes.table
|
||||
│ └── BtToxin_Digger.log
|
||||
├── shotter/ # Shotter outputs
|
||||
│ ├── strain_target_scores.tsv
|
||||
│ ├── strain_scores.json
|
||||
│ ├── toxin_support.tsv
|
||||
│ ├── strain_target_species_scores.tsv
|
||||
│ ├── strain_species_scores.json
|
||||
│ ├── strain_target_scores.png
|
||||
│ ├── strain_target_species_scores.png
|
||||
│ ├── per_hit_<strain>.png
|
||||
│ └── shotter_report_paper.md
|
||||
├── logs/
|
||||
│ └── digger_execution.log
|
||||
└── pipeline_results.tar.gz # Bundled results
|
||||
```
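A quick way to confirm that a run produced the layout above is to check for a few of the listed files. This is a minimal sketch under stated assumptions: `missing_outputs` and `EXPECTED_OUTPUTS` are hypothetical names, and only strain-independent paths from the tree are checked.

```python
from pathlib import Path
from typing import List

# Relative paths copied from the output tree; strain-specific files
# (<strain>.list, <strain>.gbk, per_hit_<strain>.png) are left out.
EXPECTED_OUTPUTS = (
    "digger/Results/Toxins/All_Toxins.txt",
    "shotter/strain_target_scores.tsv",
    "shotter/toxin_support.tsv",
    "pipeline_results.tar.gz",
)


def missing_outputs(run_root: Path) -> List[str]:
    """Return the expected relative paths that are not regular files."""
    return [rel for rel in EXPECTED_OUTPUTS if not (run_root / rel).is_file()]
```

An empty list means every checked file is present; otherwise the returned paths point at the stage that most likely failed.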
|
||||
|
||||
## Database Update
|
||||
|
||||
BtToxin_Digger's built-in database may be outdated. Use the latest from GitHub:
|
||||
|
||||
### Update Steps
|
||||
|
||||
```bash
|
||||
mkdir -p external_dbs
|
||||
@@ -176,49 +206,149 @@ git sparse-checkout init --cone
|
||||
git sparse-checkout set BTTCMP_db/bt_toxin
|
||||
git checkout master
|
||||
|
||||
# Copy the directory into your project's external_dbs
|
||||
cd ..
|
||||
cp -a tmp_bttoxin_repo/BTTCMP_db/bt_toxin external_dbs/bt_toxin
|
||||
|
||||
# Clean up the temporary repo
|
||||
rm -rf tmp_bttoxin_repo
|
||||
```
|
||||
|
||||
### Verify the database binding
|
||||
The pipeline automatically detects `external_dbs/bt_toxin` if present.
|
||||
|
||||
```bash
|
||||
# Check that the database files are complete
|
||||
ls -lh external_dbs/bt_toxin/db/
|
||||
### Database Structure
|
||||
|
||||
# Verify that the container can access the bound database
|
||||
docker run --rm \
|
||||
-v "$(pwd)/external_dbs/bt_toxin:/usr/local/bin/BTTCMP_db/bt_toxin:ro" \
|
||||
quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0 \
|
||||
bash -lc 'ls -lh /usr/local/bin/BTTCMP_db/bt_toxin/db | head'
|
||||
```
|
||||
external_dbs/bt_toxin/
|
||||
├── db/ # BLAST index files (required)
|
||||
│ ├── bt_toxin.phr
|
||||
│ ├── bt_toxin.pin
|
||||
│ ├── bt_toxin.psq
|
||||
│ └── ...
|
||||
└── seq/ # Source sequences (optional, for reference)
|
||||
└── bt_toxin*.fas
|
||||
```
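Before pointing `--bttoxin_db_dir` at a directory, it can help to confirm that the BLAST index files listed above exist. The sketch below is an assumption-laden example, not the pipeline's own validation: `missing_index_files` is a hypothetical helper, and it checks only the three files named in the tree (`.phr`, `.pin`, `.psq`), although a real database may ship additional index files.

```python
from pathlib import Path
from typing import List

# Only the index files named in the structure above are checked.
REQUIRED_SUFFIXES = (".phr", ".pin", ".psq")


def missing_index_files(bt_toxin_dir: Path) -> List[str]:
    """Return the names of required db/bt_toxin.* files that are absent."""
    db_dir = bt_toxin_dir / "db"
    return [
        f"bt_toxin{suffix}"
        for suffix in REQUIRED_SUFFIXES
        if not (db_dir / f"bt_toxin{suffix}").is_file()
    ]
```

For example, `missing_index_files(Path("external_dbs/bt_toxin"))` returns an empty list when all three files are in place.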
|
||||
|
||||
The output should list files such as `.pin/.psq/.phr`, with timestamps and sizes matching the host, which confirms that the bind mount works.
|
||||
## Input File Format
|
||||
|
||||
### Running the pipeline with an external database
|
||||
`.fna` files are FASTA-format nucleotide sequence files containing bacterial genome sequences:
|
||||
|
||||
The script automatically detects the `external_dbs/bt_toxin` directory and binds it if present:
|
||||
|
||||
```bash
|
||||
# Use external_dbs/bt_toxin automatically (recommended)
|
||||
uv run python scripts/run_single_fna_pipeline.py --fna tests/test_data/HAN055.fna
|
||||
|
||||
# Or specify the database path manually
|
||||
uv run python scripts/run_single_fna_pipeline.py \
|
||||
--fna tests/test_data/HAN055.fna \
|
||||
--bttoxin_db_dir /path/to/custom/bt_toxin
|
||||
```
|
||||
>NZ_CP010088.1 Bacillus thuringiensis strain 97-27 chromosome, complete genome
|
||||
TAATGTAACACCAGTAAATATTTCATTCATATATTCTTTTAACTGTATTTTATATTCTTTCTACTCTACAATTTCTTTTA
|
||||
ACTGCCAATATGCATCTTCTAGCCAAGGGTGTAAAACTTTCAACGTGTCTTTTCTATCCCACAAATATGAAATATATGCA
|
||||
...
|
||||
```
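To make the format concrete, here is a small FASTA reader written for illustration. It is not the parser used by BtToxin_Digger, and `iter_fasta` is a hypothetical name. Each record starts at a `>` header line and its sequence is the concatenation of the lines that follow, up to the next header.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `with open(path) as fh: records = list(iter_fasta(fh))`.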
|
||||
|
||||
### Notes
|
||||
## Result Interpretation
|
||||
|
||||
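- **The db/ directory is required**: at run time BLAST reads only the index files under `db/`
- **The seq/ directory is optional**: it is kept only for archiving or for regenerating the index
- **The bind mount is read-only (ro)**: this prevents the container from accidentally modifying the host database
- **No re-indexing is needed**: the GitHub repository already includes prebuilt BLAST indexes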
|
||||
|
||||
### Key Output Files
|
||||
|
||||
**All_Toxins.txt** - Complete toxin predictions with:
|
||||
- Strain, Protein ID, coordinates
|
||||
- SVM/BLAST/HMM predictions
|
||||
- Hit ID, alignment length, identity, E-value
|
||||
|
||||
**strain_target_scores.tsv** - Strain-level target predictions:
|
||||
- TopOrder: Most likely target insect order
|
||||
- TopScore: Confidence score (0-1)
|
||||
- Per-order scores for all target orders
|
||||
|
||||
**toxin_support.tsv** - Per-hit contribution details:
|
||||
- Individual toxin weights and contributions
|
||||
- Family classification and partner status
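
As a reading aid for `strain_target_scores.tsv`, the sketch below filters rows by `TopScore`. It assumes the file is tab-separated with `TopOrder` and `TopScore` columns as described above; the `Strain` column name and the function name `top_predictions` are assumptions made for illustration.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def top_predictions(tsv_path: Path, min_score: float = 0.0) -> List[Tuple[str, str, float]]:
    """Return (strain, top order, top score) for rows with TopScore >= min_score."""
    rows: List[Tuple[str, str, float]] = []
    with tsv_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            score = float(row["TopScore"])
            if score >= min_score:
                # "Strain" is an assumed column name; fall back to "" if absent.
                rows.append((row.get("Strain", ""), row["TopOrder"], score))
    return rows
```

A call such as `top_predictions(Path("runs/C15_run/shotter/strain_target_scores.tsv"), 0.5)` would keep only strains whose best target order scores at least 0.5.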
|
||||
|
||||
### Toxin Rankings
|
||||
|
||||
- **Rank1**: Highest confidence (identity ≥78%, coverage ≥80%)
|
||||
- **Rank2-3**: Moderate confidence
|
||||
- **Rank4**: Lowest confidence predictions
|
||||
|
||||
### Target Orders
|
||||
|
||||
Common insect orders in predictions:
|
||||
- **Lepidoptera**: Moths and butterflies
|
||||
- **Coleoptera**: Beetles
|
||||
- **Diptera**: Flies and mosquitoes
|
||||
- **Hemiptera**: True bugs
|
||||
- **Nematoda**: Roundworms
|
||||
|
||||
## Development
|
||||
|
||||
### Python Development Environment
|
||||
|
||||
For development work outside pixi:
|
||||
|
||||
```bash
|
||||
uv venv --managed-python -p 3.12 --seed .venv
|
||||
source .venv/bin/activate
|
||||
uv pip install -e .
|
||||
```
|
||||
|
||||
### Running Tests
|
||||
|
||||
```bash
|
||||
# Run property-based tests
|
||||
pixi run -e pipeline python -m pytest tests/test_pixi_runner.py -v
|
||||
```
|
||||
|
||||
### Project Structure
|
||||
|
||||
```
|
||||
bttoxin-pipeline/
|
||||
├── pixi.toml # Pixi environment configuration
|
||||
├── pyproject.toml # Python package configuration
|
||||
├── scripts/ # Core pipeline scripts
|
||||
│ ├── run_single_fna_pipeline.py # Main pipeline orchestrator
|
||||
│ ├── run_digger_stage.py # Digger-only stage
|
||||
│ ├── bttoxin_shoter.py # Toxin scoring module
|
||||
│ ├── plot_shotter.py # Visualization & reporting
|
||||
│ └── pixi_runner.py # PixiRunner class
|
||||
├── bttoxin/ # Python package (CLI entry point)
|
||||
│ ├── __init__.py
|
||||
│ ├── api.py
|
||||
│ └── cli.py
|
||||
├── Data/ # Reference data
|
||||
│ └── toxicity-data.csv # BPPRC specificity data
|
||||
├── external_dbs/ # External databases (optional)
|
||||
│ └── bt_toxin/ # Updated BtToxin database
|
||||
├── tests/ # Test suite
|
||||
│ ├── test_pixi_runner.py # Property-based tests
|
||||
│ └── test_data/ # Test input files
|
||||
├── docs/ # Documentation
|
||||
├── runs/ # Pipeline outputs (gitignored)
|
||||
├── backend/ # FastAPI backend (optional web service)
|
||||
└── frontend/ # Vue.js frontend (optional web UI)
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### pixi not found
|
||||
|
||||
```bash
|
||||
# Ensure pixi is in PATH
|
||||
export PATH="$HOME/.pixi/bin:$PATH"
|
||||
|
||||
# Or reinstall
|
||||
curl -fsSL https://pixi.sh/install.sh | bash
|
||||
```
|
||||
|
||||
### Environment not found
|
||||
|
||||
```bash
|
||||
# Reinstall environments
|
||||
pixi install
|
||||
```
|
||||
|
||||
### BtToxin_Digger not available
|
||||
|
||||
```bash
|
||||
# Verify digger environment
|
||||
pixi run -e digger BtToxin_Digger --help
|
||||
```
|
||||
|
||||
### Permission errors
|
||||
|
||||
Ensure write permissions on output directories. The pipeline creates directories automatically.
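
A permission problem can be detected before a long run starts. The following is a minimal sketch, not the pipeline's own check; `is_writable` is a hypothetical helper. Since the pipeline creates output directories on demand, the sketch tests the nearest existing ancestor of the target path.

```python
import os
from pathlib import Path


def is_writable(out_dir: Path) -> bool:
    """Return True if out_dir, or its nearest existing ancestor, is a writable directory."""
    probe = out_dir.resolve()
    # Walk up until an existing path is found; the filesystem root always exists.
    while not probe.exists():
        probe = probe.parent
    # Creating entries in a directory needs both write and search permission.
    return probe.is_dir() and os.access(probe, os.W_OK | os.X_OK)
```

For instance, `is_writable(Path("runs/HAN055_run"))` reports whether `runs/` (or the repository root, if `runs/` does not exist yet) accepts new files.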
|
||||
|
||||
## License
|
||||
|
||||
|
||||
@@ -1,4 +1,10 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Bttoxin Pipeline API (pixi-based).
|
||||
|
||||
This module provides the API for running the BtToxin pipeline using pixi environments:
|
||||
- digger environment: BtToxin_Digger with bioconda dependencies
|
||||
- pipeline environment: Python analysis with pandas/matplotlib/seaborn
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
@@ -10,16 +16,15 @@ from types import SimpleNamespace
|
||||
from typing import Dict, Any, Optional
|
||||
import sys as _sys
|
||||
|
||||
# Ensure repo-relative imports for backend and scripts when running from installed package
|
||||
# Ensure repo-relative imports for scripts when running from installed package
|
||||
_REPO_ROOT = Path(__file__).resolve().parents[1]
|
||||
_BACKEND_DIR = _REPO_ROOT / "backend"
|
||||
_SCRIPTS_DIR = _REPO_ROOT / "scripts"
|
||||
for _p in (str(_BACKEND_DIR), str(_SCRIPTS_DIR)):
|
||||
for _p in (str(_SCRIPTS_DIR),):
|
||||
if _p not in _sys.path:
|
||||
_sys.path.append(_p)
|
||||
|
||||
# Import DockerContainerManager from backend
|
||||
from app.utils.docker_client import DockerContainerManager # type: ignore
|
||||
# Import PixiRunner from scripts
|
||||
from pixi_runner import PixiRunner # type: ignore
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -45,21 +50,19 @@ def _lazy_import_plotter():
|
||||
|
||||
|
||||
class BtToxinRunner:
|
||||
"""Wrap BtToxin_Digger docker invocation for a single FNA."""
|
||||
"""Wrap BtToxin_Digger pixi invocation for a single FNA."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
image: str = "quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0",
|
||||
platform: str = "linux/amd64",
|
||||
base_workdir: Optional[Path] = None,
|
||||
bttoxin_db_dir: Optional[Path] = None,
|
||||
) -> None:
|
||||
self.image = image
|
||||
self.platform = platform
|
||||
if base_workdir is None:
|
||||
base_workdir = _REPO_ROOT / "runs" / "bttoxin"
|
||||
self.base_workdir = base_workdir
|
||||
self.base_workdir.mkdir(parents=True, exist_ok=True)
|
||||
self.mgr = DockerContainerManager(image=self.image, platform=self.platform)
|
||||
self.bttoxin_db_dir = bttoxin_db_dir
|
||||
self.runner = PixiRunner(pixi_project_dir=_REPO_ROOT, env_name="digger")
|
||||
|
||||
def _prepare_layout(self, fna_path: Path) -> tuple[Path, Path, Path, Path, str]:
|
||||
if not fna_path.exists():
|
||||
@@ -86,13 +89,14 @@ class BtToxinRunner:
|
||||
fna_path = Path(fna_path)
|
||||
input_dir, digger_out, log_dir, run_root, sample_name = self._prepare_layout(fna_path)
|
||||
logger.info("Start BtToxin_Digger: %s (sample=%s)", fna_path, sample_name)
|
||||
result = self.mgr.run_bttoxin_digger(
|
||||
result = self.runner.run_bttoxin_digger(
|
||||
input_dir=input_dir,
|
||||
output_dir=digger_out,
|
||||
log_dir=log_dir,
|
||||
sequence_type=sequence_type,
|
||||
scaf_suffix=fna_path.suffix or ".fna",
|
||||
threads=threads,
|
||||
bttoxin_db_dir=self.bttoxin_db_dir,
|
||||
)
|
||||
toxins_dir = digger_out / "Results" / "Toxins"
|
||||
files = {
|
||||
@@ -234,15 +238,13 @@ class PlotAPI:
|
||||
|
||||
|
||||
class BtSingleFnaPipeline:
|
||||
"""End-to-end single-FNA pipeline: Digger → Shotter → Plot → Bundle."""
|
||||
"""End-to-end single-FNA pipeline: Digger → Shotter → Plot → Bundle (pixi-based)."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
image: str = "quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0",
|
||||
platform: str = "linux/amd64",
|
||||
base_workdir: Optional[Path] = None,
|
||||
) -> None:
|
||||
self.digger = BtToxinRunner(image=image, platform=platform, base_workdir=base_workdir)
|
||||
self.base_workdir = base_workdir
|
||||
self.shotter = ShotterAPI()
|
||||
self.plotter = PlotAPI()
|
||||
|
||||
@@ -256,8 +258,11 @@ class BtSingleFnaPipeline:
|
||||
require_index_hit: bool = False,
|
||||
lang: str = "zh",
|
||||
threads: int = 4,
|
||||
bttoxin_db_dir: Optional[Path] = None,
|
||||
) -> Dict[str, Any]:
|
||||
dig = self.digger.run_single_fna(fna_path=fna, sequence_type="nucl", threads=threads)
|
||||
# Create digger runner with optional external database
|
||||
digger = BtToxinRunner(base_workdir=self.base_workdir, bttoxin_db_dir=bttoxin_db_dir)
|
||||
dig = digger.run_single_fna(fna_path=fna, sequence_type="nucl", threads=threads)
|
||||
if not dig.get("success"):
|
||||
return {"ok": False, "stage": "digger", "detail": dig}
|
||||
run_root: Path = dig["run_root"]
|
||||
|
||||
@@ -1,4 +1,16 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Bttoxin single-FNA pipeline CLI (pixi-based).
|
||||
|
||||
This CLI uses pixi environments for execution:
|
||||
- digger environment: BtToxin_Digger with bioconda dependencies
|
||||
- pipeline environment: Python analysis with pandas/matplotlib/seaborn
|
||||
|
||||
Example:
|
||||
python -m bttoxin.cli --fna tests/test_data/HAN055.fna --lang zh
|
||||
|
||||
# With custom database
|
||||
python -m bttoxin.cli --fna tests/test_data/HAN055.fna --bttoxin_db_dir /path/to/bt_toxin
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
@@ -11,8 +23,8 @@ def main() -> int:
|
||||
ap.add_argument("--fna", type=Path, required=True, help="Path to a single .fna file")
|
||||
ap.add_argument("--toxicity_csv", type=Path, default=Path("Data/toxicity-data.csv"))
|
||||
ap.add_argument("--base_workdir", type=Path, default=None, help="Base working dir (default: runs/bttoxin under repo root)")
|
||||
ap.add_argument("--image", type=str, default="quay.io/biocontainers/bttoxin_digger:1.0.10--hdfd78af_0")
|
||||
ap.add_argument("--platform", type=str, default="linux/amd64")
|
||||
ap.add_argument("--bttoxin_db_dir", type=Path, default=None,
|
||||
help="External bt_toxin database directory (default: auto-detect external_dbs/bt_toxin)")
|
||||
ap.add_argument("--min_identity", type=float, default=0.0)
|
||||
ap.add_argument("--min_coverage", type=float, default=0.0)
|
||||
ap.add_argument("--disallow_unknown_families", action="store_true", default=False)
|
||||
@@ -21,7 +33,7 @@ def main() -> int:
|
||||
ap.add_argument("--threads", type=int, default=4)
|
||||
args = ap.parse_args()
|
||||
|
||||
pipe = BtSingleFnaPipeline(image=args.image, platform=args.platform, base_workdir=args.base_workdir)
|
||||
pipe = BtSingleFnaPipeline(base_workdir=args.base_workdir)
|
||||
res = pipe.run(
|
||||
fna=args.fna,
|
||||
toxicity_csv=args.toxicity_csv,
|
||||
@@ -31,6 +43,7 @@ def main() -> int:
|
||||
require_index_hit=args.require_index_hit,
|
||||
lang=args.lang,
|
||||
threads=args.threads,
|
||||
bttoxin_db_dir=args.bttoxin_db_dir,
|
||||
)
|
||||
|
||||
if not res.get("ok"):
|
||||
|
||||
75305
tests/test_data/HAN055.fna
Normal file
File diff suppressed because it is too large
476
tests/test_pixi_runner.py
Normal file
@@ -0,0 +1,476 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Property-based tests for PixiRunner.
|
||||
|
||||
Uses hypothesis library for property-based testing as specified in the design document.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import sys
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
import pytest
|
||||
from hypothesis import given, settings, HealthCheck, strategies as st
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parents[1] / "scripts"))
|
||||
from pixi_runner import PixiRunner, PixiRunnerError
|
||||
|
||||
|
||||
def create_mock_pixi_toml(tmp_path: Path) -> Path:
|
||||
"""Create a mock pixi.toml file for testing."""
|
||||
pixi_toml = tmp_path / "pixi.toml"
|
||||
pixi_toml.write_text('[workspace]\nname = "test"\n[environments]\ndigger = ["digger"]\n')
|
||||
return tmp_path
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_pixi_toml(tmp_path: Path) -> Path:
|
||||
"""Create a mock pixi.toml file for testing."""
|
||||
return create_mock_pixi_toml(tmp_path)
|
||||
|
||||
|
||||
sequence_types = st.sampled_from(["nucl", "orfs", "prot"])
|
||||
file_suffixes = st.sampled_from([".fna", ".fasta", ".fa", ".ffn", ".faa"])
|
||||
thread_counts = st.integers(min_value=1, max_value=64)
|
||||
|
||||
|
||||
class TestDiggerCommandConstruction:
|
||||
"""
|
||||
**Feature: pixi-conda-migration, Property 1: Digger command construction correctness**
|
||||
**Validates: Requirements 2.2**
|
||||
"""
|
||||
|
||||
@given(sequence_type=sequence_types, scaf_suffix=file_suffixes, threads=thread_counts)
|
||||
@settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture])
|
||||
def test_command_starts_with_pixi_run(self, mock_pixi_toml, sequence_type, scaf_suffix, threads):
|
||||
"""Command SHALL start with 'pixi run -e digger BtToxin_Digger'."""
|
||||
runner = PixiRunner(pixi_project_dir=mock_pixi_toml, env_name="digger")
|
||||
cmd = runner.build_digger_command(Path("/test"), sequence_type, scaf_suffix, threads)
|
||||
assert cmd[:5] == ["pixi", "run", "-e", "digger", "BtToxin_Digger"]
|
||||
|
||||
@given(sequence_type=sequence_types, scaf_suffix=file_suffixes, threads=thread_counts)
|
||||
@settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture])
|
||||
def test_command_contains_required_arguments(self, mock_pixi_toml, sequence_type, scaf_suffix, threads):
|
||||
"""Command SHALL contain --SeqPath, --SequenceType, and --threads."""
|
||||
runner = PixiRunner(pixi_project_dir=mock_pixi_toml, env_name="digger")
|
||||
cmd = runner.build_digger_command(Path("/test"), sequence_type, scaf_suffix, threads)
|
||||
assert "--SeqPath" in cmd and "--SequenceType" in cmd and "--threads" in cmd
|
||||
|
||||
@given(sequence_type=sequence_types, scaf_suffix=file_suffixes, threads=thread_counts)
|
||||
@settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture])
|
||||
def test_command_contains_scaf_suffix_for_nucl(self, mock_pixi_toml, sequence_type, scaf_suffix, threads):
|
||||
"""Command SHALL contain --Scaf_suffix when sequence_type is 'nucl'."""
|
||||
runner = PixiRunner(pixi_project_dir=mock_pixi_toml, env_name="digger")
|
||||
cmd = runner.build_digger_command(Path("/test"), sequence_type, scaf_suffix, threads)
|
||||
if sequence_type == "nucl":
|
||||
assert "--Scaf_suffix" in cmd
|
||||
idx = cmd.index("--Scaf_suffix")
|
||||
assert cmd[idx + 1] == scaf_suffix
|
||||
|
||||
|
||||
class TestResultDictionaryCompleteness:
|
||||
"""
|
||||
**Feature: pixi-conda-migration, Property 2: Result dictionary completeness**
|
||||
**Validates: Requirements 2.3**
|
||||
"""
|
||||
|
||||
@given(exit_code=st.integers(-128, 255), stdout=st.text(max_size=50), stderr=st.text(max_size=50))
|
||||
@settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture])
|
||||
def test_result_contains_required_keys(self, mock_pixi_toml, exit_code, stdout, stderr):
|
||||
"""Result SHALL contain 'success', 'exit_code', 'logs', and 'status'."""
|
||||
runner = PixiRunner(pixi_project_dir=mock_pixi_toml, env_name="digger")
|
||||
mock_result = MagicMock(returncode=exit_code, stdout=stdout, stderr=stderr)
|
||||
with patch("pixi_runner.subprocess.run", return_value=mock_result):
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
inp, out, log = Path(tmpdir)/"input", Path(tmpdir)/"output", Path(tmpdir)/"logs"
|
||||
inp.mkdir()
|
||||
(inp / "test.fna").write_text(">s\nA\n")
|
||||
result = runner.run_bttoxin_digger(inp, out, log)
|
||||
assert all(k in result for k in ["success", "exit_code", "logs", "status"])
|
||||
assert isinstance(result["success"], bool) and isinstance(result["exit_code"], int)
|
||||
|
||||
|
||||
class TestFailureHandling:
|
||||
"""
|
||||
**Feature: pixi-conda-migration, Property 3: Failure status on non-zero exit**
|
||||
**Validates: Requirements 2.4**
|
||||
"""
|
||||
|
||||
@given(exit_code=st.integers(1, 255))
|
||||
@settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture])
|
||||
def test_nonzero_exit_returns_failure(self, mock_pixi_toml, exit_code):
|
||||
"""Non-zero exit SHALL return success=False and status='failed'."""
|
||||
runner = PixiRunner(pixi_project_dir=mock_pixi_toml, env_name="digger")
|
||||
mock_result = MagicMock(returncode=exit_code, stdout="", stderr="Error")
|
||||
with patch("pixi_runner.subprocess.run", return_value=mock_result):
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
inp, out, log = Path(tmpdir)/"input", Path(tmpdir)/"output", Path(tmpdir)/"logs"
|
||||
inp.mkdir()
|
||||
(inp / "test.fna").write_text(">s\nA\n")
|
||||
result = runner.run_bttoxin_digger(inp, out, log)
|
||||
assert result["success"] is False and result["status"] == "failed"
|
||||
|
||||
|
||||
class TestErrorMessageGuidance:
|
||||
"""
|
||||
**Feature: pixi-conda-migration, Property 7: Error message contains actionable guidance**
|
||||
**Validates: Requirements 5.1, 5.2, 5.3, 5.4**
|
||||
"""
|
||||
|
||||
def test_pixi_not_installed_error_contains_guidance(self, mock_pixi_toml):
|
||||
"""Error message SHALL contain actionable instructions."""
|
||||
runner = PixiRunner(pixi_project_dir=mock_pixi_toml, env_name="digger")
|
||||
with patch("pixi_runner.subprocess.run", side_effect=FileNotFoundError("pixi")):
|
||||
result = runner.check_environment()
|
||||
assert result["error"] and any(k in result["error"].lower() for k in ["install", "pixi", "run"])
|
||||
|
||||
@given(error_type=st.sampled_from(["pixi_missing", "env_missing"]))
|
||||
@settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture])
|
||||
def test_all_errors_contain_guidance(self, mock_pixi_toml, error_type):
|
||||
"""All error types SHALL contain actionable guidance."""
|
||||
runner = PixiRunner(pixi_project_dir=mock_pixi_toml, env_name="nonexistent")
|
||||
def mock_run(cmd, **kw):
|
||||
m = MagicMock(returncode=0, stdout='{"environments_info":[]}', stderr="")
|
||||
if error_type == "pixi_missing":
|
||||
raise FileNotFoundError()
|
||||
return m
|
||||
with patch("pixi_runner.subprocess.run", side_effect=mock_run):
|
||||
result = runner.check_environment()
|
||||
assert result["error"] and any(k in result["error"].lower() for k in ["install", "pixi", "run"])
|
||||
|
||||
|
||||
class TestShotterCommandConstruction:
    """
    **Feature: pixi-conda-migration, Property 4: Shotter command uses pipeline environment**
    **Validates: Requirements 3.2**
    """

    @given(
        min_identity=st.floats(min_value=0.0, max_value=1.0, allow_nan=False),
        min_coverage=st.floats(min_value=0.0, max_value=1.0, allow_nan=False),
        allow_unknown=st.booleans(),
        require_index=st.booleans(),
    )
    @settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture])
    def test_shotter_command_starts_with_pixi_pipeline(
        self, mock_pixi_toml, min_identity, min_coverage, allow_unknown, require_index
    ):
        """Shotter command SHALL start with 'pixi run -e pipeline python'."""
        from pixi_runner import build_shotter_command

        cmd = build_shotter_command(
            pixi_project_dir=mock_pixi_toml,
            script_path=Path("/scripts/bttoxin_shoter.py"),
            toxicity_csv=Path("/data/toxicity.csv"),
            all_toxins=Path("/output/All_Toxins.txt"),
            output_dir=Path("/output"),
            min_identity=min_identity,
            min_coverage=min_coverage,
            allow_unknown_families=allow_unknown,
            require_index_hit=require_index,
        )
        assert cmd[:4] == ["pixi", "run", "-e", "pipeline"]
        assert cmd[4] == "python"

    @given(
        min_identity=st.floats(min_value=0.0, max_value=1.0, allow_nan=False),
        min_coverage=st.floats(min_value=0.0, max_value=1.0, allow_nan=False),
    )
    @settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture])
    def test_shotter_command_contains_script_path(
        self, mock_pixi_toml, min_identity, min_coverage
    ):
        """Shotter command SHALL include the bttoxin_shoter.py script path."""
        from pixi_runner import build_shotter_command

        script = Path("/scripts/bttoxin_shoter.py")
        cmd = build_shotter_command(
            pixi_project_dir=mock_pixi_toml,
            script_path=script,
            toxicity_csv=Path("/data/toxicity.csv"),
            all_toxins=Path("/output/All_Toxins.txt"),
            output_dir=Path("/output"),
            min_identity=min_identity,
            min_coverage=min_coverage,
        )
        assert str(script) in cmd


class TestPlotCommandConstruction:
    """
    **Feature: pixi-conda-migration, Property 5: Plot command uses pipeline environment**
    **Validates: Requirements 3.3**
    """

    @given(
        merge_unresolved=st.booleans(),
        report_mode=st.sampled_from(["paper", "summary"]),
        lang=st.sampled_from(["zh", "en"]),
    )
    @settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture])
    def test_plot_command_starts_with_pixi_pipeline(
        self, mock_pixi_toml, merge_unresolved, report_mode, lang
    ):
        """Plot command SHALL start with 'pixi run -e pipeline python'."""
        from pixi_runner import build_plot_command

        cmd = build_plot_command(
            pixi_project_dir=mock_pixi_toml,
            script_path=Path("/scripts/plot_shotter.py"),
            strain_scores=Path("/output/strain_target_scores.tsv"),
            toxin_support=Path("/output/toxin_support.tsv"),
            species_scores=Path("/output/strain_target_species_scores.tsv"),
            out_dir=Path("/output"),
            merge_unresolved=merge_unresolved,
            report_mode=report_mode,
            lang=lang,
        )
        assert cmd[:4] == ["pixi", "run", "-e", "pipeline"]
        assert cmd[4] == "python"

    @given(
        merge_unresolved=st.booleans(),
        report_mode=st.sampled_from(["paper", "summary"]),
        lang=st.sampled_from(["zh", "en"]),
        per_hit_strain=st.one_of(
            st.none(),
            st.text(min_size=1, max_size=20, alphabet=st.characters(whitelist_categories=('L', 'N'))),
        ),
    )
    @settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture])
    def test_plot_command_contains_script_path(
        self, mock_pixi_toml, merge_unresolved, report_mode, lang, per_hit_strain
    ):
        """Plot command SHALL include the plot_shotter.py script path."""
        from pixi_runner import build_plot_command

        script = Path("/scripts/plot_shotter.py")
        cmd = build_plot_command(
            pixi_project_dir=mock_pixi_toml,
            script_path=script,
            strain_scores=Path("/output/strain_target_scores.tsv"),
            toxin_support=Path("/output/toxin_support.tsv"),
            species_scores=Path("/output/strain_target_species_scores.tsv"),
            out_dir=Path("/output"),
            merge_unresolved=merge_unresolved,
            report_mode=report_mode,
            lang=lang,
            per_hit_strain=per_hit_strain,
        )
        assert str(script) in cmd


class TestBundleCreation:
    """
    **Feature: pixi-conda-migration, Property 6: Bundle creation correctness**
    **Validates: Requirements 3.5**
    """

    @given(
        digger_files=st.lists(
            st.text(min_size=1, max_size=20, alphabet=st.characters(whitelist_categories=('L', 'N'))),
            min_size=0, max_size=5,
        ),
        shotter_files=st.lists(
            st.text(min_size=1, max_size=20, alphabet=st.characters(whitelist_categories=('L', 'N'))),
            min_size=0, max_size=5,
        ),
    )
    @settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture], deadline=None)
    def test_bundle_contains_correct_arcnames(self, digger_files, shotter_files):
        """Bundle SHALL contain directories with correct arcnames ('digger' and 'shotter')."""
        from pixi_runner import create_pipeline_bundle, verify_bundle_contents

        with tempfile.TemporaryDirectory() as tmpdir:
            tmp = Path(tmpdir)
            digger_dir = tmp / "digger_output"
            shotter_dir = tmp / "shotter_output"
            bundle_path = tmp / "test_bundle.tar.gz"

            # Create directories with some files
            digger_dir.mkdir(parents=True, exist_ok=True)
            shotter_dir.mkdir(parents=True, exist_ok=True)

            for f in digger_files:
                if f:  # Skip empty strings
                    (digger_dir / f"{f}.txt").write_text("test")
            for f in shotter_files:
                if f:  # Skip empty strings
                    (shotter_dir / f"{f}.txt").write_text("test")

            # Create bundle
            success = create_pipeline_bundle(bundle_path, digger_dir, shotter_dir)
            assert success

            # Verify contents
            result = verify_bundle_contents(bundle_path)

            # Check arcnames
            if digger_files:
                assert result["has_digger"], "Bundle should contain 'digger' directory"
                assert any(m.startswith("digger/") or m == "digger" for m in result["members"])
            if shotter_files:
                assert result["has_shotter"], "Bundle should contain 'shotter' directory"
                assert any(m.startswith("shotter/") or m == "shotter" for m in result["members"])

    @given(
        has_digger=st.booleans(),
        has_shotter=st.booleans(),
    )
    @settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture])
    def test_bundle_handles_missing_directories(self, has_digger, has_shotter):
        """Bundle creation SHALL handle missing directories gracefully."""
        from pixi_runner import create_pipeline_bundle, verify_bundle_contents

        with tempfile.TemporaryDirectory() as tmpdir:
            tmp = Path(tmpdir)
            digger_dir = tmp / "digger_output"
            shotter_dir = tmp / "shotter_output"
            bundle_path = tmp / "test_bundle.tar.gz"

            # Conditionally create directories
            if has_digger:
                digger_dir.mkdir(parents=True, exist_ok=True)
                (digger_dir / "test.txt").write_text("digger content")
            if has_shotter:
                shotter_dir.mkdir(parents=True, exist_ok=True)
                (shotter_dir / "test.txt").write_text("shotter content")

            # Create bundle
            success = create_pipeline_bundle(bundle_path, digger_dir, shotter_dir)
            assert success

            # Verify contents match what was created
            result = verify_bundle_contents(bundle_path)
            assert result["has_digger"] == has_digger
            assert result["has_shotter"] == has_shotter


class TestCLIArgumentPassthrough:
    """
    **Feature: pixi-conda-migration, Property 8: CLI argument passthrough**
    **Validates: Requirements 4.3, 6.4**
    """

    @given(
        min_identity=st.floats(min_value=0.0, max_value=1.0, allow_nan=False),
        min_coverage=st.floats(min_value=0.0, max_value=1.0, allow_nan=False),
        allow_unknown=st.booleans(),
        require_index=st.booleans(),
        lang=st.sampled_from(["zh", "en"]),
        threads=st.integers(min_value=1, max_value=64),
    )
    @settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture])
    def test_shotter_args_passthrough(
        self, mock_pixi_toml, min_identity, min_coverage, allow_unknown, require_index, lang, threads
    ):
        """CLI arguments SHALL be correctly passed to shotter command without modification."""
        from pixi_runner import build_shotter_command

        cmd = build_shotter_command(
            pixi_project_dir=mock_pixi_toml,
            script_path=Path("/scripts/bttoxin_shoter.py"),
            toxicity_csv=Path("/data/toxicity.csv"),
            all_toxins=Path("/output/All_Toxins.txt"),
            output_dir=Path("/output"),
            min_identity=min_identity,
            min_coverage=min_coverage,
            allow_unknown_families=allow_unknown,
            require_index_hit=require_index,
        )

        # Verify min_identity is passed correctly
        if min_identity > 0:
            assert "--min_identity" in cmd
            idx = cmd.index("--min_identity")
            assert float(cmd[idx + 1]) == min_identity

        # Verify min_coverage is passed correctly
        if min_coverage > 0:
            assert "--min_coverage" in cmd
            idx = cmd.index("--min_coverage")
            assert float(cmd[idx + 1]) == min_coverage

        # Verify allow_unknown_families flag
        if not allow_unknown:
            assert "--disallow_unknown_families" in cmd
        else:
            assert "--disallow_unknown_families" not in cmd

        # Verify require_index_hit flag
        if require_index:
            assert "--require_index_hit" in cmd
        else:
            assert "--require_index_hit" not in cmd

    @given(
        merge_unresolved=st.booleans(),
        report_mode=st.sampled_from(["paper", "summary"]),
        lang=st.sampled_from(["zh", "en"]),
        per_hit_strain=st.one_of(
            st.none(),
            st.text(min_size=1, max_size=20, alphabet=st.characters(whitelist_categories=('L', 'N'))),
        ),
    )
    @settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture])
    def test_plot_args_passthrough(
        self, mock_pixi_toml, merge_unresolved, report_mode, lang, per_hit_strain
    ):
        """CLI arguments SHALL be correctly passed to plot command without modification."""
        from pixi_runner import build_plot_command

        cmd = build_plot_command(
            pixi_project_dir=mock_pixi_toml,
            script_path=Path("/scripts/plot_shotter.py"),
            strain_scores=Path("/output/strain_target_scores.tsv"),
            toxin_support=Path("/output/toxin_support.tsv"),
            species_scores=Path("/output/strain_target_species_scores.tsv"),
            out_dir=Path("/output"),
            merge_unresolved=merge_unresolved,
            report_mode=report_mode,
            lang=lang,
            per_hit_strain=per_hit_strain,
        )

        # Verify merge_unresolved flag
        if merge_unresolved:
            assert "--merge_unresolved" in cmd
        else:
            assert "--merge_unresolved" not in cmd

        # Verify report_mode is passed correctly
        assert "--report_mode" in cmd
        idx = cmd.index("--report_mode")
        assert cmd[idx + 1] == report_mode

        # Verify lang is passed correctly
        assert "--lang" in cmd
        idx = cmd.index("--lang")
        assert cmd[idx + 1] == lang

        # Verify per_hit_strain is passed correctly when provided
        if per_hit_strain:
            assert "--per_hit_strain" in cmd
            idx = cmd.index("--per_hit_strain")
            assert cmd[idx + 1] == per_hit_strain

    @given(
        sequence_type=sequence_types,
        scaf_suffix=file_suffixes,
        threads=thread_counts,
    )
    @settings(max_examples=100, suppress_health_check=[HealthCheck.function_scoped_fixture])
    def test_digger_args_passthrough(
        self, mock_pixi_toml, sequence_type, scaf_suffix, threads
    ):
        """CLI arguments SHALL be correctly passed to digger command without modification."""
        runner = PixiRunner(pixi_project_dir=mock_pixi_toml, env_name="digger")
        cmd = runner.build_digger_command(Path("/test"), sequence_type, scaf_suffix, threads)

        # Verify sequence_type is passed correctly
        assert "--SequenceType" in cmd
        idx = cmd.index("--SequenceType")
        assert cmd[idx + 1] == sequence_type

        # Verify threads is passed correctly
        assert "--threads" in cmd
        idx = cmd.index("--threads")
        assert int(cmd[idx + 1]) == threads

        # Verify scaf_suffix is passed correctly for nucl type
        if sequence_type == "nucl":
            assert "--Scaf_suffix" in cmd
            idx = cmd.index("--Scaf_suffix")
            assert cmd[idx + 1] == scaf_suffix
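
The shotter properties tested above (fixed `pixi run -e pipeline python` prefix, script path present, thresholds and flags passed through unchanged) pin down the shape of the command list fairly tightly. As a rough illustration only, a builder that would satisfy those assertions might look like the sketch below. It is not the actual `pixi_runner.build_shotter_command`: the function name `sketch_build_shotter_command` is hypothetical, the `pixi_project_dir` parameter is left out, the option names `--toxicity_csv`, `--all_toxins` and `--output_dir` are invented for the example, and omitting a threshold when it is zero is a guess based on the tests only checking positive values.

```python
from pathlib import Path
from typing import List


def sketch_build_shotter_command(
    script_path: Path,
    toxicity_csv: Path,
    all_toxins: Path,
    output_dir: Path,
    min_identity: float = 0.0,
    min_coverage: float = 0.0,
    allow_unknown_families: bool = True,
    require_index_hit: bool = False,
    env_name: str = "pipeline",
) -> List[str]:
    """Hypothetical stand-in for build_shotter_command (illustration only)."""
    # Prefix asserted by the tests: pixi run -e pipeline python <script>
    cmd = ["pixi", "run", "-e", env_name, "python", str(script_path)]

    # ASSUMPTION: these three option names are invented for this sketch.
    cmd += [
        "--toxicity_csv", str(toxicity_csv),
        "--all_toxins", str(all_toxins),
        "--output_dir", str(output_dir),
    ]

    # repr() of a Python float round-trips exactly through float(),
    # which is what the passthrough test compares against.
    if min_identity > 0:
        cmd += ["--min_identity", repr(min_identity)]
    if min_coverage > 0:
        cmd += ["--min_coverage", repr(min_coverage)]

    # Boolean options become bare flags, mirroring the test assertions.
    if not allow_unknown_families:
        cmd.append("--disallow_unknown_families")
    if require_index_hit:
        cmd.append("--require_index_hit")
    return cmd
```

Returning a list of strings rather than a single shell string keeps each argument separate when it is handed to `subprocess.run`, so paths containing spaces need no quoting.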