bttoxin-pipeline/tools/crispr_cas_analysis/README.md
zly 8e0deb1691 feat: add base_prokka tool and CRISPR-Cas analysis source code
- Add base_prokka genome annotation tool with pixi config
- Add CRISPR-Cas analysis src (CRISPRCasFinder.pl, environment config)
- Add test data and documentation

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-28 20:31:00 +08:00

# CRISPR-Cas Analysis Module
This module provides tools for detecting and analyzing CRISPR-Cas systems in bacterial genomes using CRISPRCasFinder and MacSyFinder.
## Installation & Setup
This directory is a standalone `pixi` project.
1. **Enter the directory**:
```bash
cd tools/crispr_cas_analysis
```
2. **Install dependencies**:
```bash
pixi install
```
3. **Install CASFinder Definitions**:
This step downloads the required CASFinder model definitions.
```bash
pixi run install-casfinder
```
## Usage
### Environment
To run commands inside the project environment, either prefix each command with `pixi run` or enter the environment shell:
```bash
pixi shell
```
### Running Detection
Use the provided `CRISPRCasFinder.pl` script to analyze a genome assembly (FASTA format).
**Example Command (running from `tools/crispr_cas_analysis` directory)**:
```bash
# Option 1: run from inside the output directory
# Clean up previous results if they exist
rm -rf tests/test_output
# Create the output directory and enter it
mkdir -p ./tests/test_output
cd ./tests/test_output
# Run from here; all paths are relative to tests/test_output
pixi run perl ../../src/CRISPRCasFinder.pl \
-in ../20141126CLLT035_contig341.fna \
-out . \
-so ../../src/sel392v2.so \
-cas -q -log
# Return to tools/crispr_cas_analysis
cd ../..

# Option 2: run from tools/crispr_cas_analysis with the full option set
# Clean up previous results if they exist
rm -rf tests/test_output
# The -gffAnnot and -proteome paths below are machine-specific; replace them with your own files
pixi run perl src/CRISPRCasFinder.pl \
-in ./tests/20141126CLLT035_contig341.fna \
-q -cas -log -html -ccvRep \
-cpuMacSyFinder 20 \
-cluster 20000 \
-getSummaryCasfinder \
-gffAnnot /home/gzy/Bt_Project/1_sequencing_genome_annotation/20120412LHLT139/20120412LHLT139.gff \
-proteome /home/gzy/Bt_Project/1_sequencing_genome_annotation/20120412LHLT139/20120412LHLT139.faa \
-out ./tests/test_output \
-so ./src/sel392v2.so
```
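Because the script expects a FASTA assembly as input, a quick pre-flight check can catch a wrong path or an empty file before a long run starts. The sketch below is only an illustration: `check_fasta` is a hypothetical helper, not something shipped in `src/` or `scripts/`, and it only tests that the file exists, is non-empty, and begins with a `>` header line.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): sanity-check a FASTA
# file before passing it to CRISPRCasFinder.pl via -in.
check_fasta() {
    local fasta="$1"
    # The file must exist and be non-empty
    if [ ! -s "$fasta" ]; then
        echo "error: '$fasta' is missing or empty" >&2
        return 1
    fi
    # The first line of a FASTA file is a header starting with '>'
    if ! head -n 1 "$fasta" | grep -q '^>'; then
        echo "error: '$fasta' does not start with a FASTA header" >&2
        return 1
    fi
    # Report the number of sequences (header lines) found
    echo "$(grep -c '^>' "$fasta") sequence(s) in $fasta"
}
```

From `tools/crispr_cas_analysis`, it could be called as `check_fasta tests/20141126CLLT035_contig341.fna` before starting detection.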
### Output Explanation
The output directory (`tests/test_output`) will contain several key files and directories:
* `CRISPR-Cas_summary.tsv`: Summary of detected CRISPR arrays and Cas systems.
* `Cas_REPORT.tsv`: Detailed report of detected Cas proteins.
* `Crisprs_REPORT.tsv`: Detailed report of detected CRISPR arrays.
* `GFF/`: Annotations of the findings.
* `Visualization/`: HTML visualization of the results.
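To get a quick sense of how much was detected, the TSV reports listed above can be inspected from the shell. The following is a minimal sketch, assuming each report is tab-separated with a single header line; `count_tsv_rows` is a hypothetical helper and not part of this module.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): count data rows in a
# tab-separated report, assuming the first line is a column header.
count_tsv_rows() {
    local report="$1"
    if [ ! -f "$report" ]; then
        echo "error: '$report' not found" >&2
        return 1
    fi
    # Skip the header line, ignore blank lines, and count what remains
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$report"
}
```

For example, `count_tsv_rows tests/test_output/Crisprs_REPORT.tsv` would print the number of rows in the CRISPR array report. Adjust the path if your run places the reports in a subdirectory of the output directory.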
## Directory Structure
* `src/`: Source code and scripts (CRISPRCasFinder.pl, etc.).
* `scripts/`: Wrapper scripts for the pipeline.
* `tests/`: Test data.