feat: add base_prokka tool and CRISPR-Cas analysis source code
- Add base_prokka genome annotation tool with pixi config
- Add CRISPR-Cas analysis src (CRISPRCasFinder.pl, environment config)
- Add test data and documentation

Co-Authored-By: Claude <noreply@anthropic.com>
80
tools/crispr_cas_analysis/README.md
Normal file
@@ -0,0 +1,80 @@
# CRISPR-Cas Analysis Module

This module provides tools for detecting and analyzing CRISPR-Cas systems in bacterial genomes using CRISPRCasFinder and MacSyFinder.

## Installation & Setup

This directory is a standalone `pixi` project.

1. **Enter the directory**:
   ```bash
   cd tools/crispr_cas_analysis
   ```

2. **Install dependencies**:
   ```bash
   pixi install
   ```

3. **Install CASFinder Definitions**:
   This step downloads the required CASFinder model definitions.
   ```bash
   pixi run install-casfinder
   ```

## Usage

### Environment
To run commands, you can either prepend `pixi run` or enter the shell:

```bash
pixi shell
```

### Running Detection
Use the provided `CRISPRCasFinder.pl` script to analyze a genome assembly (FASTA format).

**Example Command (running from `tools/crispr_cas_analysis` directory)**:

```bash
# 1. Clean up previous results if they exist
rm -rf tests/test_output

# Create the output directory (if it does not exist)
mkdir -p ./tests/test_output

# Enter the output directory
cd ./tests/test_output

# Run the command from here, adjusting the relative paths accordingly
pixi run perl ../../src/CRISPRCasFinder.pl \
    -in ../20141126CLLT035_contig341.fna \
    -out . \
    -so ../../src/sel392v2.so \
    -cas -q -log

# 2. Alternatively, run detection from tools/crispr_cas_analysis using relative paths
#    (if you entered tests/test_output above, return with `cd ../..` first).
#    Replace the -gffAnnot and -proteome paths with your own annotation files.
pixi run perl src/CRISPRCasFinder.pl \
    -in ./tests/20141126CLLT035_contig341.fna \
    -q -cas -log -html -ccvRep \
    -cpuMacSyFinder 20 \
    -cluster 20000 \
    -getSummaryCasfinder \
    -gffAnnot /home/gzy/Bt_Project/1_sequencing_genome_annotation/20120412LHLT139/20120412LHLT139.gff \
    -proteome /home/gzy/Bt_Project/1_sequencing_genome_annotation/20120412LHLT139/20120412LHLT139.faa \
    -out ./tests/test_output \
    -so ./src/sel392v2.so
```

### Output Explanation
The output directory (`tests/test_output`) will contain several key files:
* `CRISPR-Cas_summary.tsv`: Summary of detected CRISPR arrays and Cas systems.
* `Cas_REPORT.tsv`: Detailed report of detected Cas proteins.
* `Crisprs_REPORT.tsv`: Detailed report of detected CRISPR arrays.
* `GFF/`: Annotations of the findings.
* `Visualization/`: HTML visualization of the results.
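
As a minimal sketch of how a report such as `CRISPR-Cas_summary.tsv` could be inspected programmatically, the Python snippet below loads a tab-separated file into dictionaries keyed by its header row. The function name `read_tsv` and the example path are illustrative assumptions; no column names are hard-coded, because they are taken from whatever header the file carries.

```python
import csv
from pathlib import Path


def read_tsv(path):
    """Return the rows of a tab-separated file as a list of dicts keyed by header."""
    with Path(path).open(newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever CRISPRCasFinder wrote its output.
    summary = Path("tests/test_output/CRISPR-Cas_summary.tsv")
    if summary.exists():
        rows = read_tsv(summary)
        columns = list(rows[0].keys()) if rows else []
        print(f"{len(rows)} row(s); columns: {columns}")
    else:
        print(f"{summary} not found; run the detection step first")
```

The same helper should work for `Cas_REPORT.tsv` and `Crisprs_REPORT.tsv`, provided they are plain tab-separated tables with a header line.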

## Directory Structure
* `src/`: Source code and scripts (CRISPRCasFinder.pl, etc.).
* `scripts/`: Wrapper scripts for the pipeline.
* `tests/`: Test data.
1772
tools/crispr_cas_analysis/pixi.lock
Normal file
File diff suppressed because it is too large
1
tools/crispr_cas_analysis/result.json
Normal file
@@ -0,0 +1 @@
{
85
tools/crispr_cas_analysis/src/02_crispr_cas_completeness.md
Normal file
@@ -0,0 +1,85 @@
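# Module 2: CRISPR/Cas Completeness Penalty Scoring (Template)

> Goal: assess the state of the CRISPR/Cas system in each genome and map it to:
> **absent (0) > incomplete (1) > complete (2)**, where **the more complete the system, the larger the score penalty** (negative, monotonic).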
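
The mapping above can be sketched in Python as follows. The classification rule used here (complete means at least one CRISPR array together with at least one Cas cluster; incomplete means only one of the two is present) and the `weight` parameter are assumptions made for illustration. The source only fixes the three states and the requirement that the penalty grows monotonically with completeness.

```python
ABSENT, INCOMPLETE, COMPLETE = 0, 1, 2


def classify_state(n_crispr_arrays, n_cas_clusters):
    """Map counts to 0/1/2. ASSUMED rule, not taken from crispr_cas_stats.py."""
    has_array = n_crispr_arrays > 0
    has_cas = n_cas_clusters > 0
    if has_array and has_cas:
        return COMPLETE
    if has_array or has_cas:
        return INCOMPLETE
    return ABSENT


def penalty(state, weight=1.0):
    """Negative, monotonic score: the more complete the system, the lower the score."""
    if state not in (ABSENT, INCOMPLETE, COMPLETE):
        raise ValueError(f"unexpected state: {state}")
    return -weight * state
```

A linear penalty is only one possible choice; any function that decreases strictly from state 0 to state 2 would satisfy the stated requirement.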

---

## 1) Conda Environment

- Environment name: `crisprcasfinder`
- Description: used to run the CRISPRCasFinder detection tool
- Create and activate:
  ```bash
  conda env create -f ccf.environment.yml -n crisprcasfinder
  conda activate crisprcasfinder
  conda install -c bioconda macsyfinder=2.1.2
  macsydata install -u CASFinder==3.1.0
  ```

---

## 2) Launch Commands

### 2.1 Single-Genome Run (example template: two steps, detection + completeness classification)
```bash
# Step 1) Run CRISPRCasFinder
# The flags on the `-q -cas -log -html -ccvRep` line are kept at their defaults.
perl ./script/CRISPRCasFinder.pl \
    -in /home/gzy/Bt_Project/1_sequencing_genome_annotation/20120412LHLT139/20120412LHLT139.fna \
    -q -cas -log -html -ccvRep \
    -cpuMacSyFinder 20 \
    -cluster 20000 \
    -getSummaryCasfinder \
    -so /home/gzy/Bt_Project/software/sel392v2.so \
    -gffAnnot /home/gzy/Bt_Project/1_sequencing_genome_annotation/20120412LHLT139/20120412LHLT139.gff \
    -proteome /home/gzy/Bt_Project/1_sequencing_genome_annotation/20120412LHLT139/20120412LHLT139.faa

# Step 2) Classify the detection result as 0/1/2 (absent/incomplete/complete)
python crispr_cas_stats.py \
    --input <OUTDIR>/CRISPR-Cas_summary.tsv \
    --output <OUTDIR>/CRISPR-Cas_statistics.tsv
```
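
When the detection step has to be repeated for many genomes, it can help to assemble the argument list in code rather than editing the shell command by hand. The sketch below only builds the list and prints it; nothing is executed. The helper name `build_ccf_command` and the file names in the example are hypothetical, while the flags are the ones shown in Step 1 above.

```python
import shlex


def build_ccf_command(script, fasta, gff, faa, so_path, cpus=1, cluster=20000):
    """Assemble the CRISPRCasFinder argument list for one genome (nothing is executed)."""
    return [
        "perl", str(script),
        "-in", str(fasta),
        "-q", "-cas", "-log", "-html", "-ccvRep",
        "-cpuMacSyFinder", str(cpus),
        "-cluster", str(cluster),
        "-getSummaryCasfinder",
        "-so", str(so_path),
        "-gffAnnot", str(gff),
        "-proteome", str(faa),
    ]


if __name__ == "__main__":
    # Hypothetical file names, used only to show the resulting command line.
    cmd = build_ccf_command(
        "./script/CRISPRCasFinder.pl", "genome.fna", "genome.gff", "genome.faa",
        "./sel392v2.so", cpus=20,
    )
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with unusual file names.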

---

## 3) Parameters (threshold meaning + default values)

| Parameter | Meaning | Default |
|---|---|---|
| `-in` | Genome FASTA (`.fna`) file | |
| `-cpuMacSyFinder` | Number of threads | 1 |
| `-cluster` | Distance threshold | 20000 |
| `-getSummaryCasfinder` | | |
| `-so` | | ./sel392v2.so |
| `-gffAnnot` | Genome GFF file | GFF file produced by Prokka annotation |
| `-proteome` | Genome FAA file | FAA file produced by Prokka annotation |

---

## 4) Output Files (structure and interpretation)

### 4.1 CRISPRCasFinder output directory structure
```
OUTDIR/
  LOGs/
  TSV/
    CRISPR-Cas_summary.tsv
    CRISPR-Cas_clusters.tsv
    Crisprs_REPORT.tsv
    Cas_REPORT.tsv
  Visualization/
    index.html
```

### 4.2 Structure of the crispr_cas_stats.py result file

| Field | Type | Description |
|---|---|---|
| `state` | int | 0: no CRISPR/Cas system present; 1: CRISPR/Cas system present but incomplete; 2: CRISPR/Cas system present and complete |
| `typess` | string | Inferred system type (I/II/III/V/…) |
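
A downstream consumer may want to check that every record carries a valid `state` before using it. The sketch below assumes the result file is tab-separated with a header row containing the two fields listed above; that layout and the function name `parse_statistics` are assumptions, since the source does not show the file itself.

```python
import csv
import io

VALID_STATES = {0, 1, 2}


def parse_statistics(text):
    """Parse statistics TSV text into (state, typess) tuples, validating state."""
    records = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        state = int(row["state"])
        if state not in VALID_STATES:
            raise ValueError(f"state must be 0, 1 or 2, got {state}")
        records.append((state, row["typess"]))
    return records


if __name__ == "__main__":
    # Made-up example content, not real pipeline output.
    example = "state\ttypess\n2\tI\n0\t\n"
    print(parse_statistics(example))
```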

---
5405
tools/crispr_cas_analysis/src/CRISPRCasFinder.pl
Normal file
File diff suppressed because it is too large
25
tools/crispr_cas_analysis/src/ccf.environment.yml
Normal file
@@ -0,0 +1,25 @@
name: crisprcasfinder
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - python >=3.10
  - wget
  - curl
  - git
  - java-jdk
  - parallel
  - perl-app-cpanminus
  - hmmer
  - emboss
  - blast
  - perl-bioperl-core
  - perl-xml-simple
  - perl-digest-md5
  - vmatch
  - muscle
  - prodigal
  - mamba
  - macsyfinder=2.0
prefix: crisprcasfinder
@@ -0,0 +1,15 @@
>20141126CLLT035_contig341
AGAAGGATTTTAAAACCGTAAGACACTTAGAGAGGGGAAACAACTATGTCACTTTTACAG
CAACATTTTGAAGAAAGAAGAGAATACATTTTCAATCGTCTTAAACAACCAGAATACATG
GAAAGAAGCATAGAAAAAGTTCGCCAAGCTCAAAAAGAGATCAAAAATACAGTGCGAACG
ATTAAAGATTTGTTACTCTTAGACAAAACCACTGATCCTTGCCTTTAATTTATTCACTAA
TATTACACTGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTACTGTTTTTATCATGATGTTTAAT
GCAAAAGAGAAAATCCTCATGGATTCTTATAAGAAAAAGCGTAGATCACAAACAGAACTT
CATTATGATGTTGCTGACAAAGAAGGGTTTGACAAAGCGTTTTATGAAGCGCGTATTGAT
TCATTACGAAATGACATTCGTGTAATATCTTTCAAAAAGCTATGTGAAAATGAACCCGCA
CCAGAAGACTTAGAACTATTCAAACAACGCTATGAAACAATTGTTTTACCAAAAATACAA
GAAATTGTTTCCCTAATTGAACCAAGTTTAATAGATATAGACGTATTTTTAAATCCAGTA
ATCCAATATGGTGTAGGAGAAATTACTTTAGATGAAATGATTCAAAAACTACACAAAAAC
CTTTCTCTATTTCACGAATTATCAAAGGT