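feat: support binding an external bt_toxin database (updated 2025-11-04)

- docker_client.py: run_bttoxin_digger() gains a bttoxin_db_dir parameter for mounting an external database
- run_single_fna_pipeline.py: adds a --bttoxin_db_dir option and auto-detects external_dbs/bt_toxin
- README.md: adds bttoxin_db update instructions and Docker binding documentation
- external_dbs/bt_toxin: adds the 2025-11-04 version of the database files

Test verification: toxin naming version numbers changed for sample HAN055 (Cry2Aa9→22, Cry2Ab35→41, Cry1Ia40→42, Vip3Aa7→79)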
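
The option handling described above could look roughly like the sketch below. It is not the code from docker_client.py or run_single_fna_pipeline.py, whose diffs are not shown here. The helper names `resolve_db_dir` and `docker_volume_args`, the container path `/opt/bt_toxin_db`, and the read-only mount are all assumptions made for illustration; only the `--bttoxin_db_dir` option and the `external_dbs/bt_toxin` directory come from the commit message.

```python
import argparse
from pathlib import Path
from typing import List, Optional

# Directory name taken from the commit message.
DEFAULT_DB_DIR = Path("external_dbs") / "bt_toxin"
# Hypothetical mount point inside the container; the real path is not shown in this commit.
CONTAINER_DB_DIR = "/opt/bt_toxin_db"


def resolve_db_dir(cli_value: Optional[str], project_root: Path) -> Optional[Path]:
    """Pick the database directory: explicit option first, then auto-detection."""
    if cli_value:
        path = Path(cli_value).resolve()
        if not path.is_dir():
            raise FileNotFoundError(f"bttoxin_db_dir not found: {path}")
        return path
    candidate = (project_root / DEFAULT_DB_DIR).resolve()
    # None means no external database was found, so nothing is mounted.
    return candidate if candidate.is_dir() else None


def docker_volume_args(db_dir: Optional[Path]) -> List[str]:
    """Build `docker run` arguments for a read-only bind mount (assumed layout)."""
    if db_dir is None:
        return []
    return ["-v", f"{db_dir}:{CONTAINER_DB_DIR}:ro"]


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of the --bttoxin_db_dir option")
    parser.add_argument("--bttoxin_db_dir", default=None,
                        help="directory holding an external bt_toxin database")
    return parser.parse_args(argv)
```

A caller would combine the pieces as `docker_volume_args(resolve_db_dir(parse_args().bttoxin_db_dir, Path.cwd()))` and append the result to its `docker run` command line.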
2026-01-04 14:37:49 +08:00
parent 5883e13c56
commit 1c0e8f90a5
40 changed files with 166422 additions and 194 deletions

Binary file not shown.

@@ -0,0 +1,22 @@
{
  "version": "1.2",
  "dbname": "bt_toxin",
  "dbtype": "Protein",
  "db-version": 5,
  "description": "bt_toxin20251104.fas",
  "number-of-letters": 996368,
  "number-of-sequences": 1199,
  "last-updated": "2025-11-04T15:35:00",
  "number-of-volumes": 1,
  "bytes-total": 1149077,
  "bytes-to-cache": 1007264,
  "files": [
    "bt_toxin.pdb",
    "bt_toxin.phr",
    "bt_toxin.pin",
    "bt_toxin.pot",
    "bt_toxin.psq",
    "bt_toxin.ptf",
    "bt_toxin.pto"
  ]
}
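
Since the metadata above lists every file that makes up the database, a quick sanity check before mounting the directory is to confirm that each listed file is present. The sketch below is one way to do that. It assumes the metadata JSON is stored in the same directory as the files it lists; the function name `missing_db_files` is hypothetical and not part of this commit.

```python
import json
from pathlib import Path
from typing import List


def missing_db_files(metadata_path: Path) -> List[str]:
    """Return the names listed under "files" that are absent next to the metadata file."""
    with open(metadata_path, encoding="utf-8") as handle:
        metadata = json.load(handle)
    db_dir = metadata_path.parent
    return [name for name in metadata.get("files", [])
            if not (db_dir / name).is_file()]
```

An empty list means all seven listed files were found; any returned name points to an incomplete copy of the database directory.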

Binary file not shown.

File diff suppressed because it is too large.

@@ -0,0 +1,36 @@
def get_unique_headers(file_path):
    """Read the lines that start with '>' and return the set of the text after '>'."""
    headers = set()
    with open(file_path, 'r') as f:
        for line in f:
            line = line.strip()
            if line.startswith('>'):
                # Keep everything after '>' (including any spaces and other characters)
                header = line[1:]
                headers.add(header)
    return headers


# Input file paths
file1 = 'bt_toxin20251104.fas'
file2 = 'all_app_cry_cyt_gpp_mcf_mpf_mpp_mtx_pra_prb_spp_tpp_txp_vip_vpa_vpb_xpp_fasta_sequences.txt'
output_file = 'unique_headers.txt'

# Collect the header sets of both files
headers1 = get_unique_headers(file1)
headers2 = get_unique_headers(file2)

# Compute the headers unique to each file
unique_to_file1 = headers1 - headers2
unique_to_file2 = headers2 - headers1

# Write the output file
with open(output_file, 'w') as out_f:
    out_f.write(f"### Unique headers in {file1} ###\n")
    for header in sorted(unique_to_file1):
        out_f.write(f">{header}\n")
    out_f.write(f"\n### Unique headers in {file2} ###\n")
    for header in sorted(unique_to_file2):
        out_f.write(f">{header}\n")

print(f"Done. Results saved to {output_file}")