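chore: initial commit - simplify architecture + switch to polling

- Remove Motia Streams real-time communication in favor of 3-second polling
- Simplify frontend code and remove redundant components
- Simplify backend architecture in preparation for the FastAPI refactor
- Update the pixi.toml environment configuration
- Keep bttoxin_digger_v5_repro as reference documentation

Co-Authored-By: Claude <noreply@anthropic.com>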
41
tests/test_data/test2/README.md
Normal file
@@ -0,0 +1,41 @@
# NCBI Datasets

https://www.ncbi.nlm.nih.gov/datasets

This zip archive contains an NCBI Datasets Data Package.

NCBI Datasets Data Packages can include sequence, annotation and other data files, and metadata in one or more data report files.
Data report files are in JSON Lines format.

---
## FAQs
### Where is the data I requested?

Your data is in the subdirectory `ncbi_dataset/data/` contained within this zip archive.

### I still can't find my data, can you help?

We have identified a bug affecting Mac Safari users. When downloading data from the NCBI Datasets web interface, you may see only this README file after the download has completed (while other files appear to be missing).
As a workaround to prevent this issue from recurring, we recommend disabling automatic zip archive extraction in Safari until Apple releases a bug fix.
For more information, visit:
https://www.ncbi.nlm.nih.gov/datasets/docs/reference-docs/mac-zip-bug/

### How do I work with JSON Lines data reports?

Visit our JSON Lines data report documentation page:
https://www.ncbi.nlm.nih.gov/datasets/docs/v2/tutorials/working-with-jsonl-data-reports/

### What is NCBI Datasets?

NCBI Datasets is a resource that lets you easily gather data from across NCBI databases. Find and download gene, transcript, protein and genome sequences, annotation and metadata.

### Where can I find NCBI Datasets documentation?

Visit the NCBI Datasets documentation pages:
https://www.ncbi.nlm.nih.gov/datasets/docs/

---

National Center for Biotechnology Information
National Library of Medicine
info@ncbi.nlm.nih.gov
5
tests/test_data/test2/md5sum.txt
Normal file
@@ -0,0 +1,5 @@
d74a9d8ceff56ab43a183a2370101b30 ncbi_dataset/data/data_summary.tsv
eb89ce048e32b4d5c75c9b2fdba4cf18 ncbi_dataset/data/assembly_data_report.jsonl
fe1a9664ce4cf475c63efd1a4a115994 ncbi_dataset/data/GCA_000338755.1/GCA_000338755.1_ASM33875v1_genomic.fna
51aba9aa78d569f95f02cbb161fb1983 ncbi_dataset/data/GCF_000338755.1/GCF_000338755.1_ASM33875v1_genomic.fna
a94dafb49fe9a39682681fbeca459c47 ncbi_dataset/data/dataset_catalog.json
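Reviewer note: a test that consumes this fixture could check the unpacked files against `md5sum.txt` before using them. The following is a minimal sketch, not code from this commit; the helper names `parse_md5sum`, `md5_of_file`, and `verify_package` are hypothetical, and it assumes the checksum paths are relative to the directory that holds `md5sum.txt`.

```python
import hashlib
from pathlib import Path


def parse_md5sum(text):
    """Parse md5sum-style lines into {relative_path: hex_digest}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # First token is the digest; the rest of the line is the path.
        digest, path = line.split(maxsplit=1)
        # md5sum marks binary mode with a leading '*' on the path.
        entries[path.lstrip("*")] = digest.lower()
    return entries


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file without loading it whole."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_package(root):
    """Return the relative paths that are missing or whose checksum differs."""
    root = Path(root)
    expected = parse_md5sum((root / "md5sum.txt").read_text())
    mismatched = []
    for rel, digest in expected.items():
        f = root / rel
        if not f.is_file() or md5_of_file(f) != digest:
            mismatched.append(rel)
    return mismatched
```

Calling `verify_package("tests/test_data/test2")` would return an empty list when every listed file is present and unmodified.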
File diff suppressed because it is too large
@@ -0,0 +1,2 @@
{"assemblyInfo":{"assemblyLevel":"Complete Genome","assemblyName":"ASM33875v1","assemblyType":"haploid","submitter":"Institute of Plant Protection, Chinese Academy of Agricultural Sciences","bioprojectLineage":[{"bioprojects":[{"accession":"PRJNA185468","title":"Bacillus thuringiensis serovar kurstaki str. HD73 Genome sequencing"}]}],"sequencingTech":"Sanger dideoxy sequencing; 454; Illumina","biosample":{"accession":"SAMN02603412","lastUpdated":"2017-12-19T20:54:18.153","publicationDate":"2014-01-30T14:21:27.853","submissionDate":"2014-01-30T14:21:27.853","sampleIds":[{"label":"Sample name","value":"CP004069"},{"db":"SRA","value":"SRS2771020"}],"description":{"title":"Sample from Bacillus thuringiensis serovar kurstaki str. HD73","organism":{"taxId":1279365,"organismName":"Bacillus thuringiensis serovar kurstaki str. HD73"}},"owner":{"name":"NCBI"},"models":["Generic"],"bioprojects":[{"accession":"PRJNA185468"}],"package":"Generic.1.0","attributes":[{"name":"strain","value":"HD73"},{"name":"serovar","value":"kurstaki"}],"status":{"status":"live","when":"2014-01-30T14:21:27.853"},"serovar":"kurstaki","strain":"HD73"},"comments":"The bacteria is available from Fuping Song (fpsong@ippcaas.cn)","assemblyStatus":"current","pairedAssembly":{"accession":"GCF_000338755.1","status":"current","annotationName":"GCF_000338755.1-RS_2025_04_14"},"bioprojectAccession":"PRJNA185468","assemblyMethod":"newbler v. 2.3","releaseDate":"2013-02-08"},"assemblyStats":{"totalNumberOfChromosomes":8,"totalSequenceLength":"5908575","totalUngappedLength":"5908575","numberOfContigs":8,"contigN50":5646799,"contigL50":1,"numberOfScaffolds":8,"scaffoldN50":5646799,"scaffoldL50":1,"numberOfComponentSequences":8,"gcCount":"2077976","gcPercent":35,"atgcCount":"5908575"},"annotationInfo":{"name":"Annotation submitted by Institute of Plant Protection, Chinese Academy of Agricultural Sciences","provider":"Institute of Plant Protection, Chinese Academy of Agricultural Sciences","releaseDate":"2017-08-22","stats":{"geneCounts":{"total":6334,"proteinCoding":6194,"nonCoding":140}}},"currentAccession":"GCA_000338755.1","checkmInfo":{"checkmMarkerSet":"Bacillus thuringiensis","checkmMarkerSetRank":"species","checkmVersion":"v1.2.3","completeness":96.36,"contamination":0.92,"completenessPercentile":12.3817,"checkmSpeciesTaxId":1428},"averageNucleotideIdentity":{"taxonomyCheckStatus":"OK","matchStatus":"synonym_match","submittedOrganism":"Bacillus thuringiensis serovar kurstaki str. HD73","submittedSpecies":"Bacillus thuringiensis","category":"category_na","submittedAniMatch":{"assembly":"GCA_002243685.1","organismName":"Bacillus thuringiensis","category":"suspected_type","ani":96.35,"assemblyCoverage":83.17,"typeAssemblyCoverage":76.23},"bestAniMatch":{"assembly":"GCA_046524075.1","organismName":"Bacillus cereus","category":"type","ani":97.19,"assemblyCoverage":79.94,"typeAssemblyCoverage":86.65},"comment":"na"},"accession":"GCA_000338755.1","pairedAccession":"GCF_000338755.1","sourceDatabase":"SOURCE_DATABASE_GENBANK","organism":{"taxId":1279365,"organismName":"Bacillus thuringiensis serovar kurstaki str. HD73","infraspecificNames":{"strain":"HD73"}}}
{"assemblyInfo":{"assemblyLevel":"Complete Genome","assemblyName":"ASM33875v1","assemblyType":"haploid","submitter":"Institute of Plant Protection, Chinese Academy of Agricultural Sciences","bioprojectLineage":[{"bioprojects":[{"accession":"PRJNA185468","title":"Bacillus thuringiensis serovar kurstaki str. HD73 Genome sequencing"}]}],"sequencingTech":"Sanger dideoxy sequencing; 454; Illumina","biosample":{"accession":"SAMN02603412","lastUpdated":"2017-12-19T20:54:18.153","publicationDate":"2014-01-30T14:21:27.853","submissionDate":"2014-01-30T14:21:27.853","sampleIds":[{"label":"Sample name","value":"CP004069"},{"db":"SRA","value":"SRS2771020"}],"description":{"title":"Sample from Bacillus thuringiensis serovar kurstaki str. HD73","organism":{"taxId":1279365,"organismName":"Bacillus thuringiensis serovar kurstaki str. HD73"}},"owner":{"name":"NCBI"},"models":["Generic"],"bioprojects":[{"accession":"PRJNA185468"}],"package":"Generic.1.0","attributes":[{"name":"strain","value":"HD73"},{"name":"serovar","value":"kurstaki"}],"status":{"status":"live","when":"2014-01-30T14:21:27.853"},"serovar":"kurstaki","strain":"HD73"},"comments":"The annotation was added by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP). Information about PGAP can be found here: https://www.ncbi.nlm.nih.gov/genome/annotation_prok/\nThe bacteria is available from Fuping Song (fpsong@ippcaas.cn)","assemblyStatus":"current","pairedAssembly":{"accession":"GCA_000338755.1","status":"current","annotationName":"Annotation submitted by Institute of Plant Protection, Chinese Academy of Agricultural Sciences"},"bioprojectAccession":"PRJNA185468","assemblyMethod":"newbler v. 2.3","releaseDate":"2013-02-08"},"assemblyStats":{"totalNumberOfChromosomes":8,"totalSequenceLength":"5908575","totalUngappedLength":"5908575","numberOfContigs":8,"contigN50":5646799,"contigL50":1,"numberOfScaffolds":8,"scaffoldN50":5646799,"scaffoldL50":1,"numberOfComponentSequences":8,"gcCount":"2077976","gcPercent":35,"atgcCount":"5908575"},"annotationInfo":{"name":"GCF_000338755.1-RS_2025_04_14","provider":"NCBI RefSeq","releaseDate":"2025-04-14","stats":{"geneCounts":{"total":6168,"proteinCoding":5679,"nonCoding":145,"pseudogene":344}},"method":"Best-placed reference protein set; GeneMarkS-2+","pipeline":"NCBI Prokaryotic Genome Annotation Pipeline (PGAP)","softwareVersion":"6.10"},"currentAccession":"GCF_000338755.1","checkmInfo":{"checkmMarkerSet":"Bacillus thuringiensis","checkmMarkerSetRank":"species","checkmVersion":"v1.2.3","completeness":96.36,"contamination":0.92,"completenessPercentile":12.3817,"checkmSpeciesTaxId":1428},"averageNucleotideIdentity":{"taxonomyCheckStatus":"OK","matchStatus":"synonym_match","submittedOrganism":"Bacillus thuringiensis serovar kurstaki str. HD73","submittedSpecies":"Bacillus thuringiensis","category":"category_na","submittedAniMatch":{"assembly":"GCA_002243685.1","organismName":"Bacillus thuringiensis","category":"suspected_type","ani":96.35,"assemblyCoverage":83.17,"typeAssemblyCoverage":76.23},"bestAniMatch":{"assembly":"GCA_046524075.1","organismName":"Bacillus cereus","category":"type","ani":97.19,"assemblyCoverage":79.94,"typeAssemblyCoverage":86.65},"comment":"na"},"accession":"GCF_000338755.1","pairedAccession":"GCA_000338755.1","sourceDatabase":"SOURCE_DATABASE_REFSEQ","organism":{"taxId":1279365,"organismName":"Bacillus thuringiensis serovar kurstaki str. HD73","infraspecificNames":{"strain":"HD73"}}}
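Reviewer note: the two records above are JSON Lines, one assembly report per line (GenBank and RefSeq views of the same assembly). A minimal sketch of how a test might read them follows; it is an illustration rather than code from this commit. The function names are hypothetical, and the field names are taken from the records shown here.

```python
import json


def read_assembly_reports(lines):
    """Yield one dict per non-empty JSON Lines record."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(report):
    """Pick out a few fields that distinguish the paired assemblies."""
    return {
        "accession": report["accession"],
        "source": report["sourceDatabase"],
        "genes": report["annotationInfo"]["stats"]["geneCounts"]["total"],
        "contig_n50": report["assemblyStats"]["contigN50"],
    }
```

A typical use would be `with open(path) as fh: rows = [summarize(r) for r in read_assembly_reports(fh)]`, where `path` points at `assembly_data_report.jsonl`.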
3
tests/test_data/test2/ncbi_dataset/data/data_summary.tsv
Normal file
@@ -0,0 +1,3 @@
Organism Scientific Name	Organism Common Name	Organism Qualifier	Taxonomy id	Assembly Name	Assembly Accession	Source	Annotation	Level	Contig N50	Size	Submission Date	Gene Count	BioProject	BioSample
Bacillus thuringiensis serovar kurstaki str. HD73		strain: HD73	1279365	ASM33875v1	GCA_000338755.1	GenBank	Annotation submitted by Institute of Plant Protection, Chinese Academy of Agricultural Sciences	Complete Genome	5646799	5908575	2013-02-08	6334	PRJNA185468	SAMN02603412
Bacillus thuringiensis serovar kurstaki str. HD73		strain: HD73	1279365	ASM33875v1	GCF_000338755.1	RefSeq	GCF_000338755.1-RS_2025_04_14	Complete Genome	5646799	5908575	2013-02-08	6168	PRJNA185468	SAMN02603412
35
tests/test_data/test2/ncbi_dataset/data/dataset_catalog.json
Normal file
@@ -0,0 +1,35 @@
{
  "apiVersion": "V2",
  "assemblies": [
    {
      "files": [
        {
          "filePath": "data_summary.tsv",
          "fileType": "DATA_TABLE",
          "uncompressedLengthBytes": "686"
        },
        {
          "filePath": "assembly_data_report.jsonl",
          "fileType": "DATA_REPORT",
          "uncompressedLengthBytes": "6616"
        }
      ]
    },{
      "accession": "GCA_000338755.1",
      "files": [
        {
          "filePath": "GCA_000338755.1/GCA_000338755.1_ASM33875v1_genomic.fna",
          "fileType": "GENOMIC_NUCLEOTIDE_FASTA",
          "uncompressedLengthBytes": "5983182"
        }
      ]
    },{
      "accession": "GCF_000338755.1",
      "files": [
        {
          "filePath": "GCF_000338755.1/GCF_000338755.1_ASM33875v1_genomic.fna",
          "fileType": "GENOMIC_NUCLEOTIDE_FASTA",
          "uncompressedLengthBytes": "5983192"
        }
      ]
    }]}
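Reviewer note: the catalog lists every file in the package. The first `assemblies` entry has no `accession` and holds the package-level reports, and sizes are stored as strings. A minimal sketch of flattening it into rows is below; `catalog_files` is a hypothetical helper, not part of this commit, and it assumes the key layout shown above.

```python
import json


def catalog_files(catalog):
    """Return (accession_or_None, file_path, file_type, size_bytes) tuples."""
    rows = []
    for assembly in catalog.get("assemblies", []):
        # Package-level entries (summary, data report) carry no accession.
        accession = assembly.get("accession")
        for entry in assembly.get("files", []):
            rows.append((
                accession,
                entry["filePath"],
                entry["fileType"],
                # Sizes are serialized as strings in the catalog.
                int(entry["uncompressedLengthBytes"]),
            ))
    return rows


def load_catalog(path):
    """Read dataset_catalog.json from disk and flatten it."""
    with open(path, encoding="utf-8") as fh:
        return catalog_files(json.load(fh))
```

Filtering the result on `file_type == "GENOMIC_NUCLEOTIDE_FASTA"` gives the FASTA paths relative to `ncbi_dataset/data/`.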
BIN
tests/test_data/test2/ncbi_dataset2.zip
Normal file
Binary file not shown.