chore(validation-output): add gt3 fragment export

Track refreshed validation outputs and add a filtered fragment
library export that retains only side-chain fragments with more
than 3 heavy atoms.
2026-03-19 21:20:56 +08:00
parent 46a438dd36
commit f6bf9e85a3
5 changed files with 17345 additions and 1115 deletions
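The commit message describes the new export as keeping only side-chain fragments with more than 3 heavy atoms ("gt3"), but the filtering code is not part of this diff. Below is a minimal stdlib-only sketch of such a filter. It assumes fragments are stored as SMILES with `*` dummy atoms at the cleavage point (for example `[3*]CCO`), and that neither dummy atoms nor explicit hydrogens count as heavy atoms. Whether the real export counts the dummy atom is not stated. `count_heavy_atoms` and `filter_gt3` are hypothetical names. The regex tokenizer only handles common SMILES; a production pipeline would more likely use a cheminformatics toolkit such as RDKit.

```python
import re

# Atom tokens in a SMILES string: bracket atoms first, then the two-letter
# organic-subset elements (Br, Cl), then single-letter aliphatic/aromatic
# organic-subset atoms and the bare dummy atom "*". Bonds, branches and
# ring-closure digits are not matched, so they are skipped.
_ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|\*")

# Element symbol inside a bracket atom, after an optional isotope number,
# e.g. "[13C]" -> "C", "[nH]" -> "n", "[5*]" -> "*".
_BRACKET_ELEMENT_RE = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2}|\*)")


def count_heavy_atoms(smiles: str) -> int:
    """Count non-hydrogen, non-dummy atoms in a (simple) SMILES string."""
    count = 0
    for token in _ATOM_RE.findall(smiles):
        if token == "*":
            continue  # bare dummy atom marking the cleavage point
        if token.startswith("["):
            match = _BRACKET_ELEMENT_RE.match(token)
            element = match.group(1) if match else ""
            if element in ("", "H", "*"):
                continue  # explicit hydrogen, isotope-tagged dummy, or unparsed
        count += 1
    return count


def filter_gt3(fragment_smiles: list[str], threshold: int = 3) -> list[str]:
    """Keep fragments with strictly more than `threshold` heavy atoms."""
    return [s for s in fragment_smiles if count_heavy_atoms(s) > threshold]
```

Under these assumptions, `filter_gt3(["[3*]CCO", "[7*]C(=O)OC", "[5*]c1ccccc1"])` drops the 3-heavy-atom `[3*]CCO` fragment and keeps the other two.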


````diff
@@ -8,6 +8,7 @@ This directory contains validation results for MacrolactoneDB 12-20 membered rin
 validation_output/
 ├── README.md # This file
 ├── fragments.db # SQLite database with all data
+├── fragment_library.csv # Unified fragment library export
 ├── summary.csv # Summary of all processed molecules
 ├── summary_statistics.json # Statistical summary
@@ -37,6 +38,7 @@ validation_output/
 - **parent_molecules**: Original molecule information
 - **ring_numberings**: Ring atom numbering details
 - **side_chain_fragments**: Fragmentation results with isotope tags
+- **fragment_library_entries**: Unified fragment library rows for downstream design
 - **validation_results**: Manual validation records
 ### Key Fields
@@ -44,6 +46,8 @@ validation_output/
 - `classification`: standard_macrolactone | non_standard_macrocycle | not_macrolactone
 - `dummy_isotope`: Cleavage position stored as isotope value for reconstruction
 - `cleavage_position`: Position on ring where side chain was attached
+- `has_dummy_atom`: Whether the fragment contains a dummy atom for splicing
+- `dummy_atom_count`: Number of dummy atoms in the fragment
 ## Ring Numbering Convention
@@ -61,13 +65,21 @@ Fragments use isotope values to mark cleavage position:
 ### summary.csv
-- `source_id`: Original molecule ID from MacrolactoneDB
+- `ml_id`: MacrolactoneDB unique ID (e.g., ML00000001)
+- `chembl_id`: Original CHEMBL ID (if available)
 - `classification`: Classification result
 - `ring_size`: Detected ring size (12-20)
 - `num_sidechains`: Number of side chains detected
 - `cleavage_positions`: JSON array of cleavage positions
 - `processing_status`: pending | success | failed | skipped
+### fragment_library.csv
+- `source_type`: validation_extract | supplemental (reserved)
+- `has_dummy_atom`: Whether the fragment contains a dummy atom
+- `dummy_atom_count`: Number of dummy atoms
+- `splice_ready`: Whether the fragment is directly compatible with single-anchor splicing
 ## Querying the Database
 ```bash
````
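The new `has_dummy_atom`, `dummy_atom_count` and `splice_ready` fields can all be derived from the dummy atoms in a fragment SMILES. The sketch below assumes dummy atoms are written either as a bare `*` or as isotope-labelled bracket atoms like `[5*]` (the README stores the cleavage position as an isotope value). It also assumes "single-anchor splicing" means exactly one dummy atom. `describe_dummies` and `DummyInfo` are hypothetical names, not the project's API.

```python
import re
from dataclasses import dataclass, field

# A dummy atom is either a bare "*" or a bracket atom such as "[5*]", where the
# optional number is the isotope label that encodes the ring cleavage position.
_DUMMY_RE = re.compile(r"\[(\d*)\*[^\]]*\]|\*")


@dataclass
class DummyInfo:
    has_dummy_atom: bool
    dummy_atom_count: int
    splice_ready: bool
    dummy_isotopes: list[int] = field(default_factory=list)


def describe_dummies(smiles: str) -> DummyInfo:
    """Derive dummy-atom fields for one fragment SMILES (hypothetical helper)."""
    isotopes = []
    count = 0
    for match in _DUMMY_RE.finditer(smiles):
        count += 1
        if match.group(1):  # "[5*]" -> "5"; bare "*" or "[*]" carry no label
            isotopes.append(int(match.group(1)))
    return DummyInfo(
        has_dummy_atom=count > 0,
        dummy_atom_count=count,
        # Assumption: "single-anchor splicing" means exactly one attachment point.
        splice_ready=count == 1,
        dummy_isotopes=isotopes,
    )
```

For example, `[5*]CCO` yields one dummy atom with isotope 5 and is splice-ready, while `[3*]CC[9*]` has two dummy atoms and is not.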

File diff suppressed because it is too large.

Binary file not shown.

File diff suppressed because it is too large.


```diff
@@ -1,24 +1,23 @@
 {
-"total_molecules": 1097,
+"total_molecules": 11036,
 "by_classification": {
-"non_standard_macrocycle": 617,
-"standard_macrolactone": 459,
-"not_macrolactone": 21
+"non_standard_macrocycle": 6336,
+"standard_macrolactone": 4482,
+"not_macrolactone": 218
 },
 "by_ring_size": {
-"14.0": 301,
-"16.0": 187,
-"15.0": 161,
-"12.0": 141,
-"19.0": 85,
-"18.0": 80,
-"13.0": 67,
-"20.0": 24,
-"17.0": 19
+"14.0": 3017,
+"16.0": 1879,
+"15.0": 1613,
+"12.0": 1419,
+"19.0": 855,
+"18.0": 809,
+"13.0": 679,
+"20.0": 243,
+"17.0": 196
 },
 "by_status": {
-"skipped": 638,
-"success": 367,
-"failed": 92
+"skipped": 6554,
+"success": 4482
 }
 }
```
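In both the old and the refreshed statistics, the status counts line up with the classification counts. `skipped` equals non_standard_macrocycle + not_macrolactone (617 + 21 = 638 before; 6336 + 218 = 6554 after), and `success` + `failed` equals standard_macrolactone (367 + 92 = 459 before; 4482 + 0 = 4482 after). This suggests only standard macrolactones are fragmented, although the diff does not say so. The sketch below rebuilds the counters from summary.csv-style rows (already parsed into dicts, e.g. with `csv.DictReader`) and checks that relation. `summarize` and `statuses_match_classes` are hypothetical helpers.

```python
from collections import Counter


def summarize(rows: list[dict]) -> dict:
    """Rebuild summary_statistics.json-style counters from summary.csv rows."""
    ring_sizes = Counter(
        # Format as floats ("14.0") to match the keys in summary_statistics.json.
        str(float(r["ring_size"])) for r in rows if r.get("ring_size")
    )
    return {
        "total_molecules": len(rows),
        "by_classification": dict(Counter(r["classification"] for r in rows)),
        "by_ring_size": dict(ring_sizes.most_common()),
        "by_status": dict(Counter(r["processing_status"] for r in rows)),
    }


def statuses_match_classes(stats: dict) -> bool:
    """Check the observed invariant: only standard macrolactones are processed."""
    cls, status = stats["by_classification"], stats["by_status"]
    skipped_ok = status.get("skipped", 0) == (
        cls.get("non_standard_macrocycle", 0) + cls.get("not_macrolactone", 0)
    )
    processed_ok = status.get("success", 0) + status.get("failed", 0) == cls.get(
        "standard_macrolactone", 0
    )
    return skipped_ok and processed_ok
```

Running `statuses_match_classes` on either version of summary_statistics.json returns `True`. A row left in the `pending` status would break the relation, which makes the check a quick way to spot an incomplete run.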