chore(validation-output): add gt3 fragment export
Track refreshed validation outputs and add a filtered fragment library export that retains only side-chain fragments with more than 3 heavy atoms.
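The "gt3" rule in the message above (keep only fragments with more than 3 heavy atoms) could be reproduced roughly as follows. This is a minimal sketch, not the script that produced `fragment_library_filter_gt3.csv`: the column names `fragment_id` and `heavy_atom_count` and the helper `filter_gt3` are assumptions, since the CSV header is not visible in this diff.

```python
import csv
import io

MIN_HEAVY_ATOMS = 3  # "gt3": keep fragments with strictly more than 3 heavy atoms


def filter_gt3(rows, count_field="heavy_atom_count"):
    """Yield rows whose heavy-atom count is strictly greater than 3.

    `count_field` is a hypothetical column name; adjust it to the real header.
    """
    for row in rows:
        if int(row[count_field]) > MIN_HEAVY_ATOMS:
            yield row


# Tiny in-memory stand-in for fragment_library.csv (made-up rows).
sample = io.StringIO(
    "fragment_id,heavy_atom_count\n"
    "F1,1\n"
    "F2,3\n"
    "F3,4\n"
    "F4,12\n"
)
kept_ids = [row["fragment_id"] for row in filter_gt3(csv.DictReader(sample))]
print(kept_ids)
```

Note that the comparison is strict, so a fragment with exactly 3 heavy atoms is dropped.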
@@ -8,6 +8,7 @@ This directory contains validation results for MacrolactoneDB 12-20 membered rin
validation_output/
├── README.md                  # This file
├── fragments.db               # SQLite database with all data
├── fragment_library.csv       # Unified fragment library export
├── summary.csv                # Summary of all processed molecules
├── summary_statistics.json    # Statistical summary
│
@@ -37,6 +38,7 @@ validation_output/
- **parent_molecules**: Original molecule information
- **ring_numberings**: Ring atom numbering details
- **side_chain_fragments**: Fragmentation results with isotope tags
- **fragment_library_entries**: Unified fragment library rows for downstream design
- **validation_results**: Manual validation records
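The tables listed above can be joined with ordinary SQL. The following is a minimal sketch against an in-memory stand-in, not the real `fragments.db`: the column names (`id`, `parent_id`, `ml_id`) and the foreign-key layout are assumptions made for illustration, since the schema is not part of this diff.

```python
import sqlite3

# In-memory stand-in for fragments.db; the columns below are assumed, not the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parent_molecules (
        id INTEGER PRIMARY KEY,
        ml_id TEXT,
        classification TEXT
    );
    CREATE TABLE side_chain_fragments (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent_molecules(id),
        cleavage_position INTEGER,
        dummy_isotope INTEGER
    );
    INSERT INTO parent_molecules VALUES (1, 'ML00000001', 'standard_macrolactone');
    INSERT INTO parent_molecules VALUES (2, 'ML00000002', 'not_macrolactone');
    INSERT INTO side_chain_fragments VALUES (1, 1, 3, 3);
    INSERT INTO side_chain_fragments VALUES (2, 1, 7, 7);
    """
)

# Count side-chain fragments per parent molecule, including parents with none.
rows = conn.execute(
    """
    SELECT p.ml_id, p.classification, COUNT(f.id) AS n_fragments
    FROM parent_molecules AS p
    LEFT JOIN side_chain_fragments AS f ON f.parent_id = p.id
    GROUP BY p.id
    ORDER BY p.ml_id
    """
).fetchall()
conn.close()
print(rows)
```

Against the real database, replace `":memory:"` with the path to `fragments.db` and adjust the column names after inspecting the schema (for example with `.schema` in the `sqlite3` shell).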

### Key Fields
@@ -44,6 +46,8 @@ validation_output/
- `classification`: standard_macrolactone | non_standard_macrocycle | not_macrolactone
- `dummy_isotope`: Cleavage position stored as isotope value for reconstruction
- `cleavage_position`: Position on ring where side chain was attached
- `has_dummy_atom`: Whether the fragment contains a dummy atom for splicing
- `dummy_atom_count`: Number of dummy atoms in the fragment
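To illustrate how the dummy-atom fields relate to each other, the sketch below recovers the isotope label from a fragment SMILES and derives `has_dummy_atom` and `dummy_atom_count` from it. It assumes fragments are stored as SMILES with the dummy atom written as `[N*]`, where `N` is the isotope value; the helper names and the example fragment are hypothetical, and the project's own code may well use a cheminformatics toolkit instead of a regular expression.

```python
import re

# Matches an isotope-labelled dummy atom such as [5*] in a SMILES string.
DUMMY_PATTERN = re.compile(r"\[(\d+)\*\]")


def cleavage_positions_from_smiles(fragment_smiles):
    """Return the isotope labels of all dummy atoms, in order of appearance."""
    return [int(label) for label in DUMMY_PATTERN.findall(fragment_smiles)]


def describe_fragment(fragment_smiles):
    """Derive the dummy-atom fields from a fragment SMILES (illustrative only)."""
    positions = cleavage_positions_from_smiles(fragment_smiles)
    return {
        "has_dummy_atom": bool(positions),
        "dummy_atom_count": len(positions),
        "dummy_isotopes": positions,
    }


# Made-up fragment that was attached at ring position 5.
example = describe_fragment("[5*]CC(C)O")
print(example)
```

The pattern deliberately ignores unlabelled dummy atoms (`*` or `[*]`) and atom-mapped forms such as `[5*:1]`, so it only counts dummy atoms that carry a plain isotope label.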

## Ring Numbering Convention
@@ -61,13 +65,21 @@ Fragments use isotope values to mark cleavage position:
### summary.csv

- `source_id`: Original molecule ID from MacrolactoneDB
- `ml_id`: MacrolactoneDB unique ID (e.g., ML00000001)
- `chembl_id`: Original CHEMBL ID (if available)
- `classification`: Classification result
- `ring_size`: Detected ring size (12-20)
- `num_sidechains`: Number of side chains detected
- `cleavage_positions`: JSON array of cleavage positions
- `processing_status`: pending | success | failed | skipped
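Because `cleavage_positions` is a JSON array stored inside a CSV cell, it needs a second decoding step after the row is read. The sketch below shows one way to load rows with the columns listed above; the two sample rows are made up, and parsing `ring_size` through `float` is an assumption based on the `"14.0"`-style keys in `summary_statistics.json`.

```python
import csv
import io
import json
from collections import Counter

# Made-up rows using the column names listed above.
sample = io.StringIO(
    "source_id,ml_id,chembl_id,classification,ring_size,num_sidechains,"
    "cleavage_positions,processing_status\n"
    '1,ML00000001,,standard_macrolactone,14,2,"[3, 7]",success\n'
    "2,ML00000002,,non_standard_macrocycle,16,0,[],skipped\n"
)

records = []
for row in csv.DictReader(sample):
    # The cell holds JSON text, so decode it into a Python list.
    row["cleavage_positions"] = json.loads(row["cleavage_positions"])
    row["num_sidechains"] = int(row["num_sidechains"])
    # Accept both "14" and "14.0" style values.
    row["ring_size"] = int(float(row["ring_size"]))
    records.append(row)

status_counts = Counter(record["processing_status"] for record in records)
print(records[0]["cleavage_positions"], dict(status_counts))
```

To run this against the real export, pass an open file handle for `summary.csv` to `csv.DictReader` instead of the in-memory sample, and handle rows where `ring_size` is empty.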

### fragment_library.csv

- `source_type`: validation_extract | supplemental (reserved)
- `has_dummy_atom`: Whether the fragment contains a dummy atom
- `dummy_atom_count`: Number of dummy atoms
- `splice_ready`: Whether the fragment is directly compatible with single-anchor splicing

## Querying the Database

```bash
6280
validation_output/fragment_library_filter_gt3.csv
Normal file
File diff suppressed because it is too large
Binary file not shown.
File diff suppressed because it is too large
@@ -1,24 +1,23 @@
 {
-  "total_molecules": 1097,
+  "total_molecules": 11036,
   "by_classification": {
-    "non_standard_macrocycle": 617,
-    "standard_macrolactone": 459,
-    "not_macrolactone": 21
+    "non_standard_macrocycle": 6336,
+    "standard_macrolactone": 4482,
+    "not_macrolactone": 218
   },
   "by_ring_size": {
-    "14.0": 301,
-    "16.0": 187,
-    "15.0": 161,
-    "12.0": 141,
-    "19.0": 85,
-    "18.0": 80,
-    "13.0": 67,
-    "20.0": 24,
-    "17.0": 19
+    "14.0": 3017,
+    "16.0": 1879,
+    "15.0": 1613,
+    "12.0": 1419,
+    "19.0": 855,
+    "18.0": 809,
+    "13.0": 679,
+    "20.0": 243,
+    "17.0": 196
   },
   "by_status": {
-    "skipped": 638,
-    "success": 367,
-    "failed": 92
+    "skipped": 6554,
+    "success": 4482
   }
 }
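A quick consistency check on the updated counts in the `summary_statistics.json` hunk (the ones with 11036 molecules in total) can be sketched as follows. The numbers are copied from the hunk; the variable names are illustrative, and the check itself is not part of the commit.

```python
import json

# Updated values copied from the summary_statistics.json hunk.
stats = json.loads(
    """
    {
      "total_molecules": 11036,
      "by_classification": {
        "non_standard_macrocycle": 6336,
        "standard_macrolactone": 4482,
        "not_macrolactone": 218
      },
      "by_ring_size": {
        "14.0": 3017, "16.0": 1879, "15.0": 1613, "12.0": 1419, "19.0": 855,
        "18.0": 809, "13.0": 679, "20.0": 243, "17.0": 196
      },
      "by_status": {"skipped": 6554, "success": 4482}
    }
    """
)

total = stats["total_molecules"]
class_total = sum(stats["by_classification"].values())
status_total = sum(stats["by_status"].values())
ring_total = sum(stats["by_ring_size"].values())

# Molecules that do not appear under any ring-size key.
without_ring_size = total - ring_total

print(class_total == total, status_total == total, ring_total, without_ring_size)
```

The classification and status breakdowns both sum to the total, while the ring-size breakdown covers 10710 molecules, leaving 326 without a ring-size entry. The `success` count also equals the `standard_macrolactone` count (4482) in the updated file.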