refactor(validation): use ml_id as primary ID, add chembl_id field
85
validation_output/README.md
Normal file
@@ -0,0 +1,85 @@
# MacrolactoneDB Validation Output

This directory contains validation results for MacrolactoneDB molecules with 12- to 20-membered rings.

## Directory Structure

```
validation_output/
├── README.md                    # This file
├── fragments.db                 # SQLite database with all data
├── summary.csv                  # Summary of all processed molecules
├── summary_statistics.json      # Statistical summary
│
├── ring_size_12/                # 12-membered rings
├── ring_size_13/                # 13-membered rings
...
└── ring_size_20/                # 20-membered rings
    ├── molecules.csv            # Molecules in this ring size
    ├── standard/                # Standard macrolactones
    │   ├── numbered/            # Numbered ring images
    │   │   └── {id}_numbered.png
    │   └── sidechains/          # Fragment images
    │       └── {id}/
    │           └── {id}_frag_{n}_pos{pos}.png
    ├── non_standard/            # Non-standard macrocycles
    │   └── original/
    │       └── {id}_original.png
    └── rejected/                # Not macrolactones
        └── original/
            └── {id}_original.png
```

## Database Schema

### Tables

- **parent_molecules**: Original molecule information
- **ring_numberings**: Ring atom numbering details
- **side_chain_fragments**: Fragmentation results with isotope tags
- **validation_results**: Manual validation records

### Key Fields

- `classification`: standard_macrolactone | non_standard_macrocycle | not_macrolactone
- `dummy_isotope`: Cleavage position stored as isotope value for reconstruction
- `cleavage_position`: Position on ring where side chain was attached

## Ring Numbering Convention

1. Position 1 = Lactone carbonyl carbon (C=O)
2. Position 2 = Ester oxygen (-O-)
3. Positions 3 to N = Remaining ring atoms, numbered sequentially around the ring (N = ring size)
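
The convention above can be sketched in plain Python. This is an illustration, not the pipeline's actual code: the function name `number_ring` and its inputs (ring atom indices in cyclic order, plus the indices of the carbonyl carbon and the ester oxygen) are assumptions made for the example.

```python
from typing import Dict, List


def number_ring(ring_atoms: List[int], carbonyl_c: int, ester_o: int) -> Dict[int, int]:
    """Map atom index -> 1-based ring position (hypothetical helper).

    ring_atoms must list the ring atoms in cyclic order (either direction).
    """
    start = ring_atoms.index(carbonyl_c)
    # Rotate so the carbonyl carbon comes first (position 1).
    rotated = ring_atoms[start:] + ring_atoms[:start]
    if rotated[1] != ester_o:
        # Walk the ring in the opposite direction, keeping the carbonyl carbon first.
        rotated = [rotated[0]] + rotated[:0:-1]
    if rotated[1] != ester_o:
        raise ValueError("ester oxygen is not a ring neighbor of the carbonyl carbon")
    return {atom: pos for pos, atom in enumerate(rotated, start=1)}


# Made-up 12-membered ring: atom 105 is the carbonyl carbon, atom 104 the ester oxygen.
positions = number_ring(list(range(100, 112)), carbonyl_c=105, ester_o=104)
```

Here `positions[105]` is 1, `positions[104]` is 2, and numbering continues in the same direction until the last ring atom, 106, receives position 12.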

## Isotope Tagging

Each fragment marks its cleavage position with the isotope value of its dummy atom:
- `[5*]CCO` = Fragment from position 5; the dummy atom has isotope=5
- This enables precise reconstruction during reassembly
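
As a minimal sketch of reading the tag back, the position can be recovered from a fragment SMILES with a regular expression. The helper name `cleavage_position_from_smiles` is hypothetical, and the pattern assumes the dummy atom is written exactly as `[<isotope>*]` with no charge or atom-map suffix; a cheminformatics toolkit would be the more robust choice.

```python
import re
from typing import Optional

# Matches a dummy atom with an explicit isotope, e.g. "[5*]".
_DUMMY_ISOTOPE = re.compile(r"\[(\d+)\*\]")


def cleavage_position_from_smiles(fragment_smiles: str) -> Optional[int]:
    """Return the isotope of the first tagged dummy atom, or None if there is none."""
    match = _DUMMY_ISOTOPE.search(fragment_smiles)
    return int(match.group(1)) if match else None


position = cleavage_position_from_smiles("[5*]CCO")  # 5
```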

## CSV Columns

### summary.csv

- `source_id`: Original molecule ID from MacrolactoneDB
- `classification`: Classification result
- `ring_size`: Detected ring size (12-20)
- `num_sidechains`: Number of side chains detected
- `cleavage_positions`: JSON array of cleavage positions
- `processing_status`: pending | success | failed | skipped
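
The following sketch shows one way to load these columns with the standard library, decoding `cleavage_positions` from its JSON string. The two sample rows and the helper name `load_summary` are invented for illustration; only the column names come from the list above.

```python
import csv
import io
import json
from collections import Counter

# Made-up rows using the documented column names; the values are illustrative only.
SAMPLE = """source_id,classification,ring_size,num_sidechains,cleavage_positions,processing_status
example_1,standard_macrolactone,14,2,"[3, 5]",success
example_2,non_standard_macrocycle,16,0,[],skipped
"""


def load_summary(handle):
    """Read summary rows and decode the JSON-encoded cleavage_positions column."""
    rows = []
    for row in csv.DictReader(handle):
        # Treat an empty cell as "no cleavage positions".
        row["cleavage_positions"] = json.loads(row["cleavage_positions"] or "[]")
        rows.append(row)
    return rows


rows = load_summary(io.StringIO(SAMPLE))
status_counts = Counter(row["processing_status"] for row in rows)
```

For the real file, pass a handle opened with `open("summary.csv", newline="")` instead of the in-memory sample.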

## Querying the Database

```bash
# List tables
sqlite3 fragments.db ".tables"

# Get the first five standard macrolactones
sqlite3 fragments.db "SELECT * FROM parent_molecules WHERE classification='standard_macrolactone' LIMIT 5;"

# Get fragments for a specific molecule
sqlite3 fragments.db "SELECT * FROM side_chain_fragments WHERE parent_id=1;"

# Count by ring size
sqlite3 fragments.db "SELECT ring_size, COUNT(*) FROM parent_molecules GROUP BY ring_size;"
```
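
The same kind of query can be run from Python with the built-in `sqlite3` module. The sketch below builds a small in-memory stand-in so it runs on its own; the three-column table definition is an assumption covering only the columns used in the queries above, and the real `parent_molecules` table likely has more.

```python
import sqlite3


def count_by_ring_size(conn: sqlite3.Connection, classification: str):
    """Return (ring_size, count) pairs for one classification, smallest ring first."""
    cur = conn.execute(
        "SELECT ring_size, COUNT(*) FROM parent_molecules "
        "WHERE classification = ? GROUP BY ring_size ORDER BY ring_size",
        (classification,),  # bound parameter instead of string formatting
    )
    return cur.fetchall()


# Stand-in database with made-up rows; replace with sqlite3.connect("fragments.db").
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parent_molecules "
    "(id INTEGER PRIMARY KEY, classification TEXT, ring_size INTEGER)"
)
conn.executemany(
    "INSERT INTO parent_molecules (classification, ring_size) VALUES (?, ?)",
    [
        ("standard_macrolactone", 14),
        ("standard_macrolactone", 14),
        ("standard_macrolactone", 16),
        ("not_macrolactone", 12),
    ],
)
result = count_by_ring_size(conn, "standard_macrolactone")  # [(14, 2), (16, 1)]
```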
BIN
validation_output/fragments.db
Normal file
Binary file not shown.
1098
validation_output/summary.csv
Normal file
File diff suppressed because it is too large
24
validation_output/summary_statistics.json
Normal file
@@ -0,0 +1,24 @@
{
  "total_molecules": 1097,
  "by_classification": {
    "non_standard_macrocycle": 617,
    "standard_macrolactone": 459,
    "not_macrolactone": 21
  },
  "by_ring_size": {
    "14.0": 301,
    "16.0": 187,
    "15.0": 161,
    "12.0": 141,
    "19.0": 85,
    "18.0": 80,
    "13.0": 67,
    "20.0": 24,
    "17.0": 19
  },
  "by_status": {
    "skipped": 638,
    "success": 367,
    "failed": 92
  }
}
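
A quick consistency check on these counts, as a sketch: the classification and status breakdowns each sum to the 1097 total, while the ring-size breakdown sums to 1065, so 32 molecules are absent from it (the commit does not say why). The ring-size keys are float-formatted strings such as `"14.0"`, which the snippet converts to integers; the variable names are illustrative.

```python
import json

# Counts copied from summary_statistics.json in this commit.
STATS = json.loads("""
{
  "total_molecules": 1097,
  "by_classification": {"non_standard_macrocycle": 617, "standard_macrolactone": 459, "not_macrolactone": 21},
  "by_ring_size": {"14.0": 301, "16.0": 187, "15.0": 161, "12.0": 141, "19.0": 85,
                   "18.0": 80, "13.0": 67, "20.0": 24, "17.0": 19},
  "by_status": {"skipped": 638, "success": 367, "failed": 92}
}
""")

total = STATS["total_molecules"]
class_sum = sum(STATS["by_classification"].values())  # 1097
status_sum = sum(STATS["by_status"].values())          # 1097
ring_sum = sum(STATS["by_ring_size"].values())         # 1065

# Convert keys like "14.0" to integer ring sizes.
by_ring_size = {int(float(key)): count for key, count in STATS["by_ring_size"].items()}

# Molecules counted in the total but absent from the ring-size breakdown.
missing_ring_size = total - ring_sum  # 32
```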