Track refreshed validation outputs and add a filtered fragment library export that retains only side-chain fragments with more than 3 heavy atoms.
MacrolactoneDB Validation Output
This directory contains validation results for MacrolactoneDB macrolactones with 12- to 20-membered rings.
Directory Structure
validation_output/
├── README.md # This file
├── fragments.db # SQLite database with all data
├── fragment_library.csv # Unified fragment library export
├── summary.csv # Summary of all processed molecules
├── summary_statistics.json # Statistical summary
│
├── ring_size_12/ # 12-membered rings
├── ring_size_13/ # 13-membered rings
...
└── ring_size_20/ # 20-membered rings
    ├── molecules.csv # Molecules in this ring size
    ├── standard/ # Standard macrolactones
    │   ├── numbered/ # Numbered ring images
    │   │   └── {id}_numbered.png
    │   └── sidechains/ # Fragment images
    │       └── {id}/
    │           └── {id}_frag_{n}_pos{pos}.png
    ├── non_standard/ # Non-standard macrocycles
    │   └── original/
    │       └── {id}_original.png
    └── rejected/ # Not macrolactones
        └── original/
            └── {id}_original.png
Database Schema
Tables
- parent_molecules: Original molecule information
- ring_numberings: Ring atom numbering details
- side_chain_fragments: Fragmentation results with isotope tags
- fragment_library_entries: Unified fragment library rows for downstream design
- validation_results: Manual validation records
Key Fields
- classification: standard_macrolactone | non_standard_macrocycle | not_macrolactone
- dummy_isotope: Cleavage position stored as isotope value for reconstruction
- cleavage_position: Position on ring where side chain was attached
- has_dummy_atom: Whether the fragment contains a dummy atom for splicing
- dummy_atom_count: Number of dummy atoms in the fragment
Ring Numbering Convention
- Position 1 = Lactone carbonyl carbon (C=O)
- Position 2 = Ester oxygen (-O-)
- Positions 3-N = Sequential around ring
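The convention above can be sketched in a few lines of Python. This is an illustration only, not the code that produced these outputs: the function name `number_ring` and its inputs (a list of ring atom identifiers in cyclic order, plus the identifiers of the carbonyl carbon and the ester oxygen) are assumptions made for the example.

```python
def number_ring(ring_atoms, carbonyl_atom, ester_oxygen):
    """Map each ring atom to its position under the convention above.

    ring_atoms: atom identifiers in cyclic order (either direction).
    carbonyl_atom, ester_oxygen: identifiers of the lactone C=O carbon and
    the ring ester oxygen; they must be neighbours in the ring.
    (Hypothetical helper; names and inputs are assumptions.)
    """
    start = ring_atoms.index(carbonyl_atom)
    # Rotate so the carbonyl carbon comes first (position 1).
    rotated = ring_atoms[start:] + ring_atoms[:start]
    if rotated[1] != ester_oxygen:
        # Walk the ring in the opposite direction, keeping the carbonyl first.
        rotated = [rotated[0]] + rotated[:0:-1]
    if rotated[1] != ester_oxygen:
        raise ValueError("ester oxygen is not adjacent to the carbonyl carbon")
    # Positions are 1-based: carbonyl = 1, ester oxygen = 2, then 3..N.
    return {atom: pos for pos, atom in enumerate(rotated, start=1)}
```

Because the ester oxygen fixes the direction of travel, the same ring always receives the same numbering regardless of the order in which its atoms were originally listed.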
Isotope Tagging
Fragments use isotope values to mark cleavage position:
- [5*]CCO = Fragment from position 5; the dummy atom has isotope=5
- This enables precise reconstruction during reassembly
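As a minimal sketch, the cleavage position can be read back from a fragment SMILES with a regular expression. The helper name `cleavage_positions` is hypothetical, and the pattern assumes the dummy atom is written in the plain `[n*]` form shown above (no charge or atom-map suffix); a cheminformatics toolkit would be more robust for anything else.

```python
import re

# Matches an isotope-labelled dummy atom such as "[5*]" and captures the 5.
# Assumption: dummy atoms appear only in this plain form.
_DUMMY = re.compile(r"\[(\d+)\*\]")


def cleavage_positions(fragment_smiles):
    """Return the isotope labels of all dummy atoms, in order of appearance."""
    return [int(label) for label in _DUMMY.findall(fragment_smiles)]
```

For the example above, `cleavage_positions("[5*]CCO")` yields a single position, 5; a fragment with no dummy atom yields an empty list.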
CSV Columns
summary.csv
- ml_id: MacrolactoneDB unique ID (e.g., ML00000001)
- chembl_id: Original CHEMBL ID (if available)
- classification: Classification result
- ring_size: Detected ring size (12-20)
- num_sidechains: Number of side chains detected
- cleavage_positions: JSON array of cleavage positions
- processing_status: pending | success | failed | skipped
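Since cleavage_positions is stored as a JSON array inside a CSV cell, it needs a second decoding step after the CSV is parsed. The sketch below shows one way to do that with the standard library. The function name `load_successful` is hypothetical, and skipping rows whose status is not success is an assumption made so that numeric fields can be converted safely.

```python
import csv
import json


def load_successful(handle):
    """Parse summary.csv rows with processing_status == 'success'.

    handle: an open text file (or any iterable of CSV lines).
    Numeric columns are converted to int and cleavage_positions is decoded
    from its JSON array. (Hypothetical helper, not part of the pipeline.)
    """
    rows = []
    for row in csv.DictReader(handle):
        if row["processing_status"] != "success":
            continue  # assumption: other rows may have empty numeric fields
        row["ring_size"] = int(row["ring_size"])
        row["num_sidechains"] = int(row["num_sidechains"])
        row["cleavage_positions"] = json.loads(row["cleavage_positions"])
        rows.append(row)
    return rows
```

Typical use would be `with open("summary.csv", newline="") as fh: rows = load_successful(fh)`.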
fragment_library.csv
- source_type: validation_extract | supplemental (reserved)
- has_dummy_atom: Whether the fragment contains a dummy atom
- dummy_atom_count: Number of dummy atoms
- splice_ready: Whether the fragment is directly compatible with single-anchor splicing
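A quick sanity check on the export is that has_dummy_atom agrees with dummy_atom_count. The following is a rough sketch only: the helper names are hypothetical, and the way booleans are written in the CSV is not documented here, so `as_bool` accepts a few common spellings as an assumption.

```python
import csv

# Assumption: booleans may be written as 1/true/yes (any case).
_TRUE = {"1", "true", "yes"}


def as_bool(value):
    """Interpret a CSV cell as a boolean under the assumption above."""
    return str(value).strip().lower() in _TRUE


def inconsistent_rows(handle):
    """Return rows where has_dummy_atom disagrees with dummy_atom_count > 0."""
    bad = []
    for row in csv.DictReader(handle):
        has_dummy = as_bool(row["has_dummy_atom"])
        count = int(row["dummy_atom_count"])
        if has_dummy != (count > 0):
            bad.append(row)
    return bad


def splice_ready_rows(handle):
    """Return only the rows flagged as splice_ready."""
    return [row for row in csv.DictReader(handle) if as_bool(row["splice_ready"])]
```

An empty result from `inconsistent_rows` means the two dummy-atom columns tell the same story for every fragment.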
Querying the Database
# List tables
sqlite3 fragments.db ".tables"
# Get the first 5 standard macrolactones
sqlite3 fragments.db "SELECT * FROM parent_molecules WHERE classification='standard_macrolactone' LIMIT 5;"
# Get fragments for a specific molecule
sqlite3 fragments.db "SELECT * FROM side_chain_fragments WHERE parent_id=1;"
# Count by ring size
sqlite3 fragments.db "SELECT ring_size, COUNT(*) FROM parent_molecules GROUP BY ring_size;"
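The same queries can be run from Python with the standard-library sqlite3 module. This is a sketch, not part of the validation pipeline: the function names are hypothetical, and only the table and column names that appear in the commands above (parent_molecules, classification, ring_size) are relied on.

```python
import sqlite3


def count_by_ring_size(conn):
    """Return {ring_size: number of parent molecules}."""
    cur = conn.execute(
        "SELECT ring_size, COUNT(*) FROM parent_molecules "
        "GROUP BY ring_size ORDER BY ring_size"
    )
    return dict(cur.fetchall())


def standard_macrolactones(conn, limit=5):
    """Return up to `limit` parent rows classified as standard macrolactones."""
    cur = conn.execute(
        # Parameter placeholders avoid building SQL by string concatenation.
        "SELECT * FROM parent_molecules WHERE classification = ? LIMIT ?",
        ("standard_macrolactone", limit),
    )
    return cur.fetchall()
```

To use it against this directory, open the database with `conn = sqlite3.connect("fragments.db")` and pass `conn` to either function.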