Keep key validation outputs and analysis tables tracked directly, package analysis plot PNGs into a small tar.gz backup, and add analysis scripts plus tests so the stored results remain reproducible without flooding git with large image trees.
MacrolactoneDB Validation Output
This directory contains validation results for MacrolactoneDB molecules with 12- to 20-membered rings.
Directory Structure
validation_output/
├── README.md # This file
├── fragments.db # SQLite database with all data
├── fragment_library.csv # Unified fragment library export
├── summary.csv # Summary of all processed molecules
├── summary_statistics.json # Statistical summary
│
├── ring_size_12/ # 12-membered rings
├── ring_size_13/ # 13-membered rings
...
└── ring_size_20/ # 20-membered rings
    ├── molecules.csv # Molecules in this ring size
    ├── standard/ # Standard macrolactones
    │   ├── numbered/ # Numbered ring images
    │   │   └── {id}_numbered.png
    │   └── sidechains/ # Fragment images
    │       └── {id}/
    │           └── {id}_frag_{n}_pos{pos}.png
    ├── non_standard/ # Non-standard macrocycles
    │   └── original/
    │       └── {id}_original.png
    └── rejected/ # Not macrolactones
        └── original/
            └── {id}_original.png
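The layout above maps directly onto path templates. The following is a minimal sketch of building image paths from it; the helper names are hypothetical, and it assumes `{id}` is whatever identifier the pipeline used when writing the files (the layout does not say whether that is `ml_id` or a database row ID).

```python
from pathlib import Path


def numbered_image_path(root, ring_size, mol_id):
    """Path of the numbered-ring PNG for a standard macrolactone.

    Hypothetical helper: it only mirrors the directory tree shown above.
    """
    return (
        Path(root)
        / f"ring_size_{ring_size}"
        / "standard"
        / "numbered"
        / f"{mol_id}_numbered.png"
    )


def fragment_image_path(root, ring_size, mol_id, n, pos):
    """Path of the PNG for fragment number n cleaved at ring position pos."""
    return (
        Path(root)
        / f"ring_size_{ring_size}"
        / "standard"
        / "sidechains"
        / str(mol_id)
        / f"{mol_id}_frag_{n}_pos{pos}.png"
    )


# The ID below is illustrative only.
numbered = numbered_image_path("validation_output", 14, "ML00000001")
fragment = fragment_image_path("validation_output", 14, "ML00000001", 1, 5)
```

Checking `numbered.exists()` before use is advisable, since the image trees may not be present in every checkout.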
Database Schema
Tables
- parent_molecules: Original molecule information
- ring_numberings: Ring atom numbering details
- side_chain_fragments: Fragmentation results with isotope tags
- fragment_library_entries: Unified fragment library rows for downstream design
- validation_results: Manual validation records
Key Fields
- classification: standard_macrolactone | non_standard_macrocycle | not_macrolactone
- dummy_isotope: Cleavage position stored as isotope value for reconstruction
- cleavage_position: Position on ring where side chain was attached
- has_dummy_atom: Whether the fragment contains a dummy atom for splicing
- dummy_atom_count: Number of dummy atoms in the fragment
Ring Numbering Convention
- Position 1 = Lactone carbonyl carbon (C=O)
- Position 2 = Ester oxygen (-O-)
- Positions 3-N = Sequential around ring
Isotope Tagging
Fragments use isotope values to mark cleavage position:
- [5*]CCO = Fragment from position 5; the dummy atom has isotope=5
- This enables precise reconstruction during reassembly
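As an illustration of the tagging scheme, the cleavage position can be read back from a fragment SMILES by looking for isotope-labelled dummy atoms. This is a minimal sketch using only a regular expression; the function name is hypothetical, and a cheminformatics toolkit would be the more robust choice for SMILES that go beyond the simple `[5*]` form shown above.

```python
import re

# Matches a bracketed dummy atom carrying an isotope label, e.g. "[5*]" or "[12*]".
# Anything between the "*" and the closing bracket is tolerated (an assumption;
# the README only shows the plain "[5*]" form).
_DUMMY_ATOM = re.compile(r"\[(\d+)\*[^\]]*\]")


def cleavage_positions(fragment_smiles):
    """Return the isotope labels of all dummy atoms, in order of appearance."""
    return [int(label) for label in _DUMMY_ATOM.findall(fragment_smiles)]


positions = cleavage_positions("[5*]CCO")
```

A fragment with no dummy atom yields an empty list, which corresponds to has_dummy_atom being false.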
CSV Columns
summary.csv
- ml_id: MacrolactoneDB unique ID (e.g., ML00000001)
- chembl_id: Original CHEMBL ID (if available)
- classification: Classification result
- ring_size: Detected ring size (12-20)
- num_sidechains: Number of side chains detected
- cleavage_positions: JSON array of cleavage positions
- processing_status: pending | success | failed | skipped
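Because cleavage_positions is stored as a JSON array inside a CSV cell, it needs one extra decoding step after the CSV is parsed. The sketch below shows one way to do that with the standard library; `load_summary` is a hypothetical helper, and the sample row is made up purely to keep the example self-contained.

```python
import csv
import io
import json


def load_summary(handle):
    """Read summary.csv rows, decoding the JSON and integer columns."""
    rows = []
    for row in csv.DictReader(handle):
        # An empty cell is treated as "no cleavage positions" (an assumption).
        row["cleavage_positions"] = json.loads(row.get("cleavage_positions") or "[]")
        row["ring_size"] = int(row["ring_size"]) if row.get("ring_size") else None
        rows.append(row)
    return rows


# Made-up sample data; real use would pass open("summary.csv", newline="").
sample = io.StringIO(
    "ml_id,classification,ring_size,num_sidechains,cleavage_positions,processing_status\n"
    'ML00000001,standard_macrolactone,14,2,"[3, 7]",success\n'
)
rows = load_summary(sample)
```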
fragment_library.csv
- source_type: validation_extract | supplemental (reserved)
- has_dummy_atom: Whether the fragment contains a dummy atom
- dummy_atom_count: Number of dummy atoms
- splice_ready: Whether the fragment is directly compatible with single-anchor splicing
Querying the Database
# List tables
sqlite3 fragments.db ".tables"
# Get standard macrolactones with fragments
sqlite3 fragments.db "SELECT * FROM parent_molecules WHERE classification='standard_macrolactone' LIMIT 5;"
# Get fragments for a specific molecule
sqlite3 fragments.db "SELECT * FROM side_chain_fragments WHERE parent_id=1;"
# Count by ring size
sqlite3 fragments.db "SELECT ring_size, COUNT(*) FROM parent_molecules GROUP BY ring_size;"
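The same ring-size count can be run from Python with the standard `sqlite3` module. This is a minimal sketch: `count_by_ring_size` is a hypothetical helper, and the in-memory table is a stand-in containing only the columns used by the queries above, so it does not reflect the full schema of fragments.db.

```python
import sqlite3


def count_by_ring_size(conn):
    """Return {ring_size: molecule_count} from the parent_molecules table."""
    cursor = conn.execute(
        "SELECT ring_size, COUNT(*) FROM parent_molecules "
        "GROUP BY ring_size ORDER BY ring_size"
    )
    return dict(cursor.fetchall())


# Stand-in database with made-up rows; real use: sqlite3.connect("fragments.db").
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parent_molecules ("
    "id INTEGER PRIMARY KEY, classification TEXT, ring_size INTEGER)"
)
conn.executemany(
    "INSERT INTO parent_molecules (classification, ring_size) VALUES (?, ?)",
    [
        ("standard_macrolactone", 12),
        ("standard_macrolactone", 14),
        ("non_standard_macrocycle", 14),
    ],
)
counts = count_by_ring_size(conn)
conn.close()
```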