# MacrolactoneDB Validation Output

This directory contains validation results for the 12- to 20-membered rings in MacrolactoneDB.

## Directory Structure

```
validation_output/
├── README.md                 # This file
├── fragments.db              # SQLite database with all data
├── summary.csv               # Summary of all processed molecules
├── summary_statistics.json   # Statistical summary
│
├── ring_size_12/             # 12-membered rings
├── ring_size_13/             # 13-membered rings
├── ...
└── ring_size_20/             # 20-membered rings
    ├── molecules.csv         # Molecules in this ring size
    ├── standard/             # Standard macrolactones
    │   ├── numbered/         # Numbered ring images
    │   │   └── {id}_numbered.png
    │   └── sidechains/       # Fragment images
    │       └── {id}/
    │           └── {id}_frag_{n}_pos{pos}.png
    ├── non_standard/         # Non-standard macrocycles
    │   └── original/
    │       └── {id}_original.png
    └── rejected/             # Not macrolactones
        └── original/
            └── {id}_original.png
```

Each `ring_size_N/` directory follows the layout shown for `ring_size_20/`.

## Database Schema

### Tables

- **parent_molecules**: Original molecule information
- **ring_numberings**: Ring atom numbering details
- **side_chain_fragments**: Fragmentation results with isotope tags
- **validation_results**: Manual validation records

### Key Fields

- `classification`: One of `standard_macrolactone`, `non_standard_macrocycle`, or `not_macrolactone`
- `dummy_isotope`: Cleavage position, stored as the isotope value of the fragment's dummy atom so the fragment can be reattached during reconstruction
- `cleavage_position`: Ring position where the side chain was attached

## Ring Numbering Convention

1. Position 1 = Lactone carbonyl carbon (C=O)
2. Position 2 = Ester oxygen (-O-)
3. Positions 3-N = Remaining ring atoms, numbered sequentially around the ring continuing from position 2, where N is the ring size (12-20); position N is therefore bonded to the carbonyl carbon

## Isotope Tagging

Fragments record their cleavage position as the isotope value of the dummy atom (`*`):

- `[5*]CCO` = Fragment cleaved from ring position 5; its dummy atom has isotope 5
- Because the position travels with the fragment SMILES, each fragment can be reattached to the correct ring atom during reassembly

## CSV Columns

### summary.csv

- `source_id`: Original molecule ID from MacrolactoneDB
- `classification`: Classification result (same values as the database `classification` field)
- `ring_size`: Detected ring size (12-20)
- `num_sidechains`: Number of side chains detected
- `cleavage_positions`: JSON array of cleavage positions, using the ring numbering convention above
- `processing_status`: `pending` | `success` | `failed` | `skipped`

## Querying the Database

```bash
# List tables
sqlite3 fragments.db ".tables"

# Show the first 5 standard macrolactones
sqlite3 fragments.db "SELECT * FROM parent_molecules WHERE classification='standard_macrolactone' LIMIT 5;"

# Get all side-chain fragments of the parent molecule with ID 1
sqlite3 fragments.db "SELECT * FROM side_chain_fragments WHERE parent_id=1;"

# Count molecules per ring size
sqlite3 fragments.db "SELECT ring_size, COUNT(*) FROM parent_molecules GROUP BY ring_size;"
```
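The isotope tag described under Isotope Tagging can be read back from a fragment SMILES without a cheminformatics toolkit. The following is a minimal sketch, assuming each fragment SMILES contains exactly one labelled dummy atom written as `[N*]`; the function name `cleavage_position_from_smiles` is hypothetical and not part of this project:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this project): print the cleavage position
# encoded as the isotope label of the dummy atom, e.g. "[5*]CCO" -> 5.
# Assumes exactly one labelled dummy atom "[<digits>*]" per fragment SMILES;
# prints nothing if the SMILES has no labelled dummy atom.
cleavage_position_from_smiles() {
  # \[ matches a literal "[", \* a literal "*"; the digits are captured as \1
  printf '%s\n' "$1" | sed -n 's/.*\[\([0-9][0-9]*\)\*].*/\1/p'
}

cleavage_position_from_smiles '[5*]CCO'       # prints 5
cleavage_position_from_smiles 'O=C([12*])CC'  # prints 12
```

The printed value should agree with the fragment's `dummy_isotope` and `cleavage_position` fields, so a mismatch is a quick hint that a fragment was tagged incorrectly.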