MacrolactoneDB Validation Output

This directory contains validation results for MacrolactoneDB molecules with 12- to 20-membered rings.

Directory Structure

validation_output/
├── README.md                    # This file
├── fragments.db                 # SQLite database with all data
├── summary.csv                  # Summary of all processed molecules
├── summary_statistics.json      # Statistical summary
│
├── ring_size_12/                # 12-membered rings
├── ring_size_13/                # 13-membered rings
...
└── ring_size_20/                # 20-membered rings
    ├── molecules.csv            # Molecules in this ring size
    ├── standard/                # Standard macrolactones
    │   ├── numbered/            # Numbered ring images
    │   │   └── {id}_numbered.png
    │   └── sidechains/          # Fragment images
    │       └── {id}/
    │           └── {id}_frag_{n}_pos{pos}.png
    ├── non_standard/            # Non-standard macrocycles
    │   └── original/
    │       └── {id}_original.png
    └── rejected/                # Not macrolactones
        └── original/
            └── {id}_original.png
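
The file name templates above can be turned into concrete paths with a few helpers. This is a minimal sketch, not part of the pipeline: the function names are hypothetical, and it assumes that every ring_size_N directory has the same layout as the ring_size_20 entry shown in the tree.

```python
from pathlib import Path


def ring_dir(root, ring_size):
    """Directory for one ring size, e.g. <root>/ring_size_14."""
    if not 12 <= ring_size <= 20:
        raise ValueError("ring size must be between 12 and 20")
    return Path(root) / f"ring_size_{ring_size}"


def numbered_image(root, ring_size, mol_id):
    """Numbered ring image of a standard macrolactone."""
    name = f"{mol_id}_numbered.png"
    return ring_dir(root, ring_size) / "standard" / "numbered" / name


def fragment_image(root, ring_size, mol_id, n, pos):
    """Image of fragment n, cleaved at ring position pos."""
    name = f"{mol_id}_frag_{n}_pos{pos}.png"
    return ring_dir(root, ring_size) / "standard" / "sidechains" / str(mol_id) / name


def original_image(root, ring_size, mol_id, category):
    """Original image of a non-standard or rejected molecule."""
    if category not in ("non_standard", "rejected"):
        raise ValueError("category must be 'non_standard' or 'rejected'")
    return ring_dir(root, ring_size) / category / "original" / f"{mol_id}_original.png"
```

For example, numbered_image("validation_output", 14, 42) points at validation_output/ring_size_14/standard/numbered/42_numbered.png, where 42 is a made-up molecule id.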

Database Schema

Tables

  • parent_molecules: Original molecule information
  • ring_numberings: Ring atom numbering details
  • side_chain_fragments: Fragmentation results with isotope tags
  • validation_results: Manual validation records

Key Fields

  • classification: standard_macrolactone | non_standard_macrocycle | not_macrolactone
  • dummy_isotope: Cleavage position, stored as the isotope value of the fragment's dummy atom so it can be used for reconstruction
  • cleavage_position: Position on ring where side chain was attached

Ring Numbering Convention

  1. Position 1 = Lactone carbonyl carbon (C=O)
  2. Position 2 = Ester oxygen (-O-)
  3. Positions 3 to N = Remaining ring atoms, numbered sequentially around the ring, continuing past the ester oxygen (N = ring size)
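
Because positions 1 and 2 are adjacent, they also fix the direction in which the rest of the ring is numbered. The sketch below illustrates the convention on plain atom indices. It is not the code that produced ring_numberings: the function name is hypothetical, and it assumes the ring atoms are already listed in the order they occur around the ring and that the carbonyl carbon and ester oxygen have been identified beforehand.

```python
def number_ring(ring_atoms, carbonyl_c, ester_o):
    """Map each atom index to its ring position (1-based).

    ring_atoms: atom indices in the order they occur around the ring.
    carbonyl_c: atom index of the lactone carbonyl carbon (position 1).
    ester_o:    atom index of the ester oxygen (position 2).
    """
    atoms = list(ring_atoms)
    start = atoms.index(carbonyl_c)
    # Rotate the ring so that the carbonyl carbon comes first.
    ordered = atoms[start:] + atoms[:start]
    if ordered[1] != ester_o:
        # The ester oxygen is the other ring neighbour: walk the ring backwards.
        ordered = [ordered[0]] + ordered[:0:-1]
    if ordered[1] != ester_o:
        raise ValueError("ester oxygen is not a ring neighbour of the carbonyl carbon")
    return {atom: pos for pos, atom in enumerate(ordered, start=1)}


# Made-up 12-membered ring with atom indices 100..111.
positions = number_ring(range(100, 112), carbonyl_c=103, ester_o=102)
print(positions[103], positions[102], positions[101])  # 1 2 3
```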

Isotope Tagging

Fragments use isotope values to mark cleavage position:

  • [5*]CCO = Fragment from position 5, dummy atom has isotope=5
  • The isotope value records where each fragment was attached, so it can be placed back at the same ring position during reassembly
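
The cleavage position can be read back from a fragment SMILES without a cheminformatics toolkit. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the regular expression only recognises the plain form shown above ([5*]); a dummy atom written with extra properties such as an atom map number would not match.

```python
import re

# Matches an isotope-labelled dummy atom such as "[5*]" and captures the isotope.
DUMMY_ISOTOPE = re.compile(r"\[(\d+)\*\]")


def cleavage_positions_from_smiles(fragment_smiles):
    """Return the isotope value of every labelled dummy atom in the SMILES."""
    return [int(value) for value in DUMMY_ISOTOPE.findall(fragment_smiles)]


print(cleavage_positions_from_smiles("[5*]CCO"))  # [5]
```

For real reassembly a toolkit that parses SMILES properly is the safer choice; the regular expression is only meant for quick checks against the cleavage_position column.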

CSV Columns

summary.csv

  • source_id: Original molecule ID from MacrolactoneDB
  • classification: Classification result
  • ring_size: Detected ring size (12-20)
  • num_sidechains: Number of side chains detected
  • cleavage_positions: JSON array of cleavage positions
  • processing_status: pending | success | failed | skipped
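
Since cleavage_positions is stored as a JSON array inside a CSV cell, it needs a second decoding step after the CSV is read. The sketch below shows one way to do that with the standard library. The function name is hypothetical, the sample rows are made up for illustration, and treating empty cells as missing values is an assumption about how unprocessed rows are written.

```python
import csv
import io
import json


def load_summary(lines):
    """Parse summary.csv rows, converting numeric and JSON columns."""
    rows = []
    for row in csv.DictReader(lines):
        # Assumption: rows that were not processed may leave these cells empty.
        row["ring_size"] = int(row["ring_size"]) if row["ring_size"] else None
        row["num_sidechains"] = int(row["num_sidechains"]) if row["num_sidechains"] else None
        raw = row["cleavage_positions"]
        row["cleavage_positions"] = json.loads(raw) if raw else []
        rows.append(row)
    return rows


# Made-up sample in the documented column layout (not real data).
SAMPLE = io.StringIO(
    "source_id,classification,ring_size,num_sidechains,cleavage_positions,processing_status\n"
    'EXAMPLE_1,standard_macrolactone,14,2,"[3, 7]",success\n'
    "EXAMPLE_2,not_macrolactone,,,,skipped\n"
)
sample_rows = load_summary(SAMPLE)
print(sample_rows[0]["cleavage_positions"])  # [3, 7]
```

To read the real file, pass an open file handle instead of the sample: open("summary.csv", newline="").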

Querying the Database

# List tables
sqlite3 fragments.db ".tables"

# Get the first five standard macrolactones
sqlite3 fragments.db "SELECT * FROM parent_molecules WHERE classification='standard_macrolactone' LIMIT 5;"

# Get fragments for a specific molecule
sqlite3 fragments.db "SELECT * FROM side_chain_fragments WHERE parent_id=1;"

# Count by ring size
sqlite3 fragments.db "SELECT ring_size, COUNT(*) FROM parent_molecules GROUP BY ring_size;"