
MacrolactoneDB Validation Output

This directory contains validation results for MacrolactoneDB molecules with 12- to 20-membered rings.

Directory Structure

validation_output/
├── README.md                    # This file
├── fragments.db                 # SQLite database with all data
├── fragment_library.csv         # Unified fragment library export
├── summary.csv                  # Summary of all processed molecules
├── summary_statistics.json      # Statistical summary
│
├── ring_size_12/                # 12-membered rings
├── ring_size_13/                # 13-membered rings
...
└── ring_size_20/                # 20-membered rings
    ├── molecules.csv            # Molecules in this ring size
    ├── standard/                # Standard macrolactones
    │   ├── numbered/            # Numbered ring images
    │   │   └── {id}_numbered.png
    │   └── sidechains/          # Fragment images
    │       └── {id}/
    │           └── {id}_frag_{n}_pos{pos}.png
    ├── non_standard/            # Non-standard macrocycles
    │   └── original/
    │       └── {id}_original.png
    └── rejected/                # Not macrolactones
        └── original/
            └── {id}_original.png
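
The layout above maps directly onto path templates. The following is a minimal sketch, assuming the `{id}` placeholder is the molecule's `ml_id` (this README does not confirm that); the helper names are hypothetical and not part of the repository.

```python
from pathlib import Path


def numbered_image(root: Path, ring_size: int, mol_id: str) -> Path:
    """Numbered ring image of a standard macrolactone."""
    return (root / f"ring_size_{ring_size}" / "standard" / "numbered"
            / f"{mol_id}_numbered.png")


def fragment_image(root: Path, ring_size: int, mol_id: str, n: int, pos: int) -> Path:
    """Image of fragment number n, cleaved at ring position pos."""
    return (root / f"ring_size_{ring_size}" / "standard" / "sidechains"
            / mol_id / f"{mol_id}_frag_{n}_pos{pos}.png")


def original_image(root: Path, ring_size: int, mol_id: str, category: str) -> Path:
    """Unmodified image for a 'non_standard' or 'rejected' molecule."""
    if category not in ("non_standard", "rejected"):
        raise ValueError(f"unexpected category: {category!r}")
    return (root / f"ring_size_{ring_size}" / category / "original"
            / f"{mol_id}_original.png")


root = Path("validation_output")
example = fragment_image(root, 14, "ML00000001", 1, 5)
```

Here `example` points at `validation_output/ring_size_14/standard/sidechains/ML00000001/ML00000001_frag_1_pos5.png`; whether that file exists depends on the checkout.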

Database Schema

Tables

  • parent_molecules: Original molecule information
  • ring_numberings: Ring atom numbering details
  • side_chain_fragments: Fragmentation results with isotope tags
  • fragment_library_entries: Unified fragment library rows for downstream design
  • validation_results: Manual validation records

Key Fields

  • classification: standard_macrolactone | non_standard_macrocycle | not_macrolactone
  • dummy_isotope: Cleavage position stored as isotope value for reconstruction
  • cleavage_position: Position on ring where side chain was attached
  • has_dummy_atom: Whether the fragment contains a dummy atom for splicing
  • dummy_atom_count: Number of dummy atoms in the fragment

Ring Numbering Convention

  1. Position 1 = Lactone carbonyl carbon (C=O)
  2. Position 2 = Ester oxygen (-O-)
  3. Positions 3 to N = Remaining ring atoms, numbered sequentially around the ring (N = ring size)
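
The convention can be sketched in a few lines. This is an illustration only, not the project's numbering code: it assumes the ring atoms are already available in cyclic order, and the function name `number_ring` is hypothetical. Because position 2 must be the ester oxygen, the walking direction follows from the first two positions.

```python
def number_ring(ring_atoms, carbonyl_c, ester_o):
    """Return {atom: position} for a ring given in cyclic order.

    ring_atoms may start at any atom and run in either direction.
    Position 1 is the lactone carbonyl carbon, position 2 the ester
    oxygen, and numbering continues in that direction around the ring.
    """
    n = len(ring_atoms)
    start = ring_atoms.index(carbonyl_c)
    # Pick the direction in which the ester oxygen directly follows
    # the carbonyl carbon.
    if ring_atoms[(start + 1) % n] == ester_o:
        step = 1
    elif ring_atoms[(start - 1) % n] == ester_o:
        step = -1
    else:
        raise ValueError("ester oxygen is not a ring neighbour of the carbonyl carbon")
    return {ring_atoms[(start + step * i) % n]: i + 1 for i in range(n)}


# Toy 5-atom cycle with made-up atom labels, just to show the rotation.
positions = number_ring(["a", "b", "c", "d", "e"], carbonyl_c="c", ester_o="b")
```

In the toy cycle, `c` becomes position 1, `b` position 2, and the walk continues through `a`, `e`, `d`.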

Isotope Tagging

Fragments use isotope values to mark cleavage position:

  • [5*]CCO = Fragment from position 5, dummy atom has isotope=5
  • This enables precise reconstruction during reassembly
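
As a rough illustration of reading the tag back, the sketch below pulls cleavage positions out of a fragment SMILES with a regular expression. It assumes dummy atoms are always written in the plain `[N*]` form shown above and will miss other spellings (atom maps, charges); a cheminformatics toolkit would be more robust. The function name is hypothetical.

```python
import re

# Matches an isotope-labelled dummy atom such as [5*] and captures the isotope.
DUMMY_ATOM = re.compile(r"\[(\d+)\*\]")


def cleavage_positions(fragment_smiles: str) -> list[int]:
    """Return the isotope value of every [N*] dummy atom, in order of appearance."""
    return [int(value) for value in DUMMY_ATOM.findall(fragment_smiles)]


single = cleavage_positions("[5*]CCO")      # one dummy atom, from position 5
double = cleavage_positions("[5*]CC[12*]")  # hypothetical two-anchor fragment
```

The length of the returned list corresponds to what the `dummy_atom_count` field describes.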

CSV Columns

summary.csv

  • ml_id: MacrolactoneDB unique ID (e.g., ML00000001)
  • chembl_id: Original ChEMBL ID (if available)
  • classification: Classification result
  • ring_size: Detected ring size (12-20)
  • num_sidechains: Number of side chains detected
  • cleavage_positions: JSON array of cleavage positions
  • processing_status: pending | success | failed | skipped
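
A minimal sketch of loading these columns with the standard library. The sample rows are made up for illustration and are not real data; the column order and the assumption that `cleavage_positions` is always valid JSON (or empty) are guesses, not something this README states.

```python
import csv
import io
import json

# Made-up rows for illustration only; the real file is summary.csv.
SAMPLE = '''ml_id,chembl_id,classification,ring_size,num_sidechains,cleavage_positions,processing_status
ML00000001,,standard_macrolactone,14,2,"[3, 9]",success
ML00000002,,not_macrolactone,12,0,[],skipped
'''


def load_summary(handle):
    """Parse summary rows, converting numeric and JSON columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["ring_size"] = int(row["ring_size"])
        row["num_sidechains"] = int(row["num_sidechains"])
        # cleavage_positions is stored as a JSON array inside one CSV field.
        row["cleavage_positions"] = json.loads(row["cleavage_positions"] or "[]")
        rows.append(row)
    return rows


rows = load_summary(io.StringIO(SAMPLE))
succeeded = [r for r in rows if r["processing_status"] == "success"]
```

For the real file, pass `open("summary.csv", newline="")` instead of the `io.StringIO` wrapper.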

fragment_library.csv

  • source_type: validation_extract | supplemental (reserved)
  • has_dummy_atom: Whether the fragment contains a dummy atom
  • dummy_atom_count: Number of dummy atoms
  • splice_ready: Whether the fragment is directly compatible with single-anchor splicing
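
Selecting splice-ready fragments could look like the sketch below. How booleans are serialized in the CSV is not stated here, so the parser accepts a few common spellings; that tolerance, the helper names, and the sample rows are all assumptions.

```python
import csv
import io

# Spellings treated as true; the file's actual encoding is an assumption.
TRUTHY = {"1", "true", "t", "yes"}


def as_bool(value: str) -> bool:
    """Interpret a CSV cell as a boolean, case-insensitively."""
    return value.strip().lower() in TRUTHY


def splice_ready_rows(handle):
    """Return only the rows whose splice_ready flag is set."""
    return [row for row in csv.DictReader(handle) if as_bool(row["splice_ready"])]


# Made-up rows for illustration only; the real file is fragment_library.csv.
SAMPLE = """source_type,has_dummy_atom,dummy_atom_count,splice_ready
validation_extract,True,1,True
validation_extract,True,2,False
"""

ready = splice_ready_rows(io.StringIO(SAMPLE))
```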

Querying the Database

# List tables
sqlite3 fragments.db ".tables"

# Get the first five standard macrolactones
sqlite3 fragments.db "SELECT * FROM parent_molecules WHERE classification='standard_macrolactone' LIMIT 5;"

# Get fragments for a specific molecule
sqlite3 fragments.db "SELECT * FROM side_chain_fragments WHERE parent_id=1;"

# Count by ring size
sqlite3 fragments.db "SELECT ring_size, COUNT(*) FROM parent_molecules GROUP BY ring_size;"
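
The same queries can be run from Python with the standard `sqlite3` module, which also allows parameter binding instead of editing the SQL string. The sketch below runs against a tiny in-memory stand-in so it is self-contained; the stand-in schema contains only the columns mentioned in this README plus assumed `id` keys, and the real tables almost certainly have more columns.

```python
import sqlite3


def count_by_ring_size(conn):
    """Return {ring_size: number of parent molecules}."""
    query = ("SELECT ring_size, COUNT(*) FROM parent_molecules "
             "GROUP BY ring_size ORDER BY ring_size")
    return dict(conn.execute(query))


def fragments_for(conn, parent_id):
    """Return all fragment rows for one parent molecule (bound parameter)."""
    query = "SELECT * FROM side_chain_fragments WHERE parent_id = ?"
    return conn.execute(query, (parent_id,)).fetchall()


# In-memory stand-in with made-up rows; use sqlite3.connect("fragments.db")
# to query the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent_molecules (
    id INTEGER PRIMARY KEY, classification TEXT, ring_size INTEGER);
CREATE TABLE side_chain_fragments (
    id INTEGER PRIMARY KEY, parent_id INTEGER,
    cleavage_position INTEGER, dummy_isotope INTEGER);
INSERT INTO parent_molecules VALUES
    (1, 'standard_macrolactone', 14),
    (2, 'standard_macrolactone', 14),
    (3, 'not_macrolactone', 12);
INSERT INTO side_chain_fragments VALUES (1, 1, 5, 5);
""")

counts = count_by_ring_size(conn)
frags = fragments_for(conn, 1)
```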