lingyuzeng 3e07402f4e feat(numbering): publish canonical numbering API
Add a public numbering module and route fragmenting, validation,
and scaffold preparation through the canonical numbering entry.

Rewrite the repository entry docs around the fixed numbering
contract, add MkDocs landing pages, and document the mirror
mapping used for medicinal-chemistry comparisons.

Also refresh the validation analysis reports to explain the
canonical-versus-mirrored numbering relationship.
2026-03-20 15:14:31 +08:00

MacrolactoneDB Validation Output

This directory contains validation results for MacrolactoneDB molecules with 12- to 20-membered rings.

Directory Structure

validation_output/
├── README.md                    # This file
├── fragments.db                 # SQLite database with all data
├── fragment_library.csv         # Unified fragment library export
├── summary.csv                  # Summary of all processed molecules
├── summary_statistics.json      # Statistical summary
│
├── ring_size_12/                # 12-membered rings
├── ring_size_13/                # 13-membered rings
...
└── ring_size_20/                # 20-membered rings
    ├── molecules.csv            # Molecules in this ring size
    ├── standard/                # Standard macrolactones
    │   ├── numbered/            # Numbered ring images
    │   │   └── {id}_numbered.png
    │   └── sidechains/          # Fragment images
    │       └── {id}/
    │           └── {id}_frag_{n}_pos{pos}.png
    ├── non_standard/            # Non-standard macrocycles
    │   └── original/
    │       └── {id}_original.png
    └── rejected/                # Not macrolactones
        └── original/
            └── {id}_original.png
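The layout above can be turned into paths programmatically. The following is a minimal sketch, not part of the project's code: the helper names are hypothetical, and it assumes that `{id}` is the molecule's `ml_id`, `{n}` is a fragment index, and `{pos}` is the cleavage position.

```python
from pathlib import PurePosixPath


def numbered_image_path(root, ring_size, mol_id):
    """Path of the numbered ring image for a standard macrolactone."""
    return (
        PurePosixPath(root)
        / f"ring_size_{ring_size}"
        / "standard"
        / "numbered"
        / f"{mol_id}_numbered.png"
    )


def fragment_image_path(root, ring_size, mol_id, frag_index, position):
    """Path of one side-chain fragment image, following the documented tree."""
    # The validation run covers ring sizes 12 through 20 only.
    if not 12 <= ring_size <= 20:
        raise ValueError(f"ring size {ring_size} is outside the 12-20 range")
    return (
        PurePosixPath(root)
        / f"ring_size_{ring_size}"
        / "standard"
        / "sidechains"
        / mol_id
        / f"{mol_id}_frag_{frag_index}_pos{position}.png"
    )


# Example with illustrative values (ML00000001 is the ID format shown in this README).
example = fragment_image_path("validation_output", 14, "ML00000001", 0, 5)
```

Whether fragment indices start at 0 or 1 is not stated here, so check an existing `sidechains/` folder before relying on the index.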

Database Schema

Tables

  • parent_molecules: Original molecule information
  • ring_numberings: Ring atom numbering details
  • side_chain_fragments: Fragmentation results with isotope tags
  • fragment_library_entries: Unified fragment library rows for downstream design
  • validation_results: Manual validation records

Key Fields

  • classification: standard_macrolactone | non_standard_macrocycle | not_macrolactone
  • dummy_isotope: Cleavage position, stored as the isotope value of the dummy atom so the fragment can be reattached at the same position
  • cleavage_position: Position on ring where side chain was attached
  • has_dummy_atom: Whether the fragment contains a dummy atom for splicing
  • dummy_atom_count: Number of dummy atoms in the fragment

Ring Numbering Convention

  1. Position 1 = Lactone carbonyl carbon (C=O)
  2. Position 2 = Ester oxygen (-O-)
  3. Positions 3 to N = Remaining ring atoms, numbered sequentially around the ring (N = ring size)
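The convention can be illustrated with a small stand-alone function. This is only a sketch and not the project's canonical numbering API: `number_ring` is a hypothetical name, and it assumes the ring atoms are already available in traversal order (in either direction) together with the identifiers of the lactone carbonyl carbon and the ester oxygen.

```python
def number_ring(ring_atoms, carbonyl_c, ester_o):
    """Return {atom: position} with carbonyl C = 1 and ester O = 2.

    ring_atoms: atom identifiers in ring traversal order (either direction).
    """
    ring_atoms = list(ring_atoms)
    start = ring_atoms.index(carbonyl_c)
    # Rotate the ring so that the carbonyl carbon comes first.
    rotated = ring_atoms[start:] + ring_atoms[:start]
    if rotated[1] != ester_o:
        # Wrong direction: keep the first atom and walk the ring the other way.
        rotated = [rotated[0]] + rotated[:0:-1]
    if rotated[1] != ester_o:
        raise ValueError("ester oxygen is not a ring neighbour of the carbonyl carbon")
    return {atom: pos for pos, atom in enumerate(rotated, start=1)}


# Toy six-atom ring (smaller than a real macrolactone) with made-up atom indices.
numbering = number_ring([10, 11, 12, 13, 14, 15], carbonyl_c=12, ester_o=11)
```

In this toy ring the traversal has to be reversed, because the ester oxygen (11) precedes the carbonyl carbon (12) in the input order.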

Isotope Tagging

Fragments use the isotope value of the dummy atom to mark the cleavage position:

  • [5*]CCO = Fragment from position 5, dummy atom has isotope=5
  • This enables precise reconstruction during reassembly
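The tag can be read back from a fragment SMILES without a cheminformatics toolkit. The sketch below uses a regular expression and assumes the dummy atom is always written in bracket form with an explicit isotope, as in `[5*]`. The function name is hypothetical, and a toolkit-based parser is more robust for arbitrary SMILES.

```python
import re

# Matches a bracketed dummy atom with an explicit isotope, e.g. "[5*]" or "[12*:1]".
DUMMY_PATTERN = re.compile(r"\[(\d+)\*")


def cleavage_positions_from_smiles(fragment_smiles):
    """Return the isotope values of all tagged dummy atoms, in SMILES order."""
    return [int(value) for value in DUMMY_PATTERN.findall(fragment_smiles)]


positions = cleavage_positions_from_smiles("[5*]CCO")
```

Isotope-labelled real atoms such as `[13C]` are ignored because the pattern requires `*` directly after the digits. Untagged dummy atoms (`*` or `[*]`) are ignored as well.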

CSV Columns

summary.csv

  • ml_id: MacrolactoneDB unique ID (e.g., ML00000001)
  • chembl_id: Original ChEMBL ID (if available)
  • classification: Classification result
  • ring_size: Detected ring size (12-20)
  • num_sidechains: Number of side chains detected
  • cleavage_positions: JSON array of cleavage positions
  • processing_status: pending | success | failed | skipped
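Because `cleavage_positions` is stored as a JSON array inside a CSV cell, it needs a second decoding step after the CSV is parsed. The sketch below shows one way to do this with the standard library. `load_summary` is a hypothetical helper, the sample row is made up for illustration, and empty cells (for example on failed rows) are assumed to mean "no value".

```python
import csv
import io
import json


def load_summary(handle):
    """Parse summary.csv rows and decode the numeric and JSON columns."""
    rows = []
    for row in csv.DictReader(handle):
        # Empty cells are treated as missing values rather than errors.
        row["ring_size"] = int(row["ring_size"]) if row["ring_size"] else None
        row["num_sidechains"] = int(row["num_sidechains"]) if row["num_sidechains"] else 0
        row["cleavage_positions"] = json.loads(row["cleavage_positions"] or "[]")
        rows.append(row)
    return rows


# Illustrative in-memory sample; replace with open("summary.csv", newline="").
sample = io.StringIO(
    "ml_id,chembl_id,classification,ring_size,num_sidechains,"
    "cleavage_positions,processing_status\n"
    'ML00000001,,standard_macrolactone,14,2,"[3, 5]",success\n'
    "ML00000002,,not_macrolactone,,,,skipped\n"
)
rows = load_summary(sample)
```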

fragment_library.csv

  • source_type: validation_extract | supplemental (reserved)
  • has_dummy_atom: Whether the fragment contains a dummy atom
  • dummy_atom_count: Number of dummy atoms
  • splice_ready: Whether the fragment is directly compatible with single-anchor splicing

Querying the Database

# List tables
sqlite3 fragments.db ".tables"

# Get the first five standard macrolactones
sqlite3 fragments.db "SELECT * FROM parent_molecules WHERE classification='standard_macrolactone' LIMIT 5;"

# Get fragments for a specific molecule
sqlite3 fragments.db "SELECT * FROM side_chain_fragments WHERE parent_id=1;"

# Count by ring size
sqlite3 fragments.db "SELECT ring_size, COUNT(*) FROM parent_molecules GROUP BY ring_size;"
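The same queries can be run from Python with the standard `sqlite3` module, which also allows parameterized values. The sketch below uses a toy in-memory table as a stand-in for `fragments.db`. Only the table name and the `classification` and `ring_size` columns come from the queries above. The remaining schema and the sample rows are assumptions made for the example.

```python
import sqlite3


def count_by_ring_size(conn, classification):
    """Count parent molecules per ring size for one classification."""
    cursor = conn.execute(
        "SELECT ring_size, COUNT(*) FROM parent_molecules "
        "WHERE classification = ? "
        "GROUP BY ring_size ORDER BY ring_size",
        (classification,),  # bound parameter instead of string formatting
    )
    return cursor.fetchall()


# Toy stand-in for fragments.db; use sqlite3.connect("fragments.db") on real data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parent_molecules ("
    "id INTEGER PRIMARY KEY, classification TEXT, ring_size INTEGER)"
)
conn.executemany(
    "INSERT INTO parent_molecules (classification, ring_size) VALUES (?, ?)",
    [
        ("standard_macrolactone", 14),
        ("standard_macrolactone", 14),
        ("standard_macrolactone", 16),
        ("not_macrolactone", 12),
    ],
)
counts = count_by_ring_size(conn, "standard_macrolactone")
```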