# Standard vs Non-Standard Macrocycle Classification

## Summary

Add a formal molecule-level classification layer on top of the current `macro_lactone_toolkit` detection logic so the toolkit can distinguish:

- `standard_macrolactone`
- `non_standard_macrocycle`
- `not_macrolactone`

This classification must support the two new rejection rules:

1. After ring numbering is assigned, positions `3..N` must all be carbon atoms. If any atom at positions `3..N` is not carbon, classify the molecule as `non_standard_macrocycle`.
2. If multiple candidate macrolactone rings overlap in the same atom set graph, classify the molecule as `non_standard_macrocycle`. Use only overlapping candidate rings for this rule; disconnected or non-overlapping candidates do not trigger this specific rejection.

Do not rely on a “largest ring” assumption. Base detection on RDKit ring candidates from `RingInfo.AtomRings()` plus explicit lactone validation, then apply the new standard/non-standard filters.

## Public API And Output Changes

Add a new result type, e.g. `MacrocycleClassificationResult`, with these fields:

- `smiles: str`
- `classification: Literal["standard_macrolactone", "non_standard_macrocycle", "not_macrolactone"]`
- `ring_size: int | None`
- `primary_reason_code: str | None`
- `primary_reason_message: str | None`
- `all_reason_codes: list[str]`
- `all_reason_messages: list[str]`
- `candidate_ring_sizes: list[int]`

Add a new public API on `MacroLactoneAnalyzer`:

- `classify_macrocycle(mol_input: str | Chem.Mol, ring_size: int | None = None) -> MacrocycleClassificationResult`

Behavior:

- If `ring_size` is omitted, inspect all 12-20 membered lactone candidates.
- If `ring_size` is provided, restrict candidate selection to that size before classification.
- Invalid SMILES should keep raising the existing detection exception path; do not encode invalid input as a classification result.
- For `standard_macrolactone`, `ring_size` must be the accepted ring size and all reason fields must be empty.
- For `non_standard_macrocycle`, `ring_size` should be the candidate ring size if exactly one size remains relevant, otherwise `None`.
- For `not_macrolactone`, return no ring size and a reason describing why no valid 12-20 lactone candidate survived.

Reason codes must be decision-complete and fixed:

- `contains_non_carbon_ring_atoms_outside_positions_1_2`
- `multiple_overlapping_macrocycle_candidates`
- `no_lactone_ring_in_12_to_20_range`
- `requested_ring_size_not_found`

Reason messages must be short English sentences:

- `Ring positions 3..N contain non-carbon atoms.`
- `Overlapping macrolactone candidate rings were detected.`
- `No 12-20 membered lactone ring was detected.`
- `The requested ring size was not detected as a lactone ring.`

Update CLI `macro-lactone-toolkit analyze` to return this classification result shape for single-SMILES mode and row-wise CSV mode.

Do not add a new CLI subcommand. Keep `analyze` as the classification surface.

## Implementation Changes

### Detection And Candidate Grouping

In the current core detection module:

- Keep the existing lactone-ring candidate search based on `RingInfo.AtomRings()` and lactone atom validation.
- Add an overlap-group pass over candidate rings:
  - Build a graph where two candidates are connected if their ring atom sets intersect.
  - Compute connected components on this graph.
  - If any connected component contains more than one candidate, classify as `non_standard_macrocycle` with `multiple_overlapping_macrocycle_candidates`.
  - Do not treat disconnected candidate rings as overlapping.
- Keep `candidate_ring_sizes` as the sorted unique sizes from the filtered candidate list.

### Standard Macrocycle Filter

For any single candidate that survives overlap rejection:

- Build numbering exactly as today: position 1 is the lactone carbonyl carbon, position 2 is the ring ester oxygen.
- Inspect positions `3..N`.
- Every atom at positions `3..N` must have atomic number 6.
- If any position `3..N` is not carbon, classify as `non_standard_macrocycle` with `contains_non_carbon_ring_atoms_outside_positions_1_2`.

This rule must reject ring peptides and other heteroatom-containing macrocycles even if they contain a lactone bond.

### Fragmenter Integration

Update `MacrolactoneFragmenter` so that:

- `number_molecule()` and `fragment_molecule()` first call `classify_macrocycle()`.
- They only proceed when classification is `standard_macrolactone`.
- For `non_standard_macrocycle` or `not_macrolactone`, raise the existing detection exception type with a message that includes the classification and the primary reason code.
- Do not change fragmentation output semantics for standard macrolactones.

### Files To Change

Concentrate changes in:

- `src/macro_lactone_toolkit/_core.py`
- `src/macro_lactone_toolkit/analyzer.py`
- `src/macro_lactone_toolkit/cli.py`

Add the new result type in the existing models module instead of inventing a second schema location.

## Test Plan

Add tests first, verify they fail, then implement.

Required test cases:

- Standard 12, 14, 16, and 20 membered macrolactones still classify as `standard_macrolactone` and return the correct `ring_size`.
- A macrocycle with a valid lactone bond but a non-carbon atom at position `3..N` classifies as `non_standard_macrocycle` with:
  - `primary_reason_code == "contains_non_carbon_ring_atoms_outside_positions_1_2"`
  - the expected English message
- An overlapping-candidate example classifies as `non_standard_macrocycle` with:
  - `primary_reason_code == "multiple_overlapping_macrocycle_candidates"`
  - the expected English message
- A non-lactone macrocycle classifies as `not_macrolactone` with `no_lactone_ring_in_12_to_20_range`.
- Explicit `ring_size` with no candidate of that size returns `not_macrolactone` with `requested_ring_size_not_found`.
- `macro-lactone-toolkit analyze --smiles ...` returns the new fields for:
  - one standard example
  - one heteroatom-rejected example
  - one overlap-rejected example
- Existing numbering, fragmentation, labeled/plain dummy round-trip, and splicing tests remain green for standard macrolactones.

Test fixture guidance:

- Reuse the existing synthetic macrocycle helper for standard rings.
- Extend the helper or add a new fixture helper for:
  - a lactone-containing ring with one non-carbon atom at a numbered position beyond 2
  - an overlapping-candidate ring example specifically built to share ring atoms between candidate rings

## Assumptions And Defaults

- Classification is molecule-level, but the overlap rejection only applies to overlapping candidate rings, not disconnected candidates elsewhere in the molecule.
- Invalid SMILES remain exceptions, not classification payloads.
- `analyze` becomes the official classification output; `get_valid_ring_sizes()` may remain as a lower-level helper.
- The implementation should stay aligned with RDKit ring APIs as candidate generators, not as the final definition of a standard macrolactone.
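
The overlap-group pass and the positions `3..N` carbon check from Implementation Changes can be sketched without RDKit, which may help when writing the first failing tests. This is a minimal illustration under stated assumptions, not the toolkit's code: the names `overlap_groups`, `has_non_carbon_beyond_position_2`, and `rejection_reasons` are hypothetical, each candidate ring is assumed to arrive as a sequence of atom indices already in numbering order (index 0 is position 1), and atomic numbers are assumed to be supplied as a plain mapping instead of being read from a `Chem.Mol`.

```python
from typing import Dict, List, Mapping, Sequence

# Reason codes copied from the fixed list in this plan.
OVERLAP = "multiple_overlapping_macrocycle_candidates"
NON_CARBON = "contains_non_carbon_ring_atoms_outside_positions_1_2"


def overlap_groups(rings: Sequence[Sequence[int]]) -> List[List[int]]:
    """Group candidate ring indices into connected components.

    Two candidates are connected when their atom sets intersect.
    Uses a small union-find; the function name is hypothetical.
    """
    atom_sets = [set(ring) for ring in rings]
    parent = list(range(len(rings)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rings)):
        for j in range(i + 1, len(rings)):
            if atom_sets[i] & atom_sets[j]:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(rings)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def has_non_carbon_beyond_position_2(
    ring: Sequence[int], atomic_nums: Mapping[int, int]
) -> bool:
    """True if any atom at positions 3..N has an atomic number other than 6.

    Assumes ring[0] is position 1 (carbonyl carbon) and ring[1] is
    position 2 (ring ester oxygen).
    """
    return any(atomic_nums[idx] != 6 for idx in ring[2:])


def rejection_reasons(
    rings: Sequence[Sequence[int]], atomic_nums: Mapping[int, int]
) -> List[str]:
    """Collect the reason codes triggered by the two rejection rules.

    The carbon check is applied only to candidates that are alone in
    their overlap group. An empty list means neither rule fired.
    """
    reasons: List[str] = []
    for group in overlap_groups(rings):
        if len(group) > 1:
            code = OVERLAP
        elif has_non_carbon_beyond_position_2(rings[group[0]], atomic_nums):
            code = NON_CARBON
        else:
            continue
        if code not in reasons:
            reasons.append(code)
    return reasons


if __name__ == "__main__":
    # Toy 12-membered candidate: atom 0 is C, atom 1 is O, the rest are C.
    ring = list(range(12))
    atomic_nums = {idx: 6 for idx in ring}
    atomic_nums[1] = 8
    print(rejection_reasons([ring], atomic_nums))
```

The sketch only collects reason codes. How `ring_size` and `primary_reason_code` are chosen when several disconnected candidates survive is left to the real `classify_macrocycle()` implementation.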