lingyuzeng 8071a141ee feat(validation): archive key result assets
Keep key validation outputs and analysis tables tracked directly,
package analysis plot PNGs into a small tar.gz backup, and add
analysis scripts plus tests so the stored results remain
reproducible without flooding git with large image trees.
2026-03-19 21:34:27 +08:00

Analyzed rows: 34829
Unique parent molecules: 4451
Unique fragment smiles: 1852
Fragment atom count percentiles: p05=1.0, p25=1.0, p50=1.0, p75=2.0, p95=14.0
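The percentile line above could be reproduced along these lines. The report does not state its interpolation rule, so this sketch assumes linear interpolation between order statistics (NumPy's default). `atom_count_percentiles` is a hypothetical helper name, not part of the original scripts.

```python
import statistics

def atom_count_percentiles(atom_counts):
    """Return p05, p25, p50, p75 and p95 of fragment atom counts.

    Assumption: linear interpolation between order statistics. The
    "inclusive" method of statistics.quantiles matches NumPy's default.
    """
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(atom_counts, n=20, method="inclusive")
    return {
        "p05": cuts[0],
        "p25": cuts[4],
        "p50": cuts[9],
        "p75": cuts[14],
        "p95": cuts[18],
    }
```

With p50 = 1.0 and p75 = 2.0, at least half of all analyzed rows carry single-atom fragments. That is why the threshold table that follows removes so many rows.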
Filter candidates (drop fragments with atom_count <= threshold):
<= 1: remove 23994 rows (68.9%), remove 10 unique fragments (0.5%)
<= 2: remove 28069 rows (80.6%), remove 26 unique fragments (1.4%)
<= 3: remove 28550 rows (82.0%), remove 52 unique fragments (2.8%)
<= 4: remove 29045 rows (83.4%), remove 88 unique fragments (4.8%)
<= 5: remove 29272 rows (84.0%), remove 141 unique fragments (7.6%)
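The filter table counts removals at two levels: individual rows, and distinct fragment SMILES. The following is a minimal sketch under two assumptions: each analyzed row is a hypothetical `(fragment_smiles, atom_count)` pair, and a given SMILES always has the same atom count. `filter_impact`, `FilterImpact` and `format_impact` are illustrative names only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FilterImpact:
    threshold: int
    rows_removed: int
    rows_pct: float
    uniques_removed: int
    uniques_pct: float

def filter_impact(rows, thresholds=(1, 2, 3, 4, 5)):
    """rows: iterable of (fragment_smiles, atom_count), one pair per analyzed row."""
    rows = list(rows)
    # Assumption: atom_count is a property of the SMILES, so one entry per unique fragment.
    atom_count_by_smiles = {smiles: count for smiles, count in rows}
    n_rows, n_unique = len(rows), len(atom_count_by_smiles)
    impacts = []
    for t in thresholds:
        # A row or fragment is dropped when atom_count <= threshold.
        rows_removed = sum(1 for _, count in rows if count <= t)
        uniques_removed = sum(1 for count in atom_count_by_smiles.values() if count <= t)
        impacts.append(FilterImpact(t, rows_removed, 100 * rows_removed / n_rows,
                                    uniques_removed, 100 * uniques_removed / n_unique))
    return impacts

def format_impact(imp):
    """Render one impact in the same layout as the report lines."""
    return (f"<= {imp.threshold}: remove {imp.rows_removed} rows ({imp.rows_pct:.1f}%), "
            f"remove {imp.uniques_removed} unique fragments ({imp.uniques_pct:.1f}%)")
```

The gap between the two percentages is the key signal. At threshold 1, 68.9% of rows but only 0.5% of unique fragments (10 of 1852) are removed. A handful of tiny fragments therefore dominate the row counts.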
Ring 16 rows: 8108
Ring 16 unique fragment smiles: 596
Ring 16 rows with >= 4 heavy atoms: 1880
Ring 16 unique fragment smiles with >= 4 heavy atoms: 566
Ring 16 top positions by normalized Shannon entropy:
Position 7: entropy=0.857, unique=4, mean_atom_count=2.57
Position 13: entropy=0.739, unique=198, mean_atom_count=15.50
Position 4: entropy=0.584, unique=70, mean_atom_count=6.89
Position 12: entropy=0.490, unique=99, mean_atom_count=3.63
Position 3: entropy=0.449, unique=121, mean_atom_count=5.10
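Normalized Shannon entropy per position could be computed as sketched below. The sketch assumes normalization by ln(k), where k is the number of distinct fragments at the position, with 0.0 when k <= 1. This is consistent with values elsewhere in the report: one unique filtered fragment gives 0.000, and five fragments seen once each give 1.000. Normalizing by ln(total rows) instead could not reach the reported 0.973 for Position 6 (60 unique of 89 rows), because ln 60 / ln 89 ≈ 0.91. The function names are hypothetical.

```python
import math
from collections import Counter

def normalized_shannon_entropy(fragments):
    """Shannon entropy of the fragment distribution divided by its maximum, ln(k).

    Assumption: k is the number of distinct fragments; returns 0.0 when k <= 1.
    """
    counts = Counter(fragments)
    k = len(counts)
    if k <= 1:
        return 0.0  # a single (or no) fragment type carries no diversity
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(k)

def rank_positions_by_entropy(fragments_by_position, top=5):
    """fragments_by_position: dict of position -> list of fragment SMILES (one per row)."""
    scores = {pos: normalized_shannon_entropy(frags)
              for pos, frags in fragments_by_position.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]
```

Normalizing by ln(k) makes positions with different numbers of distinct fragments comparable on a 0 to 1 scale. The trade-off is that a position with only two equally frequent fragments scores the maximum 1.0.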
Ring 16 top positions by mean pairwise Tanimoto distance:
Position 16: distance=0.901, entropy=0.415, atom_count_range=12
Position 10: distance=0.871, entropy=0.077, atom_count_range=13
Position 7: distance=0.860, entropy=0.857, atom_count_range=9
Position 14: distance=0.848, entropy=0.375, atom_count_range=13
Position 12: distance=0.839, entropy=0.490, atom_count_range=20
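Mean pairwise Tanimoto distance and atom-count range could be computed roughly as follows. The report does not say which fingerprint it used. This sketch represents each fingerprint as a set of on-bit indices, which a toolkit such as RDKit would supply in practice, and assumes the mean is taken over all unordered pairs of distinct fragments at a position. All names are illustrative.

```python
from itertools import combinations

def tanimoto_distance(a, b):
    """1 - |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union

def mean_pairwise_distance(fingerprints):
    """Mean Tanimoto distance over all unordered pairs; 0.0 if fewer than two fingerprints.

    Assumption: one fingerprint per distinct fragment at the position.
    """
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)

def atom_count_range(atom_counts):
    """Spread of fragment sizes at a position: max minus min atom count."""
    return max(atom_counts) - min(atom_counts)
```

Distance and entropy answer different questions, which is why the two rankings disagree. Position 10 ranks second by distance (0.871) but has entropy 0.077. This pattern is plausible if its few distinct fragments are structurally dissimilar while one of them dominates the row counts.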
Ring 16 top filtered positions by normalized Shannon entropy:
Position 6: entropy=0.973, unique=60, total=89, mean_atom_count=12.58
Position 12: entropy=0.886, unique=83, total=177, mean_atom_count=10.00
Position 3: entropy=0.854, unique=117, total=269, mean_atom_count=15.41
Position 13: entropy=0.763, unique=193, total=709, mean_atom_count=18.91
Position 9: entropy=0.729, unique=37, total=141, mean_atom_count=7.82
Medicinal-chemistry hotspot comparison:
Position 6: all=536, >=4 atoms=89, unique_filtered=60, entropy_filtered=0.973
Position 7: all=23, >=4 atoms=4, unique_filtered=1, entropy_filtered=0.000
Position 15: all=747, >=4 atoms=205, unique_filtered=8, entropy_filtered=0.456
Position 16: all=135, >=4 atoms=5, unique_filtered=5, entropy_filtered=1.000
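The filtered columns (">=4 atoms", "unique_filtered", "entropy_filtered") could be derived per hotspot position as sketched here. The sketch assumes rows are hypothetical `(position, fragment_smiles, atom_count)` triples and that atom_count counts heavy atoms. It reuses the ln(k) normalization assumed for the entropy rankings. `hotspot_summary` is an illustrative name.

```python
import math
from collections import Counter

MIN_HEAVY_ATOMS = 4  # the ">= 4 atoms" filter used in the report

def hotspot_summary(rows, positions):
    """rows: iterable of (position, fragment_smiles, atom_count); returns {position: metrics}."""
    rows = list(rows)
    summary = {}
    for pos in positions:
        at_pos = [(smi, n) for p, smi, n in rows if p == pos]
        # Assumption: atom_count is the heavy-atom count of the fragment.
        kept = Counter(smi for smi, n in at_pos if n >= MIN_HEAVY_ATOMS)
        total, k = sum(kept.values()), len(kept)
        if k <= 1:
            entropy = 0.0  # no diversity left after filtering
        else:
            h = -sum((c / total) * math.log(c / total) for c in kept.values())
            entropy = h / math.log(k)  # normalized by ln(number of distinct fragments)
        summary[pos] = {
            "all": len(at_pos),
            ">=4 atoms": total,
            "unique_filtered": k,
            "entropy_filtered": entropy,
        }
    return summary
```

Position 16 shows why the filtered counts matter. Its entropy_filtered of 1.000 rests on only five rows, each a different fragment, so it is a much weaker signal than Position 6 (0.973 over 89 rows).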
Interpretation note: atom-count spread is only a coarse proxy for diversity.
Use entropy and fingerprint distance as primary diversity evidence; use atom-count spread as supporting context.
For cyclic-side-chain sensitivity, see the ring_sensitivity output and the Markdown report.