Keep key validation outputs and analysis tables tracked directly, package analysis plot PNGs into a small tar.gz backup, and add analysis scripts plus tests so the stored results remain reproducible without flooding git with large image trees.
48 lines
2.5 KiB
Plaintext
Analyzed rows: 34829
Unique parent molecules: 4451
Unique fragment smiles: 1852
Fragment atom count percentiles: p05=1.0, p25=1.0, p50=1.0, p75=2.0, p95=14.0

Filter candidates (drop fragments with atom_count <= threshold):
<= 1: remove 23994 rows (68.9%), remove 10 unique fragments (0.5%)
<= 2: remove 28069 rows (80.6%), remove 26 unique fragments (1.4%)
<= 3: remove 28550 rows (82.0%), remove 52 unique fragments (2.8%)
<= 4: remove 29045 rows (83.4%), remove 88 unique fragments (4.8%)
<= 5: remove 29272 rows (84.0%), remove 141 unique fragments (7.6%)

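A table like the filter-candidate list above could be produced along the following lines. This is a minimal sketch, not the repository's analysis script: the function name `filter_candidates`, the `(fragment_smiles, atom_count)` row layout, and the toy rows are all assumptions made for illustration.

```python
def filter_candidates(rows, thresholds=(1, 2, 3, 4, 5)):
    """Summarize what each atom-count cutoff would remove.

    rows: list of (fragment_smiles, atom_count) tuples (hypothetical layout).
    Returns one dict per threshold with row and unique-fragment removals.
    """
    total_rows = len(rows)
    total_unique = len({smiles for smiles, _ in rows})
    report = []
    for threshold in thresholds:
        # Rows whose fragment is at or below the cutoff would be dropped.
        dropped = [(s, n) for s, n in rows if n <= threshold]
        dropped_unique = {s for s, _ in dropped}
        report.append({
            "threshold": threshold,
            "rows_removed": len(dropped),
            "rows_removed_pct": 100.0 * len(dropped) / total_rows if total_rows else 0.0,
            "unique_removed": len(dropped_unique),
            "unique_removed_pct": 100.0 * len(dropped_unique) / total_unique if total_unique else 0.0,
        })
    return report


# Toy data only; these are not the rows behind the numbers in this report.
toy_rows = [("[H]", 1)] * 3 + [("C", 1)] * 2 + [("CC", 2)] + [("c1ccccc1", 6)] * 2
toy_report = filter_candidates(toy_rows, thresholds=(1, 2))
for entry in toy_report:
    print(
        f"<= {entry['threshold']}: remove {entry['rows_removed']} rows "
        f"({entry['rows_removed_pct']:.1f}%), remove {entry['unique_removed']} "
        f"unique fragments ({entry['unique_removed_pct']:.1f}%)"
    )
```

The sketch illustrates why the row and unique-fragment percentages diverge so sharply in the report: a handful of very small fragments can account for most rows.
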
Ring 16 rows: 8108
Ring 16 unique fragment smiles: 596
Ring 16 rows with >= 4 heavy atoms: 1880
Ring 16 unique fragment smiles with >= 4 heavy atoms: 566

Ring 16 top positions by normalized Shannon entropy:
Position 7: entropy=0.857, unique=4, mean_atom_count=2.57
Position 13: entropy=0.739, unique=198, mean_atom_count=15.50
Position 4: entropy=0.584, unique=70, mean_atom_count=6.89
Position 12: entropy=0.490, unique=99, mean_atom_count=3.63
Position 3: entropy=0.449, unique=121, mean_atom_count=5.10

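For readers unfamiliar with the metric, a normalized Shannon entropy over the fragments seen at one position can be sketched as below. The report does not state which normalization the analysis scripts use; dividing by the log of the number of distinct fragments is an assumption here, and the function name is hypothetical.

```python
import math
from collections import Counter


def normalized_shannon_entropy(fragments):
    """Shannon entropy of fragment frequencies, scaled to the range [0, 1].

    Assumption: normalization divides by log(number of distinct fragments),
    so a single distinct fragment gives 0.0 and a uniform spread gives 1.0.
    """
    counts = Counter(fragments)
    n_unique = len(counts)
    if n_unique <= 1:
        return 0.0  # no variation (or no data) at this position
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_unique)


# Toy inputs, not taken from the dataset.
uniform = normalized_shannon_entropy(["A", "B", "C", "D", "E"])
single = normalized_shannon_entropy(["A", "A", "A", "A"])
skewed = normalized_shannon_entropy(["A"] * 9 + ["B"])
```

Under this normalization a position with few distinct fragments can still score high if they occur in similar proportions, which is one reason to read entropy together with the `unique` count.
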
Ring 16 top positions by mean pairwise Tanimoto distance:
Position 16: distance=0.901, entropy=0.415, atom_count_range=12
Position 10: distance=0.871, entropy=0.077, atom_count_range=13
Position 7: distance=0.860, entropy=0.857, atom_count_range=9
Position 14: distance=0.848, entropy=0.375, atom_count_range=13
Position 12: distance=0.839, entropy=0.490, atom_count_range=20

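The distance metric can be illustrated with a small, dependency-free sketch. It assumes each fingerprint is represented as a set of on-bit indices; the report does not say which fingerprint type the scripts use or whether distances are averaged over unique fragments or over all rows, so both choices are left to the caller. The function names are hypothetical.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical here
    return 1.0 - len(a & b) / union


def mean_pairwise_tanimoto_distance(fingerprints):
    """Average Tanimoto distance over all unordered pairs of fingerprints."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0  # fewer than two fingerprints: no pairs to compare
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


# Toy fingerprints as sets of bit indices; not derived from real molecules.
toy_fps = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]
toy_mean = mean_pairwise_tanimoto_distance(toy_fps)
```

Because this measures structural dissimilarity rather than frequency balance, a position can show a high distance with a low entropy, as Position 10 does above.
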
Ring 16 top filtered positions by normalized Shannon entropy:
Position 6: entropy=0.973, unique=60, total=89, mean_atom_count=12.58
Position 12: entropy=0.886, unique=83, total=177, mean_atom_count=10.00
Position 3: entropy=0.854, unique=117, total=269, mean_atom_count=15.41
Position 13: entropy=0.763, unique=193, total=709, mean_atom_count=18.91
Position 9: entropy=0.729, unique=37, total=141, mean_atom_count=7.82

Medicinal-chemistry hotspot comparison:
Position 6: all=536, >=4 atoms=89, unique_filtered=60, entropy_filtered=0.973
Position 7: all=23, >=4 atoms=4, unique_filtered=1, entropy_filtered=0.000
Position 15: all=747, >=4 atoms=205, unique_filtered=8, entropy_filtered=0.456
Position 16: all=135, >=4 atoms=5, unique_filtered=5, entropy_filtered=1.000

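The per-position comparison above (all rows versus rows kept by the atom-count filter) could be assembled roughly as follows. This is a sketch under stated assumptions: the `(position, fragment_smiles, atom_count)` row layout, the helper names, and the entropy normalization by log of the distinct-fragment count are all hypothetical, and the toy rows are not from the dataset.

```python
import math
from collections import Counter, defaultdict


def _normalized_entropy(items):
    """Entropy scaled by log(distinct count); 0.0 for one or zero categories."""
    counts = Counter(items)
    if len(counts) <= 1:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


def hotspot_comparison(rows, positions, min_atoms=4):
    """Compare unfiltered and filtered fragment statistics per position.

    rows: iterable of (position, fragment_smiles, atom_count) tuples.
    """
    by_position = defaultdict(list)
    for position, smiles, atom_count in rows:
        by_position[position].append((smiles, atom_count))

    summary = {}
    for position in positions:
        fragments = by_position.get(position, [])
        # Keep only fragments at or above the atom-count cutoff.
        kept = [s for s, n in fragments if n >= min_atoms]
        summary[position] = {
            "all": len(fragments),
            "filtered": len(kept),
            "unique_filtered": len(set(kept)),
            "entropy_filtered": _normalized_entropy(kept),
        }
    return summary


# Toy rows for two positions.
toy_rows = [
    (6, "CCCC", 4), (6, "CCCCC", 5), (6, "C", 1),
    (7, "c1ccccc1", 6), (7, "c1ccccc1", 6), (7, "O", 1),
]
toy_summary = hotspot_comparison(toy_rows, positions=[6, 7])
```

In the toy data, position 7 keeps two rows but only one distinct fragment, so its filtered entropy is zero, mirroring the pattern reported for Position 7 above.
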
Interpretation note: atom-count spread is only a coarse proxy for diversity.
Use entropy and fingerprint distance as primary diversity evidence; use atom-count spread as supporting context.
For cyclic-side-chain sensitivity, see ring_sensitivity output and the markdown report.