Pending Calibration Tasks

This file tracks the remaining “last-mile” items that should be calibrated through real workflow runs.

These items do not block progress, but they should be refined over the next few iterations and then folded back into the main workflow template.


1. How to pilot the first batch of citation evidence files

Start with a small pilot instead of a full rollout.

Recommended pilot scope:

  • 1 chapter
  • 2 to 3 key claims
  • 1 to 2 core supporting papers per claim

Recommended starting targets:

  • Chapter 1 claims about resistance mechanisms or ribosome-binding sites
  • Chapter 5 entries such as Macformer, PKS Enumerator, or SIME

Minimal pilot flow:

  1. qmd_retriever retrieves evidence from the papers collection.
  2. Validate the passage in Docling markdown.
  3. If needed, inspect Docling JSON for structured field or block evidence.
  4. If the claim concerns implementation details, inspect GitHub README, docs, or examples.
  5. citation_checker recommends papers and insertion positions.
  6. citation_archivist writes a markdown evidence file under citation-evidence/.
  7. A human reviews whether the evidence file is strong enough to show the citation is not fabricated.

The purpose of the pilot is to answer:

  • Is the current citation evidence template sufficient?
  • Which Docling JSON fields are actually useful?
  • How much GitHub evidence is useful in this review workflow?
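The artifact side of the pilot flow (steps 5 to 7) can be sketched in code. This is a minimal sketch only: the `EvidenceCandidate` shape and `build_evidence_file` helper are hypothetical names, since the real interfaces of citation_checker and citation_archivist are not pinned down yet.

```python
from dataclasses import dataclass, field

# Hypothetical data shape; the real citation_archivist interface may differ.
@dataclass
class EvidenceCandidate:
    claim: str
    paper_id: str
    passage: str                                      # validated against Docling markdown
    json_refs: list = field(default_factory=list)     # optional Docling JSON pointers
    github_refs: list = field(default_factory=list)   # README / docs / examples links

def build_evidence_file(candidate: EvidenceCandidate) -> str:
    """Render one citation-evidence markdown file (step 6).
    A human still reviews the result for fabrication risk (step 7)."""
    lines = [
        f"# Evidence for claim: {candidate.claim}",
        f"- paper: {candidate.paper_id}",
        f"- passage: {candidate.passage}",
    ]
    if candidate.json_refs:
        lines.append(f"- docling-json refs: {', '.join(candidate.json_refs)}")
    if candidate.github_refs:
        lines.append(f"- github refs: {', '.join(candidate.github_refs)}")
    return "\n".join(lines)
```

Running the pilot against this template directly answers the first question above: if reviewers keep asking for fields the template lacks, the template is insufficient.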

2. Items that still need calibration

2.1 QMD retrieval parameters

Current defaults:

  • top 8
  • threshold 0.45
  • default order: papers -> github -> notes

Still to validate:

  • whether 0.45 is too high or too low for this Chinese review corpus
  • whether top 8 is enough
  • whether some chapters need stricter or looser retrieval settings
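The per-chapter question suggests keeping one shared default and layering chapter overrides on top, so calibration runs only have to touch one small table. A sketch, with the caveat that the override mechanism and the chapter keys are assumptions; only the default values (top 8, threshold 0.45, papers -> github -> notes) come from the current setup:

```python
# Shared defaults, matching the current qmd_retriever configuration.
DEFAULTS = {"top_k": 8, "threshold": 0.45, "order": ["papers", "github", "notes"]}

# Per-chapter overrides; the entries below are placeholders to be calibrated.
CHAPTER_OVERRIDES = {
    "chapter-1": {"threshold": 0.55},   # hypothetical stricter setting
}

def retrieval_settings(chapter: str) -> dict:
    """Merge a chapter's overrides onto the shared defaults."""
    settings = dict(DEFAULTS)
    settings.update(CHAPTER_OVERRIDES.get(chapter, {}))
    return settings
```

Chapters without an entry simply fall through to the defaults, so the table only grows where calibration shows a chapter actually needs stricter or looser retrieval.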

2.2 Docling JSON field whitelist

Already decided:

  • keep both markdown + json

Still to validate:

  • which JSON fields are most useful for citation evidence files
  • whether page indices, block IDs, heading levels, or paragraph indices should always be preserved
  • whether a dedicated JSON field extraction script is needed
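If a dedicated extraction script does turn out to be needed, it can be as small as a whitelist filter over the Docling JSON blocks. The sketch below assumes a top-level `texts` array and field names like `page` and `label`; both are guesses that must be checked against a real Docling export before use:

```python
import json

# Candidate whitelist; which fields are actually useful is exactly what
# the pilot should decide.
FIELD_WHITELIST = ("page", "label", "level", "text")

def extract_blocks(docling_json: str, whitelist=FIELD_WHITELIST) -> list:
    """Keep only whitelisted keys from each text block of a Docling JSON export."""
    doc = json.loads(docling_json)
    blocks = doc.get("texts", [])  # key name is an assumption
    return [{k: b[k] for k in whitelist if k in b} for b in blocks]
```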

2.3 GitHub evidence acceptance rules

Already decided:

  • do not force GitHub inspection if the paper itself is already clear
  • if implementation details are unclear, check README / docs / examples first
  • only inspect source code or key config files if those higher-level materials remain insufficient

Still to validate:

  • when README alone is enough
  • when docs/examples are required
  • when source inspection is required to avoid overclaiming
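The decided rules already form an escalation ladder; writing it down as an explicit function makes the calibration targets concrete (each boolean is a reviewer judgment call the pilot should sharpen into criteria). A sketch:

```python
def github_evidence_depth(paper_is_clear: bool,
                          readme_sufficient: bool,
                          docs_sufficient: bool) -> str:
    """Encode the decided escalation ladder for GitHub evidence.
    Inputs are reviewer judgment calls, not automated checks."""
    if paper_is_clear:
        return "none"           # do not force GitHub inspection
    if readme_sufficient:
        return "readme"
    if docs_sufficient:
        return "docs-examples"
    return "source"             # last resort, to avoid overclaiming
```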

2.4 Future granularity of paper_github_repo_map.csv

Already decided:

  • default granularity is one row per paper <-> repository mapping

Still to validate:

  • whether module-level or subdirectory-level mapping fields are needed later
  • or whether that detail should stay in the notes field only
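Keeping the detail in the notes field is cheap to start with, because a one-row-per-mapping CSV loads into a simple lookup. The column names below are hypothetical; the real paper_github_repo_map.csv schema may differ:

```python
import csv
import io

def load_repo_map(csv_text: str) -> dict:
    """One row per paper <-> repository mapping, keyed by paper id.
    Module/subdirectory detail stays inside the free-text notes field for now."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["paper_id"]: row for row in reader}
```

If module-level granularity later proves necessary, the migration is a new column plus one extra row per module, so starting coarse costs little.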

2.5 Zotero automatic insertion on Word copies

Current default:

  • human final confirmation remains the default

Still to validate:

  • whether automatic insertion experiments should be allowed on Word copies
  • which sections or sentence types are safe candidates for that experiment

2.6 Failure recovery flow for Word edits

Already decided:

  • create a backup before editing
  • restore from backup if the edit fails or produces clearly bad output

Still to validate:

  • whether a dedicated recovery script is needed
  • whether backup naming should be standardized with timestamps
  • whether recovery should become a mandatory part of the writer receipt
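The decided backup-then-restore flow is small enough that the "dedicated recovery script" question may answer itself: a single wrapper covers it. This is a sketch; the timestamped naming scheme below is a proposal, not a decided standard:

```python
import shutil
import time
from pathlib import Path

def backup_path(doc: Path) -> Path:
    """Proposed timestamped naming: draft.bak-YYYYMMDD-HHMMSS.docx."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    return doc.with_name(f"{doc.stem}.bak-{stamp}{doc.suffix}")

def edit_with_backup(doc: Path, edit_fn) -> None:
    """Copy the Word file aside, run the edit, restore the copy on any failure."""
    bak = backup_path(doc)
    shutil.copy2(doc, bak)
    try:
        edit_fn(doc)
    except Exception:
        shutil.copy2(bak, doc)  # roll back to the pre-edit copy
        raise
```

If this wrapper becomes standard, logging `bak` in the writer receipt would make recovery auditable without adding a separate mandatory step.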

3. Suggested calibration order

Avoid finalizing everything at once. A safer order is:

  1. run the first citation-evidence pilot
  2. calibrate QMD thresholds and candidate counts
  3. refine the Docling JSON field whitelist
  4. refine how deeply GitHub evidence should go into source code
  5. only then decide whether Zotero automatic insertion should be tested on draft copies

4. Write-back policy

After each real workflow round, write the stable result back into one of:

  • multi-agent-review-workflow-template.md
  • codex-multi-agent-setup-notes.md
  • this file

Rule:

  • if the rule is stable, write it into the main template
  • if it is still experimental, keep it here