# Pending Calibration Tasks

This file tracks the remaining “last-mile” items that should be calibrated through real workflow runs. These items do not block progress, but they should be refined over the next few iterations and then folded back into the main workflow template.

---

## 1. How to practice the first batch of citation evidence files

Start with a small pilot instead of a full rollout.

Recommended pilot scope:

- 1 chapter
- 2 to 3 key claims
- 1 to 2 core supporting papers per claim

Recommended starting targets:

- Chapter 1 claims about resistance mechanisms or ribosome-binding sites
- Chapter 5 entries such as `Macformer`, `PKS Enumerator`, or `SIME`

Minimal pilot flow:

1. `qmd_retriever` retrieves evidence from the `papers` collection.
2. Validate the passage in Docling markdown.
3. If needed, inspect Docling JSON for structured field or block evidence.
4. If the claim concerns implementation details, inspect GitHub `README`, `docs`, or `examples`.
5. `citation_checker` recommends papers and insertion positions.
6. `citation_archivist` writes a markdown evidence file under `citation-evidence/`.
7. A human reviews whether the evidence file is strong enough to show the citation is not fabricated.

The purpose of the pilot is to answer:

- Is the current citation evidence template sufficient?
- Which Docling JSON fields are actually useful?
- How much GitHub evidence is useful in this review workflow?

---

## 2. Items that still need calibration

### 2.1 QMD retrieval parameters

Current defaults:

- `top 8`
- threshold `0.45`
- default order: `papers -> github -> notes`

Still to validate:

- whether `0.45` is too high or too low for this Chinese review corpus
- whether `top 8` is enough
- whether some chapters need stricter or looser retrieval settings

### 2.2 Docling JSON field whitelist

Already decided:

- keep both `markdown + json`

Still to validate:

- which JSON fields are most useful for citation evidence files
- whether page indices, block IDs, heading levels, or paragraph indices should always be preserved
- whether a dedicated JSON field extraction script is needed

### 2.3 GitHub evidence acceptance rules

Already decided:

- do not force GitHub inspection if the paper itself is already clear
- if implementation details are unclear, check `README / docs / examples` first
- only inspect source code or key config files if those higher-level materials remain insufficient

Still to validate:

- when `README` alone is enough
- when `docs/examples` are required
- when source inspection is required to avoid overclaiming

### 2.4 Future granularity of `paper_github_repo_map.csv`

Already decided:

- default granularity is one row per `paper <-> repository` mapping

Still to validate:

- whether module-level or subdirectory-level mapping fields are needed later
- or whether that detail should stay in the `notes` field only

### 2.5 Zotero automatic insertion on Word copies

Current default:

- human final confirmation remains the default

Still to validate:

- whether automatic insertion experiments should be allowed on Word copies
- which sections or sentence types are safe candidates for that experiment

### 2.6 Failure recovery flow for Word edits

Already decided:

- create a backup before editing
- restore from backup if the edit fails or produces clearly bad output

Still to validate:

- whether a dedicated recovery script is needed
- whether backup naming should be standardized with timestamps
- whether recovery should become a mandatory part of the `writer` receipt

---

## 3. Recommended order for future convergence

Avoid finalizing everything at once. A safer order is:

1. run the first citation-evidence pilot
2. calibrate QMD thresholds and candidate counts
3. refine the Docling JSON field whitelist
4. refine how deeply GitHub evidence should go into source code
5. only then decide whether Zotero automatic insertion should be tested on draft copies

---

## 4. Write-back policy

After each real workflow round, write the stable result back into one of:

- `multi-agent-review-workflow-template.md`
- `codex-multi-agent-setup-notes.md`
- this file

Rule:

- if the rule is stable, write it into the main template
- if it is still experimental, keep it here
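
The write-back rule can be sketched as a small routing helper. This is only an illustration, not part of the workflow: `CalibrationResult`, `write_back_target`, and the filename `pending-calibration-tasks.md` (standing in for this file) are hypothetical, the sample findings are made up, and the `stable` flag is assumed to be set by a human after a real workflow round. The rule does not say when `codex-multi-agent-setup-notes.md` should be chosen, so the sketch leaves it out.

```python
from dataclasses import dataclass

MAIN_TEMPLATE = "multi-agent-review-workflow-template.md"
# Hypothetical name standing in for "this file"; the real filename is an assumption.
PENDING_FILE = "pending-calibration-tasks.md"


@dataclass(frozen=True)
class CalibrationResult:
    """One outcome of a real workflow round (hypothetical structure)."""
    item: str      # calibration item, e.g. "2.1 QMD retrieval parameters"
    finding: str   # short description of what the round showed
    stable: bool   # human judgment: is the rule settled?


def write_back_target(result: CalibrationResult) -> str:
    """Stable rules go to the main template; experimental ones stay in this file."""
    return MAIN_TEMPLATE if result.stable else PENDING_FILE


# Illustrative values only, not real calibration results.
results = [
    CalibrationResult("2.1 QMD retrieval parameters", "threshold 0.45 still under test", stable=False),
    CalibrationResult("2.6 Failure recovery flow", "backup before editing works", stable=True),
]

for r in results:
    print(f"{r.item} -> {write_back_target(r)}")
```

Keeping the stable/experimental decision as an explicit boolean leaves the judgment with the human reviewer; the helper only records where the result should be written.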