first commit

2026-03-10 14:51:29 +08:00
commit 40651842db
7 changed files with 1956 additions and 0 deletions
--- a/pending-calibration-tasks.md
+++ b/pending-calibration-tasks.md
@@ -0,0 +1,144 @@
+# Pending Calibration Tasks
+
+This file tracks the remaining “last-mile” items that should be calibrated through real workflow runs.
+
+These items do not block progress, but they should be refined over the next few iterations and then folded back into the main workflow template.
+
+---
+
+## 1. How to pilot the first batch of citation evidence files
+
+Start with a small pilot instead of a full rollout.
+
+Recommended pilot scope:
+
+- 1 chapter
+- 2 to 3 key claims
+- 1 to 2 core supporting papers per claim
+
+Recommended starting targets:
+
+- Chapter 1 claims about resistance mechanisms or ribosome-binding sites
+- Chapter 5 entries such as `Macformer`, `PKS Enumerator`, or `SIME`
+
+Minimal pilot flow:
+
+1. `qmd_retriever` retrieves evidence from the `papers` collection.
+2. Validate the passage in Docling markdown.
+3. If needed, inspect Docling JSON for structured field or block evidence.
+4. If the claim concerns implementation details, inspect GitHub `README`, `docs`, or `examples`.
+5. `citation_checker` recommends papers and insertion positions.
+6. `citation_archivist` writes a markdown evidence file under `citation-evidence/`.
+7. A human reviews whether the evidence file is strong enough to show the citation is not fabricated.
+
+The purpose of the pilot is to answer:
+
+- Is the current citation evidence template sufficient?
+- Which Docling JSON fields are actually useful?
+- How much GitHub evidence is useful in this review workflow?
+
+---
+
+## 2. Items that still need calibration
+
+### 2.1 QMD retrieval parameters
+
+Current defaults:
+
+- candidate count: `top 8`
+- retrieval threshold: `0.45`
+- default collection order: `papers -> github -> notes`
+
+Still to validate:
+
+- whether `0.45` is too high or too low for this Chinese review corpus
+- whether `top 8` is enough
+- whether some chapters need stricter or looser retrieval settings
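+
+A minimal sketch of how the current defaults could be applied to a batch of retrieved hits, so that the threshold and candidate count are easy to vary per chapter during calibration. The `Hit` type, the `select_hits` helper, and the reading of "default order" as a sort priority between collections are all assumptions made for illustration; they are not the actual `qmd_retriever` interface.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Hit:
+    """Hypothetical retrieval hit; field names are illustrative only."""
+    collection: str
+    doc_id: str
+    score: float
+
+
+def select_hits(hits, top_k=8, threshold=0.45,
+                order=("papers", "github", "notes")):
+    """Drop hits below the threshold, then keep at most top_k of the rest.
+
+    Assumption: the default order is treated as a priority between
+    collections, and hits within one collection are ranked by score.
+    """
+    rank = {name: i for i, name in enumerate(order)}
+    kept = [h for h in hits
+            if h.score >= threshold and h.collection in rank]
+    kept.sort(key=lambda h: (rank[h.collection], -h.score))
+    return kept[:top_k]
+```
+
+Running the same query set with a few different `threshold` and `top_k` values and comparing which hits survive is one simple way to check whether `0.45` and `top 8` fit a given chapter.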
+
+### 2.2 Docling JSON field whitelist
+
+Already decided:
+
+- keep both Docling outputs: `markdown + json`
+
+Still to validate:
+
+- which JSON fields are most useful for citation evidence files
+- whether page indices, block IDs, heading levels, or paragraph indices should always be preserved
+- whether a dedicated JSON field extraction script is needed
+
+### 2.3 GitHub evidence acceptance rules
+
+Already decided:
+
+- do not force GitHub inspection if the paper itself is already clear
+- if implementation details are unclear, check `README / docs / examples` first
+- only inspect source code or key config files if those higher-level materials remain insufficient
+
+Still to validate:
+
+- when `README` alone is enough
+- when `docs/examples` are required
+- when source inspection is required to avoid overclaiming
+
+### 2.4 Future granularity of `paper_github_repo_map.csv`
+
+Already decided:
+
+- default granularity is one row per `paper <-> repository` mapping
+
+Still to validate:
+
+- whether module-level or subdirectory-level mapping fields are needed later
+- whether that detail should instead stay in the `notes` field only
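+
+As a rough illustration of the one-row-per-mapping rule, the check below flags any `paper <-> repository` pair that appears more than once. The column names `paper_id` and `repo_url` are hypothetical; only the `notes` field is named in this file, and the real header of `paper_github_repo_map.csv` may differ.
+
+```python
+import csv
+import io
+
+
+def duplicate_mappings(csv_text):
+    """Return (paper_id, repo_url) pairs that occur more than once.
+
+    Assumption: the CSV has `paper_id` and `repo_url` columns; these
+    names are placeholders for whatever the real file uses.
+    """
+    seen = set()
+    dupes = []
+    for row in csv.DictReader(io.StringIO(csv_text)):
+        key = (row["paper_id"], row["repo_url"])
+        if key in seen:
+            dupes.append(key)
+        seen.add(key)
+    return dupes
+```
+
+Under this granularity, module-level or subdirectory-level detail would live in `notes` until dedicated fields prove necessary.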
+
+### 2.5 Zotero automatic insertion on Word copies
+
+Current default:
+
+- human final confirmation remains the default
+
+Still to validate:
+
+- whether automatic insertion experiments should be allowed on Word copies
+- which sections or sentence types are safe candidates for that experiment
+
+### 2.6 Failure recovery flow for Word edits
+
+Already decided:
+
+- create a backup before editing
+- restore from backup if the edit fails or produces clearly bad output
+
+Still to validate:
+
+- whether a dedicated recovery script is needed
+- whether backup naming should be standardized with timestamps
+- whether recovery should become a mandatory part of the `writer` receipt
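+
+The backup-then-restore rule could be sketched as follows, with one possible timestamped naming scheme. The helper names and the `.backup-YYYYMMDD-HHMMSS` pattern are assumptions for illustration, not an agreed convention, and the sketch treats only a raised exception as a failed edit; detecting "clearly bad output" would still need a separate check.
+
+```python
+import shutil
+from datetime import datetime
+from pathlib import Path
+
+
+def backup_path(doc: Path, now: datetime) -> Path:
+    """Build a timestamped sibling path (hypothetical naming scheme)."""
+    stamp = now.strftime("%Y%m%d-%H%M%S")
+    return doc.with_name(f"{doc.stem}.backup-{stamp}{doc.suffix}")
+
+
+def edit_with_backup(doc, edit, now=None):
+    """Copy the document, run the edit, and restore the copy on failure.
+
+    `edit` is any callable that takes the document path and modifies
+    the file in place. The backup path is returned so that it can be
+    recorded, for example in a receipt.
+    """
+    doc = Path(doc)
+    backup = backup_path(doc, now or datetime.now())
+    shutil.copy2(doc, backup)
+    try:
+        edit(doc)
+    except Exception:
+        # Restore the original content, then let the caller see the error.
+        shutil.copy2(backup, doc)
+        raise
+    return backup
+```
+
+Returning the backup path makes it straightforward to test whether recording it in the `writer` receipt is worth making mandatory.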
+
+---
+
+## 3. Recommended order for future convergence
+
+Avoid finalizing everything at once. A safer order is:
+
+1. run the first citation-evidence pilot
+2. calibrate QMD thresholds and candidate counts
+3. refine the Docling JSON field whitelist
+4. refine how deep GitHub evidence inspection should go into source code
+5. only then decide whether Zotero automatic insertion should be tested on Word copies
+
+---
+
+## 4. Write-back policy
+
+After each real workflow round, write the stable result back into one of:
+
+- `multi-agent-review-workflow-template.md`
+- `codex-multi-agent-setup-notes.md`
+- this file
+
+Rule:
+
+- if the rule is stable, write it into the main template
+- if it is still experimental, keep it here