# Pending Calibration Tasks

This file tracks the remaining “last-mile” items that should be calibrated through real workflow runs.

These items do not block progress, but they should be refined over the next few iterations and then folded back into the main workflow template.

---

## 1. How to pilot the first batch of citation evidence files

Start with a small pilot instead of a full rollout.

Recommended pilot scope:

- 1 chapter
- 2 to 3 key claims
- 1 to 2 core supporting papers per claim

Recommended starting targets:

- Chapter 1 claims about resistance mechanisms or ribosome-binding sites
- Chapter 5 entries such as `Macformer`, `PKS Enumerator`, or `SIME`

Minimal pilot flow:

1. `qmd_retriever` retrieves evidence from the `papers` collection.
2. Validate the passage in Docling markdown.
3. If needed, inspect Docling JSON for structured field or block evidence.
4. If the claim concerns implementation details, inspect GitHub `README`, `docs`, or `examples`.
5. `citation_checker` recommends papers and insertion positions.
6. `citation_archivist` writes a markdown evidence file under `citation-evidence/`.
7. A human reviews whether the evidence file is strong enough to show the citation is not fabricated.

The purpose of the pilot is to answer:

- Is the current citation evidence template sufficient?
- Which Docling JSON fields are actually useful?
- How much GitHub evidence is useful in this review workflow?
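
To make the first pilot question concrete, step 6 of the flow can be sketched as a small renderer that writes one markdown file per claim under `citation-evidence/`. This is only an illustration: the field names (`claim`, `paper`, `location`, `passage`, `github_note`), the section layout, and the helper names `render_evidence` and `write_evidence` are assumptions, not the existing evidence template or the actual behavior of `citation_archivist`.

```python
from pathlib import Path


def render_evidence(claim, paper, location, passage, github_note=""):
    """Render one evidence file as markdown (hypothetical layout)."""
    lines = [f"# Evidence: {claim}", "", f"- Paper: {paper}", f"- Location: {location}"]
    if github_note:
        # Only recorded when GitHub material was actually inspected.
        lines.append(f"- GitHub evidence: {github_note}")
    lines += [
        "",
        "## Quoted passage",
        "",
        f"> {passage}",
        "",
        "## Human review",
        "",
        "- [ ] Passage supports the claim as written",
    ]
    return "\n".join(lines) + "\n"


def write_evidence(slug, text, root="citation-evidence"):
    """Write the rendered text to <root>/<slug>.md and return the path."""
    out_dir = Path(root)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{slug}.md"
    path.write_text(text, encoding="utf-8")
    return path
```
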

---

## 2. Items that still need calibration

### 2.1 QMD retrieval parameters

Current defaults:

- `top 8`
- threshold `0.45`
- default order: `papers -> github -> notes`

Still to validate:

- whether `0.45` is too high or too low for this Chinese review corpus
- whether `top 8` is enough
- whether some chapters need stricter or looser retrieval settings
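
One way to make these knobs easy to calibrate is to hold them in a single settings object and apply per-chapter overrides on top. The sketch below is a minimal illustration under stated assumptions: hits are `(collection, score, passage_id)` tuples, a higher score means a better match, and the default order is read as a collection priority. `RetrievalSettings`, `settings_for`, and `select_hits` are hypothetical names, not part of `qmd_retriever`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RetrievalSettings:
    top_k: int = 8                                  # current default: top 8
    threshold: float = 0.45                         # current default threshold
    collection_order: tuple = ("papers", "github", "notes")


def settings_for(chapter, overrides, base=RetrievalSettings()):
    """Return the base settings with any per-chapter overrides applied."""
    return replace(base, **overrides.get(chapter, {}))


def select_hits(hits, settings):
    """Keep hits at or above the threshold, ordered by collection priority
    and then by descending score, truncated to top_k."""
    rank = {name: i for i, name in enumerate(settings.collection_order)}
    kept = [h for h in hits if h[0] in rank and h[1] >= settings.threshold]
    kept.sort(key=lambda h: (rank[h[0]], -h[1]))
    return kept[: settings.top_k]
```

A calibration run could then pass something like `{"chapter-5": {"threshold": 0.40}}` as `overrides`; that value is a made-up placeholder, not a recommendation.
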

### 2.2 Docling JSON field whitelist

Already decided:

- keep both `markdown + json`

Still to validate:

- which JSON fields are most useful for citation evidence files
- whether page indices, block IDs, heading levels, or paragraph indices should always be preserved
- whether a dedicated JSON field extraction script is needed
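
If a dedicated extraction script turns out to be worthwhile, it could start as a generic whitelist walk over the parsed JSON. The sketch below makes no claim about the real Docling schema: the key names in `FIELD_WHITELIST` are placeholders taken from the wording above and would need to be replaced with whatever keys the pilot finds in actual Docling output. `extract_fields` and `load_and_extract` are hypothetical helpers.

```python
import json

# Placeholder key names; replace with the real Docling keys found during the pilot.
FIELD_WHITELIST = {"page", "block_id", "heading_level", "paragraph_index"}


def extract_fields(node, whitelist=FIELD_WHITELIST, path=""):
    """Yield (json_path, value) for every whitelisted scalar key in nested JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key in whitelist and not isinstance(value, (dict, list)):
                yield child, value
            else:
                yield from extract_fields(value, whitelist, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from extract_fields(item, whitelist, f"{path}[{i}]")


def load_and_extract(json_path, whitelist=FIELD_WHITELIST):
    """Read a JSON file and return the whitelisted fields as a list."""
    with open(json_path, encoding="utf-8") as handle:
        return list(extract_fields(json.load(handle), whitelist))
```

Recording the JSON path alongside each value keeps the evidence file traceable back to the exact block, which is the point of preserving the JSON output next to the markdown.
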

### 2.3 GitHub evidence acceptance rules

Already decided:

- do not force GitHub inspection if the paper itself is already clear
- if implementation details are unclear, check `README / docs / examples` first
- only inspect source code or key config files if those higher-level materials remain insufficient

Still to validate:

- when `README` alone is enough
- when `docs/examples` are required
- when source inspection is required to avoid overclaiming
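
The rules decided so far amount to an escalation ladder: stop as soon as the evidence is sufficient, otherwise move one level deeper. A minimal sketch follows; splitting `README` from `docs/examples` into separate levels is an assumption that mirrors the open questions above, and `EvidenceLevel` and `next_level` are hypothetical names.

```python
from enum import IntEnum


class EvidenceLevel(IntEnum):
    PAPER = 0             # the paper itself
    README = 1            # repository README
    DOCS_EXAMPLES = 2     # docs and examples
    SOURCE_OR_CONFIG = 3  # source code or key config files


def next_level(current, is_sufficient):
    """Return the next level to inspect, or None when inspection should stop."""
    if is_sufficient or current is EvidenceLevel.SOURCE_OR_CONFIG:
        return None
    return EvidenceLevel(current + 1)
```

Logging the deepest level reached for each claim during the pilot would give concrete data on how often each level is actually needed.
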

### 2.4 Future granularity of `paper_github_repo_map.csv`

Already decided:

- default granularity is one row per `paper <-> repository` mapping

Still to validate:

- whether module-level or subdirectory-level mapping fields are needed later
- or whether that detail should stay in the `notes` field only
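
To illustrate the current granularity, the sketch below writes one row per paper and repository pair and flags pairs that appear more than once. Only the `notes` field is named in this file; the `paper_id` and `repo_url` column names, and the helpers `write_rows` and `duplicate_pairs`, are assumptions made for illustration.

```python
import csv

# Hypothetical column set; only `notes` is confirmed by this file.
COLUMNS = ["paper_id", "repo_url", "notes"]


def write_rows(rows, handle):
    """Write mapping rows (dicts keyed by COLUMNS) as CSV with a header."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def duplicate_pairs(rows):
    """Return (paper_id, repo_url) pairs that occur more than once."""
    seen, dupes = set(), []
    for row in rows:
        key = (row["paper_id"], row["repo_url"])
        if key in seen:
            dupes.append(key)
        seen.add(key)
    return dupes
```

Under this layout, module-level or subdirectory-level detail would live in `notes` as free text until the pilot shows that it needs its own column.
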

### 2.5 Zotero automatic insertion on Word copies

Current default:

- human final confirmation remains the default

Still to validate:

- whether automatic insertion experiments should be allowed on Word copies
- which sections or sentence types are safe candidates for that experiment

### 2.6 Failure recovery flow for Word edits

Already decided:

- create a backup before editing
- restore from backup if the edit fails or produces clearly bad output

Still to validate:

- whether a dedicated recovery script is needed
- whether backup naming should be standardized with timestamps
- whether recovery should become a mandatory part of the `writer` receipt
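
The backup-then-restore rule can be sketched with the standard library alone. Everything specific here is an assumption for illustration: the `<stem>.backup-<YYYYMMDD-HHMMSS><suffix>` naming scheme, the shape of the receipt dictionary, and the helper names `make_backup` and `edit_with_recovery`. The `edit` and `looks_ok` callables stand in for whatever the `writer` step and its output check turn out to be.

```python
import shutil
from datetime import datetime
from pathlib import Path


def make_backup(doc_path, now=None):
    """Copy the document next to itself under a timestamped name."""
    doc = Path(doc_path)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    backup = doc.with_name(f"{doc.stem}.backup-{stamp}{doc.suffix}")
    shutil.copy2(doc, backup)
    return backup


def edit_with_recovery(doc_path, edit, looks_ok):
    """Back up, run the edit, and restore if it raises or fails the check."""
    backup = make_backup(doc_path)
    receipt = {"document": str(doc_path), "backup": str(backup), "restored": False}
    try:
        edit(doc_path)
        ok = looks_ok(doc_path)
    except Exception:
        # Any failure during the edit or the check counts as a bad edit.
        ok = False
    if not ok:
        shutil.copy2(backup, doc_path)
        receipt["restored"] = True
    return receipt
```
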

---

## 3. Recommended order for future convergence

Avoid finalizing everything at once. A safer order is:

1. run the first citation-evidence pilot
2. calibrate QMD thresholds and candidate counts
3. refine the Docling JSON field whitelist
4. refine how deeply GitHub evidence should go into source code
5. only then decide whether Zotero automatic insertion should be tested on draft copies

---

## 4. Write-back policy

After each real workflow round, write the stable result back into one of:

- `multi-agent-review-workflow-template.md`
- `codex-multi-agent-setup-notes.md`
- this file

Rule:

- if the rule is stable, write it into the main template
- if it is still experimental, keep it here