commit 40651842dbd9b7c9b3337f2d7da7afaba9ef3c7b
Author: mm644706215
Date:   Tue Mar 10 14:51:29 2026 +0800

    first commit

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..90f0c27
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,19 @@
+# Large local reference corpora: keep local by default.
+paper/
+文献/
+root/
+
+# Track the English working draft, ignore other docx files by default.
+*.docx
+!macrolide-review-draft.docx
+
+# Office temporary files
+~$*.docx
+~$*.doc
+*.tmp
+*.bak
+*.log
+
+# OS files
+.DS_Store
+Thumbs.db
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..193c6cf
--- /dev/null
+++ b/README.md
@@ -0,0 +1,306 @@
+# scholarly-writing-workbench
+
+This directory is intended to be a reusable workbench for paper writing and literature review workflows.
+
+The goal is not to store every raw reference asset in Git. The goal is to version:
+
+- reusable prompt templates
+- Codex multi-agent setup notes
+- the working Word draft
+- small workflow documents, checklists, and mapping tables
+
+Large local reference corpora should stay outside normal Git tracking by default, including:
+
+- `paper/`
+- the localized reference folder
+- `root/`
+
+Those directories are already ignored in [`.gitignore`](./.gitignore). If you later decide to version part of them, prefer a small curated subset or Git LFS instead of pushing a full local corpus.
+
+---
+
+## 1. What this repository should contain
+
+Recommended for version control:
+
+- `README.md`
+- `multi-agent-review-workflow-template.md`
+- `codex-multi-agent-setup-notes.md`
+- `review-outline.md`
+- `pending-calibration-tasks.md`
+- `macrolide-review-draft.docx`
+- small task lists, mapping tables, and workflow notes
+
+Not recommended for normal Git tracking:
+
+- full PDF corpora
+- full Zotero `storage/`
+- large GitHub source snapshots
+- large Docling parse outputs
+- large QMD index databases
+
+---
+
+## 2. Required skills
+
+This workflow depends on these `Deep-Research-skills` skills:
+
+- `research`
+- `research-add-fields`
+- `research-add-items`
+- `research-deep`
+- `research-report`
+
+Useful supporting skills:
+
+- `skill-installer`
+- `skill-creator`
+
+On a fresh machine, confirm they are installed under:
+
+- Windows:
+  - `C:\Users\\.codex\skills`
+- macOS:
+  - `/Users//.codex/skills`
+- Linux:
+  - `/home//.codex/skills`
+
+---
+
+## 3. Required MCP servers
+
+Core MCP servers:
+
+- `word-mcp`
+  - edits `.docx`
+- `zotero`
+  - manages items, PDFs, and citations
+- `docling`
+  - parses PDFs
+- `qmd`
+  - performs local hybrid retrieval
+
+Strongly recommended:
+
+- `chrome-devtools`
+  - opens paper pages and GitHub pages for manual downloading
+
+---
+
+## 4. MCP installation notes
+
+### 4.1 `word-mcp`
+
+Recommended approach: clone locally and run with `pixi`.
+
+Example:
+
+```powershell
+codex mcp add word-mcp -- pixi run --manifest-path C:/path/to/word-mcp start
+```
+
+### 4.2 `zotero`
+
+Recommended approach: clone locally, run with `pixi`, and configure:
+
+- `ZOTERO_API_KEY`
+- `ZOTERO_USER_ID`
+- `UNPAYWALL_EMAIL`
+
+Example:
+
+```powershell
+codex mcp add zotero `
+  --env ZOTERO_API_KEY=YOUR_KEY `
+  --env ZOTERO_USER_ID=YOUR_USER_ID `
+  --env UNPAYWALL_EMAIL=YOUR_EMAIL `
+  -- pixi run --manifest-path C:/path/to/mcp-zotero start
+```
+
+### 4.3 `docling`
+
+Recommended approach: clone locally and run with `uv`.
+
+Example:
+
+```powershell
+codex mcp add docling -- uv run --directory C:/path/to/docling-mcp docling-mcp-server --transport stdio
+```
+
+### 4.4 `qmd`
+
+Recommended requirements:
+
+- Node.js >= 22
+- either a stable local build or a verified global installation
+
+Example:
+
+```powershell
+codex mcp add qmd -- node C:/path/to/qmd/dist/qmd.js mcp
+```
+
+### 4.5 `chrome-devtools`
+
+Windows example:
+
+```powershell
+codex mcp add chrome-devtools -- npx chrome-devtools-mcp@latest
+```
+
+On Windows 11, you may also need `SystemRoot`, `PROGRAMFILES`, and a larger `startup_timeout_ms` in `~/.codex/config.toml`.
+
+macOS and Linux usually do not need the Windows-specific environment block.
+
+---
+
+## 5. Codex multi-agent requirements
+
+At minimum, `~/.codex/config.toml` should contain:
+
+```toml
+[features]
+multi_agent = true
+```
+
+Recommended additions:
+
+```toml
+[agents]
+max_threads = 6
+max_depth = 1
+```
+
+Recommended agents for this workflow:
+
+- `deep_researcher`
+- `zotero_locator`
+- `qmd_retriever`
+- `github_mapper`
+- `writer`
+- `citation_checker`
+- `citation_archivist`
+- `web_researcher`
+
+See:
+
+- [`codex-multi-agent-setup-notes.md`](./codex-multi-agent-setup-notes.md)
+
+---
+
+## 6. Recommended dynamic variables
+
+To keep this workflow portable across Windows, macOS, and Linux, maintain key paths as variables instead of hard-coding them everywhere.
+
+Recommended variables:
+
+- `REVIEW_TITLE`
+- `WORK_ROOT`
+- `WORD_TARGET_DOC`
+- `REVIEW_OUTLINE_FILE`
+- `ANALYSIS_DIR`
+- `ZOTERO_ROOT`
+- `ZOTERO_STORAGE_DIR`
+- `PAPER_GITHUB_MAP_CSV`
+- `RAG_ROOT`
+- `RAG_PARSED_MD_DIR`
+- `GITHUB_SOURCE_DIR`
+- `GITHUB_MD_DIR`
+- `CITATION_EVIDENCE_DIR`
+
+See:
+
+- [`multi-agent-review-workflow-template.md`](./multi-agent-review-workflow-template.md)
+
+---
+
+## 7. Core workflow
+
+Recommended execution order:
+
+1. `Deep-Research-skills`
+   - identify missing papers, missing GitHub projects, and missing evidence
+2. `zotero`
+   - check whether items and PDFs already exist
+3. `chrome-devtools`
+   - open download pages so the user can manually download PDFs, supplements, or GitHub assets
+4. `docling`
+   - convert PDFs into structured text
+5. `qmd`
+   - perform hybrid retrieval over parsed papers, GitHub docs, and notes
+6. `word-mcp`
+   - revise the Word draft only after evidence is ready
+7. `zotero`
+   - support citation insertion and citation review
+
+---
+
+## 8. Paper-to-GitHub mapping
+
+Maintain both:
+
+- Zotero source traces
+- a local structured mapping table
+
+Recommended mapping table:
+
+- `paper_github_repo_map.csv`
+
+Rules:
+
+- one paper to multiple repositories: one row per mapping
+- one repository to multiple papers: one row per mapping
+- QMD is the primary retrieval layer
+- Zotero is the source-trace layer
+
+---
+
+## 9. Minimal checklist for a fresh machine
+
+1. Install Codex CLI
+2. Install and verify `Deep-Research-skills`
+3. Install and verify:
+   - `word-mcp`
+   - `zotero`
+   - `docling`
+   - `qmd`
+   - `chrome-devtools`
+4. Confirm `~/.codex/config.toml` enables:
+   - `multi_agent = true`
+5. Confirm `~/.codex/agents/` contains the required agent configs
+6. Confirm Zotero local data or Zotero Web API access is available
+7. Confirm QMD collections, Docling output directories, and mapping files are configured
+8. Confirm the Word draft and outline file paths are updated
+
+---
+
+## 10. Git initialization and push
+
+Initialize locally:
+
+```powershell
+git init -b main
+git add README.md .gitignore *.md *.docx
+git commit -m "Initialize scholarly writing workbench"
+```
+
+After creating the remote repository in Gitea:
+
+```powershell
+git remote add origin
+git push -u origin main
+```
+
+---
+
+## 11. Recommended repository name
+
+Preferred:
+
+- `scholarly-writing-workbench`
+
+Alternatives:
+
+- `paper-review-agent-workbench`
+- `scholarly-writing-agent-kit`
+- `literature-review-agent-kit`
diff --git a/codex-multi-agent-setup-notes.md b/codex-multi-agent-setup-notes.md
new file mode 100644
index 0000000..ce8b9b2
--- /dev/null
+++ b/codex-multi-agent-setup-notes.md
@@ -0,0 +1,343 @@
+# Codex Multi-Agent Setup Notes
+
+This note records how to enable and configure Codex multi-agent support for the literature review workflow.
+
+---
+
+## 1. Current status
+
+Codex multi-agent support is enabled through:
+
+- `C:\Users\pylyz\.codex\config.toml`
+
+Key setting:
+
+```toml
+[features]
+multi_agent = true
+```
+
+The current environment also includes review-specific agent role definitions.
+
+---
+
+## 2. Defined agent roles
+
+Agent config directory:
+
+- `C:\Users\pylyz\.codex\agents`
+
+Current roles:
+
+- `deep_researcher`
+  - identifies literature gaps, candidate papers, and candidate GitHub projects
+- `zotero_locator`
+  - locates Zotero items, DOIs, attachments, and PDF paths
+- `qmd_retriever`
+  - retrieves evidence from parsed papers, GitHub docs, and notes
+- `github_mapper`
+  - maintains paper-to-GitHub mappings
+- `writer`
+  - edits the Word draft only after evidence is ready
+- `citation_checker`
+  - checks claim-to-paper matching and citation placement
+- `citation_archivist`
+  - archives auditable evidence files for planned citations
+- `web_researcher`
+  - performs live web research
+
+Recommended main-thread role:
+
+- `supervisor`
+  - defines the task, delegates subtasks, collects evidence, and decides when to edit Word
+
+---
+
+## 3. Relevant configuration files
+
+Global config:
+
+- `C:\Users\pylyz\.codex\config.toml`
+
+Agent configs:
+
+- `C:\Users\pylyz\.codex\agents\deep-researcher.toml`
+- `C:\Users\pylyz\.codex\agents\zotero-locator.toml`
+- `C:\Users\pylyz\.codex\agents\qmd-retriever.toml`
+- `C:\Users\pylyz\.codex\agents\github-mapper.toml`
+- `C:\Users\pylyz\.codex\agents\writer.toml`
+- `C:\Users\pylyz\.codex\agents\citation-checker.toml`
+- `C:\Users\pylyz\.codex\agents\citation-archivist.toml`
+- `C:\Users\pylyz\.codex\agents\web-researcher.toml`
+
+Important local workflow files:
+
+- working draft:
+  - `macrolide-review-draft.docx`
+- paper-to-GitHub mapping table:
+  - `paper_github_repo_map.csv`
+
+---
+
+## 4. How to enable multi-agent
+
+### Option A: edit the config file directly
+
+Update:
+
+- `C:\Users\pylyz\.codex\config.toml`
+
+Make sure it includes:
+
+```toml
+[features]
+multi_agent = true
+
+[agents]
+max_threads = 6
+max_depth = 1
+```
+
+Then declare individual agents:
+
+```toml
+[agents.deep_researcher]
+description = "..."
+config_file = "agents/deep-researcher.toml"
+```
+
+Repeat for other roles.
+
+### Option B: use the CLI experimental toggle if available
+
+Some Codex builds expose a CLI experimental toggle for multi-agent mode. If present, you can enable it there and restart the session.
+
+For this machine, the file-based configuration is already the main source of truth.
+
+---
+
+## 5. Cross-platform notes
+
+This note was prepared on Windows 11. If you move the workflow to macOS or Linux, pay attention to the following differences.
+
+### 5.1 Config paths
+
+- Windows:
+  - `C:\Users\\.codex\config.toml`
+- macOS:
+  - `/Users//.codex/config.toml`
+- Linux:
+  - `/home//.codex/config.toml`
+
+### 5.2 Agent directory
+
+- Windows:
+  - `C:\Users\\.codex\agents`
+- macOS:
+  - `/Users//.codex/agents`
+- Linux:
+  - `/home//.codex/agents`
+
+### 5.3 Path syntax
+
+- Windows often uses backslashes, but forward slashes are usually safer in TOML and CLI arguments; inside a double-quoted TOML string a backslash starts an escape sequence and must be doubled.
+- macOS and Linux should use forward slashes.
+- Do not copy Windows absolute paths directly into macOS or Linux configs.
+
+### 5.4 Shell and command differences
+
+- Windows commonly uses `powershell` or `cmd`
+- macOS and Linux typically use `bash` or `zsh`
+
+Do not copy Windows-only patterns like:
+
+- `cmd /c`
+- `set`
+- `where`
+
+Typical Unix-like equivalents, in the same order:
+
+- `bash -lc`
+- `export`
+- `which`
+
+### 5.5 MCP startup command differences
+
+Some MCP servers are cross-platform, but their launch commands differ.
+
+Example: Windows `chrome-devtools` config:
+
+```toml
+[mcp_servers.chrome-devtools]
+command = "cmd"
+args = ["/c", "npx", "-y", "chrome-devtools-mcp@latest"]
+env = { SystemRoot = "C:\\Windows", PROGRAMFILES = "C:\\Program Files" }
+startup_timeout_ms = 20_000
+```
+
+Typical macOS/Linux variant:
+
+```toml
+[mcp_servers.chrome-devtools]
+command = "npx"
+args = ["-y", "chrome-devtools-mcp@latest"]
+startup_timeout_ms = 20_000
+```
+
+### 5.6 Environment variables
+
+Windows-only extras such as `SystemRoot` and `PROGRAMFILES` usually do not belong in macOS/Linux configs.
+
+Cross-platform variables such as API keys and email values are usually portable.
+
+### 5.7 Executable paths
+
+Some Windows setups use explicit executable paths, such as a pinned `node.exe`.
+
+On macOS/Linux, replace them with the correct local paths, or prefer portable commands when possible:
+
+- `uv`
+- `uvx`
+- `pixi`
+- `python`
+- `node`
+- `npx`
+
+### 5.8 Windows-only sections
+
+Do not copy Windows-only sections such as:
+
+```toml
+[windows]
+sandbox = "unelevated"
+```
+
+### 5.9 Safe migration order
+
+If you reuse this workflow on macOS or Linux:
+
+1. copy the role structure
+2. replace local paths
+3. verify each MCP startup command
+4. start a fresh Codex session and test
+
+---
+
+## 6. How changes take effect
+
+1. save `config.toml`
+2. save all relevant `agents/*.toml`
+3. close the current Codex session
+4. start a new Codex session
+5. use the new session for delegation
+
+Multi-agent configuration is typically read at session startup, so a fresh session is the safer default after config changes.
+
+---
+
+## 7. Recommended delegation pattern for this review
+
+Recommended parallel group 1:
+
+- `deep_researcher`
+- `zotero_locator`
+
+Recommended parallel group 2:
+
+- `qmd_retriever`
+- `github_mapper`
+
+Recommended serial steps:
+
+- `writer`
+- `citation_checker`
+- `citation_archivist`
+
+Reasoning:
+
+- search, retrieval, and mapping are naturally parallel
+- Word editing should remain single-writer
+- citation review and citation evidence archiving should happen after the content draft is stable
+
+---
+
+## 8. Good parallel tasks
+
+- literature gap analysis
+- Zotero existence checks
+- DOI / author / year verification
+- attachment path discovery
+- QMD retrieval over papers
+- QMD retrieval over GitHub docs and notes
+- paper-to-GitHub mapping maintenance
+- download list preparation
+
+## 9. Write tasks that should not run in parallel
+
+- multiple agents editing the same Word document
+- multiple agents modifying the same mapping file without strict coordination
+- drafting prose before evidence has converged
+
+Rule of thumb:
+
+- parallelize read-heavy work
+- keep write-heavy work single-owner
+
+---
+
+## 10. Minimal example prompt
+
+```text
+Goal: strengthen Chapter 5 on macrolide-specific generation tools and scaffold-constrained optimization evidence.
+
+Use a multi-agent workflow:
+
+1. Run deep_researcher and zotero_locator in parallel.
+   - deep_researcher: identify missing papers, GitHub projects, and missing evidence
+   - zotero_locator: check whether those papers already exist in Zotero and whether PDFs are available
+
+2. Then run qmd_retriever and github_mapper in parallel.
+   - qmd_retriever: retrieve direct evidence from parsed papers, GitHub docs, and notes
+   - github_mapper: update paper_github_repo_map.csv and check whether Zotero already preserves source traces
+
+3. If evidence is sufficient, let writer propose or apply minimal Word edits.
+
+4. After that, let citation_checker review citation placement.
+
+5. Finally, let citation_archivist create citation evidence files for key claims.
+```
+
+---
+
+## 11. Related tools
+
+- `word-mcp`
+  - Word `.docx` editing
+- `zotero`
+  - item metadata, PDFs, and citations
+- `docling`
+  - PDF parsing
+- `qmd`
+  - hybrid local retrieval
+- `chrome-devtools`
+  - opening browser pages for manual downloads
+- `Deep-Research-skills`
+  - structured research workflows
+
+---
+
+## 12. Maintenance habits
+
+1. back up `config.toml` before major changes
+2. update `paper_github_repo_map.csv` before or alongside GitHub-related evidence work
+3. back up the Word draft before important edits
+4. use multi-agent mode for larger tasks, then converge on a single `writer`
+5. archive evidence for important citations early instead of waiting until the end
+
+---
+
+## 13. Possible future expansions
+
+- add a dedicated `mechanism_checker` for ribosome-binding and resistance corrections
+- add a dedicated `download_planner`
+- add a dedicated `reporter` for round summaries
diff --git a/macrolide-review-draft.docx b/macrolide-review-draft.docx
new file mode 100644
index 0000000..abfe46d
Binary files /dev/null and b/macrolide-review-draft.docx differ
diff --git a/multi-agent-review-workflow-template.md b/multi-agent-review-workflow-template.md
new file mode 100644
index 0000000..e04631e
--- /dev/null
+++ b/multi-agent-review-workflow-template.md
@@ -0,0 +1,1017 @@
+# Multi-Agent Review Workflow Template
+
+You are supporting the writing of a Chinese scholarly review with the following topic:
+
+**AI-Driven Design of 16-Membered Macrolides: From Traditional Antibiotic Optimization to Intelligent Molecular Generation**
+
+Your job is not to discard the current draft and rewrite everything from scratch. Your job is to continue the review by using existing PPT materials, PDFs, Zotero items, the Word draft, and prior analysis files to:
+
+- identify literature gaps
+- strengthen evidence
+- correct mechanism-level mistakes
+- continue drafting where needed
+- plan citation placement
+- revise the Word manuscript
+
+Use the local toolchain below with clear role separation.
+
+---
+
+## 0. Template variables and adjustable paths
+
+Treat this file as a reusable template rather than a one-off prompt bound to the current Windows machine.
+
+If the workflow moves to macOS or Linux, or if the topic, Word file, Zotero path, or GitHub source directory changes, update these variables first instead of rewriting the whole template.
+
+Core variables:
+
+- `REVIEW_TITLE`
+  - current value: `AI-Driven Design of 16-Membered Macrolides: From Traditional Antibiotic Optimization to Intelligent Molecular Generation`
+- `WORKBENCH_DIR`
+  - current value: ``
+- `WORK_ROOT`
+  - current value: `D:\phd\presentation`
+- `REVIEW_OUTLINE_FILE`
+  - current value: `${WORKBENCH_DIR}/review-outline.md`
+- `REVIEW_OUTLINE_FALLBACK_FILE`
+  - current value: `${WORKBENCH_DIR}/review-outline.generated.md`
+- `WORD_TARGET_DOC`
+  - current value: `${WORKBENCH_DIR}/macrolide-review-draft.docx`
+- `WORD_BACKUP_DIR`
+  - current value: `${WORKBENCH_DIR}/backups`
+- `ANALYSIS_DIR`
+  - current value: `D:\phd\presentation\analysis`
+- `ZOTERO_ROOT`
+  - current value: `C:\Users\pylyz\Zotero`
+- `ZOTERO_SQLITE_PATH`
+  - current value: `C:\Users\pylyz\Zotero\zotero.sqlite`
+- `ZOTERO_SQLITE_BAK_PATH`
+  - current value: `C:\Users\pylyz\Zotero\zotero.sqlite.bak`
+- `ZOTERO_STORAGE_DIR`
+  - current value: `C:\Users\pylyz\Zotero\storage`
+- `PAPER_GITHUB_MAP_CSV`
+  - current value: `D:\phd\presentation\analysis\paper_github_repo_map.csv`
+- `RAG_ROOT`
+  - current value: `D:\phd\presentation\zotero-rag`
+- `RAG_PDF_DIR`
+  - current value: `D:\phd\presentation\zotero-rag\pdf`
+- `RAG_PARSED_MD_DIR`
+  - current value: `D:\phd\presentation\zotero-rag\parsed-md`
+- `RAG_PARSED_JSON_DIR`
+  - current value: `D:\phd\presentation\zotero-rag\parsed-json`
+- `GITHUB_SOURCE_DIR`
+  - current value: `D:\phd\presentation\zotero-rag\github-sources`
+- `GITHUB_MD_DIR`
+  - current value: `D:\phd\presentation\zotero-rag\github-md`
+- `CITATION_EVIDENCE_DIR`
+  - current value: `D:\phd\presentation\analysis\citation-evidence`
+
+Optional variables:
+
+- `PLATFORM_NAME`
+  - `windows` / `macos` / `linux`
+- `QMD_PAPERS_COLLECTION`
+  - default: `papers`
+- `QMD_GITHUB_COLLECTION`
+  - default: `github`
+- `QMD_NOTES_COLLECTION`
+  - default: `notes`
+
+Portability notes:
+
+- Windows paths often use drive letters; macOS/Linux should use `/Users/...` or `/home/...`
+- Avoid hard-coding Windows-only details such as `cmd /c` or `.exe` in the template body
+- Keep a “variable name + current value” convention for easier migration
+
+If the review topic changes, the minimum variables to update are:
+
+- `REVIEW_TITLE`
+- `REVIEW_OUTLINE_FILE`
+- `WORD_TARGET_DOC`
+- `WORK_ROOT`
+- `ANALYSIS_DIR`
+- `PAPER_GITHUB_MAP_CSV`
+- `WORKBENCH_DIR`
+
+---
+
+## 1. Available tools
+
+### 1.1 Deep-Research-skills
+
+Purpose:
+
+- research and discovery
+- identify missing reviews, original papers, databases, method papers, and relevant GitHub tools
+- output structured candidate lists
+
+Use cases:
+
+- identify which chapter lacks evidence
+- identify which claims lack direct literature support
+- find recent reviews and milestone original papers
+- find GitHub implementations related to macrolides, macrocycle design, scaffold-constrained generation, antibiotic design, and molecular generation
+
+Not for:
+
+- direct Word editing
+- replacing Zotero as the reference manager
+- replacing PDF parsing
+
+### 1.2 `chrome-devtools` MCP
+
+Purpose:
+
+- open Chrome pages
+- support access to paper landing pages, DOI pages, publisher pages, and GitHub pages
+- help the user manually download PDFs, supplements, or code-related assets
+
+Use cases:
+
+- open paper pages for download
+- open GitHub repositories, releases, README pages, and docs pages
+- help confirm download links and attachment locations
+
+Default rule:
+
+- do not assume fully automatic downloading
+- when login, institution access, captcha, or publisher restrictions exist, let the user handle the download manually
+
+### 1.3 `zotero` MCP
+
+Purpose:
+
+- manage Zotero items
+- add papers by DOI
+- import PDFs
+- retrieve metadata
+- retrieve full text when available
+- support Word citation workflows
+
+Use cases:
+
+- check whether a paper is already in Zotero
+- import downloaded PDFs
+- inspect DOI, authors, year, item metadata, and attachment status
+- support citation planning
+
+Not for:
+
+- large-scale prose editing
+- primary management of GitHub implementation materials
+
+### 1.4 `word-mcp`
+
+Purpose:
+
+- read and modify `.docx`
+- support backup-first editing
+- perform local revisions on the review draft
+
+Use cases:
+
+- minimal necessary edits to the existing review draft
+- paragraph-level additions backed by evidence
+- mechanism corrections
+- placeholders for citations
+
+Default target:
+
+- `WORD_TARGET_DOC`
+
+Editing rules:
+
+- back up before important edits
+- preserve the chapter structure unless there is a strong reason to change it
+- generate a backup first, then edit the working copy
+- if the edit fails or is clearly wrong, restore from backup
+
+### 1.5 `docling` MCP
+
+Purpose:
+
+- parse local PDFs
+- convert them into structured outputs
+- generate intermediate outputs for retrieval and evidence validation
+
+Use cases:
+
+- convert papers into readable structured text
+- feed QMD with better source material
+- preserve section, paragraph, and table structure when possible
+
+Rule:
+
+- QMD should not work directly on raw PDFs
+- parse with Docling first, then index the parsed outputs
+
+### 1.6 `qmd` MCP
+
+Purpose:
+
+- perform local hybrid retrieval over markdown or structured text
+- combine BM25, vector search, and reranking
+- serve as the unified retrieval layer across parsed papers, GitHub docs, and notes
+
+Use cases:
+
+- retrieve mechanisms, methods, and conclusions from papers
+- retrieve README/docs/examples from GitHub sources
+- search across papers, code documentation, and notes together
+
+Rules:
+
+- QMD should primarily index markdown or structured text rather than raw PDFs
+- GitHub implementation materials should be managed in QMD, not primarily in Zotero
+- any QMD result used for writing should be validated back against Docling outputs, original PDFs, or Zotero full text
+- if a paper and its GitHub repo describe the same method consistently, confidence increases
+- default retrieval order: `papers`, then `github`, then `notes`
+
+---
+
+## 2. Recommended end-to-end workflow
+
+### Stage 1. Start with research, not with writing
+
+Before chapter-level work begins, identify the outline source:
+
+1. read `REVIEW_OUTLINE_FILE`, with the default name `review-outline.md`
+2. if the outline file does not exist:
+   - extract a provisional outline from the Word draft based on headings, TOC structure, and visible chapter titles
+   - write it to `REVIEW_OUTLINE_FALLBACK_FILE`
+3. bind each task round to a specific outline location instead of saying “continue writing”
+
+Use Deep-Research-skills to determine:
+
+1. missing key papers
+2. weak evidence sections
+3. GitHub projects worth including
+4. which paper or project supports which chapter, subsection, or concrete claim
+
+Recommended candidate output fields:
+
+- title
+- year
+- DOI if available
+- type: review / original paper / database / tool / GitHub project
+- target chapter or subsection
+- intended use
+- whether a PDF, supplement, or GitHub asset still needs to be downloaded
+
+### Stage 2. Open pages and let the user download
+
+When Deep-Research-skills produces candidates:
+
+1. check Zotero first
+2. if Zotero lacks a full item or lacks a PDF:
+   - use `chrome-devtools` to open the landing page, DOI page, publisher page, or GitHub page
+   - tell the user exactly what still needs to be downloaded
+
+Typical manual download targets:
+
+- paper PDF
+- supplementary or supporting information
+- appendices
+- GitHub README / docs / release assets / examples
+
+### Stage 3. Import into Zotero or the local knowledge base
+
+#### 3.1 Papers
+
+Default policy:
+
+- papers, reviews, original studies, PDFs, and supplements should be managed in Zotero first
+- Zotero remains the source of item metadata, DOI, authors, year, PDF, and citation linkage
+
+If the user has already downloaded a PDF:
+
+- import it into Zotero
+- add DOI or repair metadata if needed
+
+#### 3.2 GitHub repositories and code materials
+
+GitHub repositories should not be handled only in Zotero, and they also should not be detached from Zotero completely.
+
+Default dual-track policy:
+
+- primary retrieval layer: `QMD`
+- source-trace layer: `Zotero`
+
+Why:
+
+- Zotero is good at provenance, item grouping, and paper-to-repo relationships
+- GitHub value is usually in README, docs, examples, issue discussions, and release notes
+- the workflow needs both retrieval and traceability
+
+Default handling:
+
+- store GitHub materials in a dedicated local directory
+- normalize README/docs/release notes/manual summaries into QMD collections
+- also preserve a Zotero trace via a webpage item or linked URL attachment
+- maintain a structured global mapping table
+
+Primary mapping table:
+
+- `PAPER_GITHUB_MAP_CSV`
+
+Minimum recommended fields:
+
+- `paper_title`
+- `doi`
+- `year`
+- `zotero_item_key`
+- `pdf_path`
+- `github_repo_name`
+- `github_repo_url`
+- `github_local_qmd_path`
+- `mapping_type`
+- `section_usage`
+- `evidence_scope`
+- `notes`
+
+Default granularity:
+
+- one row per `paper <-> repository` mapping
+- if a paper maps to multiple repositories, write multiple rows
+- if later work truly requires module-level mapping, start by recording it inside `notes` rather than changing the main schema immediately
+
+### Stage 4. Parse with Docling and retrieve with QMD
+
+Recommended directory layout under `RAG_ROOT`:
+
+```text
+RAG_ROOT/
+  pdf/
+  parsed-md/
+  parsed-json/
+  github-sources/
+  github-md/
+  qmd-workspace/
+  scripts/
+```
+
+Recommended flow:
+
+1. use Zotero or the sqlite helper to locate the PDF
+2. copy or link the PDF into `RAG_PDF_DIR`
+3. run Docling and write both markdown and JSON into:
+   - `RAG_PARSED_MD_DIR`
+   - `RAG_PARSED_JSON_DIR`
+4. place GitHub README/docs/release notes/manual summaries into `GITHUB_MD_DIR`
+5. create at least these QMD collections:
+   - `papers`
+   - `github`
+   - `notes`
+
+QMD evidence rules:
+
+1. default retrieval order:
+   - `papers`
+   - then `github` if evidence is still weak
+   - then `notes` if needed
+2. default retrieval parameters:
+   - first-pass candidate count: `top 8`
+   - candidate threshold: `0.45`
+   - if fewer than `3` good `papers` results exceed `0.45`, expand to `github`
+3. QMD hits are not final writing evidence until at least one of the following is true:
+   - the statement is visible in Docling markdown
+   - the statement is confirmed in the original PDF or Zotero full text
+4. if the claim involves implementation details, check:
+   - `README`
+   - `docs`
+   - `examples`
+   - release notes
+   - then source code or config only if needed
+5. if paper text, Docling output, and GitHub materials agree, treat the evidence as high-confidence
+6. if only QMD returns a snippet but Docling or GitHub cannot support it, mark it as:
+   - needs review
+   - not ready for a definitive statement
+7. GitHub evidence rules:
+   - do not force a GitHub check when the paper itself is already sufficiently clear
+   - if the paper is vague about implementation, check `README / docs / examples` first
+   - only inspect source code when higher-level documentation is still insufficient
+   - if source code appears to contradict the paper description, flag the conflict instead of writing a firm claim
+
+### Stage 4.5. Multi-agent orchestration
+
+When the task is large enough, prefer a multi-agent workflow instead of a fully serial one.
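The scheduling rule in this stage, parallel read-heavy pairs followed by one serial chain of write steps, can be illustrated in plain Python. This is only an analogy under stated assumptions: the lambdas are hypothetical stand-ins for child agents, `run_round` is an illustrative name, and nothing here calls Codex, whose agents are configured through `config.toml` and the per-agent TOML files. `concurrent.futures.ThreadPoolExecutor` runs each pair concurrently; results are collected in a fixed order so the supervisor merges them deterministically, and the write steps run strictly one after another.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A task stands in for one child agent and returns a short summary string.
Task = Callable[[], str]


def run_round(parallel_groups: list[dict[str, Task]],
              serial_steps: list[tuple[str, Task]]) -> list[str]:
    """Run each read-heavy group concurrently, then the write steps one at a time."""
    log: list[str] = []
    for group in parallel_groups:
        with ThreadPoolExecutor(max_workers=max(1, len(group))) as pool:
            futures = {name: pool.submit(task) for name, task in group.items()}
            # Collect in insertion order so the merged evidence is deterministic.
            for name, future in futures.items():
                log.append(f"{name}: {future.result()}")
    # Write steps never overlap: each one finishes before the next starts.
    for name, task in serial_steps:
        log.append(f"{name}: {task()}")
    return log


# Hypothetical stand-ins for the agents named in this stage.
groups = [
    {"deep_researcher": lambda: "gap list", "zotero_locator": lambda: "item keys"},
    {"qmd_retriever": lambda: "snippets", "github_mapper": lambda: "map rows"},
]
steps = [
    ("writer", lambda: "draft edit"),
    ("citation_checker", lambda: "placement review"),
    ("citation_archivist", lambda: "evidence files"),
]

for line in run_round(groups, steps):
    print(line)
```

The loop prints one line per agent in the order deep_researcher, zotero_locator, qmd_retriever, github_mapper, writer, citation_checker, citation_archivist. Keeping the write steps in a plain loop, outside any executor, is the code-level form of the single-writer rule.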
+ +Main thread role: + +- `supervisor` + +Main thread responsibilities: + +- define the current round goal +- split the work +- define the expected outputs +- merge evidence +- decide when Word editing is allowed + +Recommended child agents: + +- `deep_researcher` + - identify missing papers, GitHub projects, and evidence types +- `zotero_locator` + - locate items, DOIs, PDFs, notes, and linked URLs +- `qmd_retriever` + - retrieve evidence snippets and source paths +- `github_mapper` + - update the mapping table and confirm provenance traces +- `writer` + - revise the Word draft only after evidence is sufficient +- `citation_checker` + - verify claim-to-paper fit and citation placement +- `citation_archivist` + - create auditable citation evidence files + +Recommended parallel pairs: + +- `deep_researcher` + `zotero_locator` +- `qmd_retriever` + `github_mapper` + +Recommended serial sequence: + +- `writer` +- `citation_checker` +- `citation_archivist` + +Never allow multiple agents to edit the same Word draft simultaneously. + +Read-heavy tasks that are safe to parallelize: + +- literature-gap analysis +- Zotero item checks +- sqlite-based PDF discovery +- QMD retrieval +- GitHub page checks +- mapping-table updates +- citation evidence preparation + +Required child-agent output quality: + +- conclusions must include source paths +- Zotero outputs should include item keys, DOIs, and attachment paths when possible +- QMD outputs should include collection name, file path, and a short supporting snippet +- GitHub outputs should include repo URL, local QMD path, and mapping-table status +- citation evidence outputs should include evidence file path, original PDF path, and supporting passages or JSON fields + +### Stage 5. 
Edit the Word draft only after evidence is ready + +Default editing priorities: + +- factual corrections +- mechanism corrections +- evidence-deficient core paragraphs +- removal of redundant or repeated phrasing + +Rules: + +- back up before important edits +- prefer minimal necessary edits +- do not aggressively restructure the entire manuscript +- explicitly mark where citations are still missing + +Default edit granularity: + +- paragraph-level by default +- sentence-level only for very local factual corrections +- subsection-level only when evidence and structure are both sufficiently clear + +### Stage 6. Citation workflow + +Default policy: + +1. propose which paper supports which sentence +2. let `citation_checker` review citation fit and placement +3. let `citation_archivist` create an evidence file under `CITATION_EVIDENCE_DIR` +4. let the user insert the Zotero citation manually in Word +5. then review whether the placement is correct + +Human final confirmation remains the default for Zotero insertion. + +--- + +## 3. Local project background + +### 3.1 Work root + +- `WORK_ROOT` + +### 3.2 Primary editable document + +- `WORD_TARGET_DOC` + +### 3.3 Current workbench directory + +- `WORKBENCH_DIR` + +### 3.4 Zotero-related paths + +- root: + - `ZOTERO_ROOT` +- main sqlite: + - `ZOTERO_SQLITE_PATH` +- backup sqlite: + - `ZOTERO_SQLITE_BAK_PATH` +- attachment directory: + - `ZOTERO_STORAGE_DIR` +- paper-to-GitHub mapping: + - `PAPER_GITHUB_MAP_CSV` +- citation evidence directory: + - `CITATION_EVIDENCE_DIR` + +### 3.5 Local RAG paths + +- root: + - `RAG_ROOT` +- parsed markdown: + - `RAG_PARSED_MD_DIR` +- parsed JSON: + - `RAG_PARSED_JSON_DIR` +- GitHub markdown: + - `GITHUB_MD_DIR` + +--- + +## 4. 
Writing constraints

### 4.1 Do not overstate consensus

Do not present the following elements, taken together, as a framework that the literature has already established by consensus:

- fixed scaffold
- site-controlled generation
- stitching
- multi-objective scoring

Many studies cover only part of this picture.

### 4.2 Distinctions that must be preserved

Do not automatically equate:

- macrocycles with 16-membered macrolides
- macrocyclic peptides or macrocyclic oligoamides with macrolides
- antibacterial molecule generation with macrolide generation
- scaffold-constrained generation with proven fixed-scaffold macrolide optimization

### 4.3 Sections that are currently evidence-poor

- Chapter 2:
  - structural basis, ribosome binding, resistance mechanisms, SAR
- Chapter 3:
  - direct docking / MD / QM/MM cases in macrolide optimization
- Chapter 5:
  - truly macrolide-oriented generation tools and fixed-scaffold optimization evidence

---

## 5. Mechanism-level correction rules

### 5.1 Ribosome site wording

Do not write:

- “30S small subunit A2058 site”

Preferred framing:

- 50S subunit
- 23S rRNA
- NPET
- A2058 / A2059 and related sites

### 5.2 Do not merge L4/L22 with rRNA sites

Do not describe A2058 / A2059 as if they were residues on L4 or L22.

Preferred wording:

- the L4/L22 loops help form the NPET constriction region
- A2058, A2059, A752, U2609, and C2610 are 23S rRNA sites

### 5.3 Avoid “complete translation blockage”

Safer wording:

- partially occlude / constrict the NPET
- context-specific translation inhibition
- some nascent chains can still pass

---

## 6.
User writing preferences + +- keep the overall structure mostly stable +- preserve the existing chapter layout +- prioritize factual corrections +- lightly compress repetition +- add evidence and paragraphs only where needed + +If a section is acceptable, it is valid to mark it as: + +- keep as is +- defer rewrite +- only add citations later + +--- + +## 7. Priority references + +- Macformer +- SyntheMol +- MDAGS +- Mordred / MacrolactoneDB / Mordred_mrc +- Expansive discovery of chemically diverse structured macrocyclic oligoamides +- DiffGui +- 16-membered macrolide antibiotics: a review +- How Macrolide Antibiotics Work +- Modifications and Biological Activity of Natural and Semisynthetic 16-Membered Macrolide Antibiotics +- DrugEx v3 +- FFLOM +- Deep Generative Models for 3D Linker Design +- TamGen + +--- + +## 8. GitHub evidence management + +Default policy: + +- do not treat the repository itself as a primary Zotero object +- store GitHub materials in local directories for QMD retrieval +- preserve a Zotero trace and maintain the mapping table + +Recommended local repository layout: + +```text +GITHUB_SOURCE_DIR/ + repo-name-1/ + README.md + docs/ + release-notes.md + notes.md + repo-name-2/ + README.md + docs/ + notes.md +``` + +Recommended steps: + +1. open the GitHub page with `chrome-devtools` +2. let the user decide whether to save source snapshots, release assets, README, or docs +3. normalize useful content into markdown +4. add it to the `github` QMD collection +5. update `PAPER_GITHUB_MAP_CSV` +6. keep a Zotero provenance trace + +--- + +## 9. Standard execution order per round + +### 9.1 Read the relevant context first + +Before substantive work, read only the most relevant local materials for the current round. 
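Locating the target position in the outline is one part of this step that can be scripted. The Python below is a minimal sketch and is not part of any MCP server or skill used in this workflow. It assumes the outline keeps the heading style of `review-outline.md` (`## Chapter N. Title`, `### N.N Title`, `#### N.N.N Title`). It reads the primary outline, falls back to `review-outline.generated.md` when the primary file is missing, and returns the heading chain for a target section ID. The helper names `load_outline` and `locate_outline_position` are hypothetical.

```python
import re
from pathlib import Path

# Assumed heading style (taken from review-outline.md):
#   "## Chapter 2. Title", "### 2.1 Title", "#### 2.1.4 Title"
CHAPTER_RE = re.compile(r"^##\s+Chapter\s+(\d+)\.\s+(.*)$")
SECTION_RE = re.compile(r"^(#{3,4})\s+(\d+(?:\.\d+)+)\s+(.*)$")


def load_outline(primary: Path, fallback: Path) -> str:
    """Read the primary outline, or the generated fallback if the primary is missing."""
    for path in (primary, fallback):
        if path.is_file():
            return path.read_text(encoding="utf-8")
    raise FileNotFoundError(f"no outline found at {primary} or {fallback}")


def locate_outline_position(outline: str, target: str) -> list[str]:
    """Return the heading chain (chapter -> section -> subsection) for an ID like '1.2.2'."""
    chain: dict[int, str] = {}  # heading level -> heading text on the current branch
    for raw in outline.splitlines():
        line = raw.strip()
        m = CHAPTER_RE.match(line)
        if m:
            # A new chapter resets the whole branch.
            chain = {2: f"Chapter {m.group(1)}. {m.group(2)}"}
            if m.group(1) == target:
                return [chain[2]]
            continue
        m = SECTION_RE.match(line)
        if m:
            level = len(m.group(1))
            # Drop same-level or deeper headings left over from the previous branch.
            chain = {k: v for k, v in chain.items() if k < level}
            chain[level] = f"{m.group(2)} {m.group(3)}"
            if m.group(2) == target:
                return [chain[k] for k in sorted(chain)]
    raise KeyError(f"section {target!r} not found in outline")


if __name__ == "__main__":
    demo = (
        "## Chapter 1. Introduction\n"
        "### 1.2 Drug resistance in macrolides\n"
        "#### 1.2.2 Binding sites of macrolide antibiotics\n"
    )
    print(locate_outline_position(demo, "1.2.2"))
    # ['Chapter 1. Introduction', '1.2 Drug resistance in macrolides', '1.2.2 Binding sites of macrolide antibiotics']
```

The returned chain maps onto the `chapter`, `section`, and `subsection` fields of the task input template. A `KeyError` is a signal to clarify the outline position before any large-scale drafting.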
+ +### 9.2 Identify the task type + +Common round types: + +- mechanism correction +- literature supplementation +- chapter strengthening +- integrating existing materials +- GitHub evidence supplementation +- redundancy compression + +### 9.3 If literature is missing, research first + +1. use Deep-Research-skills +2. check Zotero +3. if missing, open pages and produce a download list + +### 9.4 If files exist locally, parse and retrieve + +1. keep papers in Zotero first +2. parse PDFs with Docling +3. move GitHub materials into QMD +4. retrieve with QMD and then validate back against Docling/PDF/GitHub as needed + +### 9.5 If the task is large, split it into agents + +Recommended role split: + +1. `deep_researcher` +2. `zotero_locator` +3. `qmd_retriever` +4. `github_mapper` +5. `writer` +6. `citation_checker` +7. `citation_archivist` + +### 9.6 Edit Word last + +1. back up +2. edit the working copy +3. produce a revision receipt + +--- + +## 10. Standard round output + +Every round should be organized roughly as: + +### 10.1 Goal + +What the round is trying to resolve. + +### 10.2 Materials read + +Which local files, Zotero items, QMD hits, or GitHub pages were actually used. + +### 10.3 Problems found + +Typical categories: + +- factual errors +- mechanism errors +- weak evidence +- missing literature +- missing GitHub project evidence +- unstable phrasing +- duplication + +### 10.4 Recommended actions + +Split into: + +- can change immediately +- needs more evidence first +- needs user confirmation + +### 10.5 If a download task exists + +List: + +- papers to download +- supplements to download +- GitHub pages/releases/docs to open +- mapping-table rows that need to be updated afterward + +### 10.6 If citation work is involved + +List: + +- recommended papers +- recommended insertion positions +- which sentence each citation supports +- what the user should verify after manual insertion + +--- + +## 11. Default coordination rules + +1. 
research first, then download, then import, then parse, then retrieve, then edit Word +2. papers are primarily managed through Zotero +3. GitHub implementation materials are primarily managed through QMD +4. PDFs must go through Docling before QMD retrieval +5. human final confirmation remains the default for citations +6. important new claims should always carry explicit evidence when possible +7. if evidence is weak, prefer conservative wording over inflated conclusions + +--- + +## 12. Task input template + +Use the following structure when starting a new round: + +```text +Round title: + +Review title: +${REVIEW_TITLE} + +Target chapter / subsection: + +Outline position: +- chapter: +- section: +- subsection: + +Outline file: +- primary: `${REVIEW_OUTLINE_FILE}` +- if missing: `${REVIEW_OUTLINE_FALLBACK_FILE}` + +Task type: +- [ ] research only +- [ ] research + download list +- [ ] research + Zotero / QMD / GitHub alignment +- [ ] evidence collection followed by Word edits +- [ ] citation review only + +Allowed tools this round: +- [ ] Deep-Research-skills +- [ ] chrome-devtools +- [ ] zotero +- [ ] docling +- [ ] qmd +- [ ] word-mcp + +Is web access allowed? +- [ ] yes +- [ ] no + +Is Word editing allowed this round? +- [ ] yes +- [ ] no + +Priority local materials: + +Key questions: +1. +2. +3. + +Expected outputs: +- [ ] missing literature list +- [ ] download list +- [ ] Zotero item check +- [ ] QMD evidence summary +- [ ] paper-to-GitHub mapping update +- [ ] Word revision plan +- [ ] actual Word edits +- [ ] citation recommendation +- [ ] citation evidence archive +``` + +If the outline location is unclear, clarify it before large-scale drafting. + +--- + +## 13. Download-list template + +```text +Download task list + +1. Paper title: + DOI: + Asset type: PDF / supplementary / appendix / dataset / release asset + Why it is needed: + Target chapter: + Open page: + +2. 
GitHub project: + repo_url: + Suggested saved content: README / docs / examples / release assets / source snapshot + Target chapter: + Local destination after download: + +3. Follow-up after download: + - import into Zotero? + - update paper_github_repo_map.csv? + - send into Docling / QMD? +``` + +--- + +## 14. Word revision receipt template + +```text +Word revision receipt + +Target document: +Backup file: + +Revised locations: +1. +2. +3. + +Summary of changes: +1. +2. +3. + +Evidence basis: +1. paper / Zotero item / QMD hit / GitHub documentation +2. +3. + +Unresolved issues: +1. +2. + +User actions still needed: +1. Zotero citation insertion +2. PDF download +3. GitHub provenance confirmation +``` + +--- + +## 15. Citation review template + +```text +Citation review sheet + +Sentence or paragraph: + +Recommended papers: +1. +2. + +Reasons: +1. +2. + +Suggested insertion position: + +Risk notes: +- does one paper fail to support the whole sentence? +- should the sentence be split across multiple citations? +- is the source only background context rather than direct evidence? +- does the user still need to manually confirm the citation in Word? +``` + +--- + +## 16. Citation evidence archive template + +For each key citation or tight cluster of citations, create one markdown evidence file under `CITATION_EVIDENCE_DIR`. + +Suggested filename: + +- `chapter-section-claim-shortname.md` + +Suggested contents: + +```text +Citation evidence archive + +Target chapter: +Target paragraph: +Target sentence / claim: + +Recommended paper: +DOI: +Zotero item key: + +Original PDF absolute path: + +Docling markdown support: +- file: +- supporting passage: + +Docling JSON support: +- file: +- field / page / block identifier: + +Zotero / sqlite supporting metadata: +- item key: +- attachment path: + +GitHub supporting evidence (if any): +- repo_url: +- docs/readme/example path: +- supporting note: + +Conclusion: +- is the citation sufficient for the claim? 
+- does the claim need multiple citations? +- does the case still need manual review? +``` + +--- + +## 17. Stop conditions and upgrade conditions + +Remain in research / evidence collection mode rather than Word-editing mode when: + +- the target chapter is still unclear in the outline +- QMD hits have not been validated in Docling text or the original PDF +- the key claim lacks original-paper or high-quality review support +- the paper-to-GitHub relationship is still unclear +- the required PDF, supplement, or release asset has not yet been downloaded + +Allow escalation to `writer` only when: + +- the target chapter or paragraph is clear +- key evidence has been validated at least once +- citation candidates are clear at the paper level +- the distinction between fact, future direction, and author synthesis is clear + +--- + +## 18. Items that still need practical calibration + +These rules already exist, but still need real workflow runs for final tuning: + +1. whether QMD parameters should vary by chapter + - current defaults: `top 8`, threshold `0.45` +2. which Docling JSON fields should be fixed in the citation archive +3. how far GitHub evidence should go into source-level inspection +4. whether `paper_github_repo_map.csv` will eventually need module-level fields +5. whether Zotero automatic insertion should be tested on Word copies only +6. where the best boundary lies for paragraph-level edits across different chapters diff --git a/pending-calibration-tasks.md b/pending-calibration-tasks.md new file mode 100644 index 0000000..d12b9de --- /dev/null +++ b/pending-calibration-tasks.md @@ -0,0 +1,144 @@ +# Pending Calibration Tasks + +This file tracks the remaining “last-mile” items that should be calibrated through real workflow runs. + +These items do not block progress, but they should be refined over the next few iterations and then folded back into the main workflow template. + +--- + +## 1. 
How to pilot the first batch of citation evidence files

Start with a small pilot instead of a full rollout.

Recommended pilot scope:

- 1 chapter
- 2 to 3 key claims
- 1 to 2 core supporting papers per claim

Recommended starting targets:

- Chapter 1 claims about resistance mechanisms or ribosome-binding sites
- Chapter 5 entries such as `Macformer`, `PKS Enumerator`, or `SIME`

Minimal pilot flow:

1. `qmd_retriever` retrieves evidence from the `papers` collection.
2. Validate the passage in Docling markdown.
3. If needed, inspect Docling JSON for structured field or block evidence.
4. If the claim concerns implementation details, inspect GitHub `README`, `docs`, or `examples`.
5. `citation_checker` recommends papers and insertion positions.
6. `citation_archivist` writes a markdown evidence file under `citation-evidence/`.
7. A human reviews whether the evidence file is strong enough to show that the citation is not fabricated.

The purpose of the pilot is to answer:

- Is the current citation evidence template sufficient?
- Which Docling JSON fields are actually useful?
- How much GitHub evidence is useful in this review workflow?

---

## 2.
Items that still need calibration + +### 2.1 QMD retrieval parameters + +Current defaults: + +- `top 8` +- threshold `0.45` +- default order: `papers -> github -> notes` + +Still to validate: + +- whether `0.45` is too high or too low for this Chinese review corpus +- whether `top 8` is enough +- whether some chapters need stricter or looser retrieval settings + +### 2.2 Docling JSON field whitelist + +Already decided: + +- keep both `markdown + json` + +Still to validate: + +- which JSON fields are most useful for citation evidence files +- whether page indices, block IDs, heading levels, or paragraph indices should always be preserved +- whether a dedicated JSON field extraction script is needed + +### 2.3 GitHub evidence acceptance rules + +Already decided: + +- do not force GitHub inspection if the paper itself is already clear +- if implementation details are unclear, check `README / docs / examples` first +- only inspect source code or key config files if those higher-level materials remain insufficient + +Still to validate: + +- when `README` alone is enough +- when `docs/examples` are required +- when source inspection is required to avoid overclaiming + +### 2.4 Future granularity of `paper_github_repo_map.csv` + +Already decided: + +- default granularity is one row per `paper <-> repository` mapping + +Still to validate: + +- whether module-level or subdirectory-level mapping fields are needed later +- or whether that detail should stay in the `notes` field only + +### 2.5 Zotero automatic insertion on Word copies + +Current default: + +- human final confirmation remains the default + +Still to validate: + +- whether automatic insertion experiments should be allowed on Word copies +- which sections or sentence types are safe candidates for that experiment + +### 2.6 Failure recovery flow for Word edits + +Already decided: + +- create a backup before editing +- restore from backup if the edit fails or produces clearly bad output + +Still to validate: + +- 
whether a dedicated recovery script is needed +- whether backup naming should be standardized with timestamps +- whether recovery should become a mandatory part of the `writer` receipt + +--- + +## 3. Recommended order for future convergence + +Avoid finalizing everything at once. A safer order is: + +1. run the first citation-evidence pilot +2. calibrate QMD thresholds and candidate counts +3. refine the Docling JSON field whitelist +4. refine how deeply GitHub evidence should go into source code +5. only then decide whether Zotero automatic insertion should be tested on draft copies + +--- + +## 4. Write-back policy + +After each real workflow round, write the stable result back into one of: + +- `multi-agent-review-workflow-template.md` +- `codex-multi-agent-setup-notes.md` +- this file + +Rule: + +- if the rule is stable, write it into the main template +- if it is still experimental, keep it here diff --git a/review-outline.md b/review-outline.md new file mode 100644 index 0000000..3d8cb0f --- /dev/null +++ b/review-outline.md @@ -0,0 +1,127 @@ +# AI-Driven Design of 16-Membered Macrolides: From Traditional Antibiotic Optimization to Intelligent Molecular Generation + +This file is the default primary outline for the current project. + +Each new task round should read this file first to determine the target chapter and subsection. + +If this file is missing or clearly outdated, an agent may extract a provisional outline from the Word draft and write: + +- `review-outline.generated.md` + +Once the generated outline is manually confirmed, it should be folded back into this file. + +--- + +## Abstract + +## Chapter 1. 
Introduction + +### 1.1 Development and challenges of antibiotics + +#### 1.1.1 Overview + +#### 1.1.2 Macrolide antibiotics + +#### 1.1.3 Current research status of macrolide antibiotics + +### 1.2 Drug resistance in macrolides + +#### 1.2.1 Resistance mechanisms of macrolide antibiotics + +#### 1.2.2 Binding sites of macrolide antibiotics + +### 1.3 Quantitative structure-activity relationships + +#### 1.3.1 Molecular representation + +#### 1.3.2 1D-QSAR + +#### 1.3.3 2D-QSAR + +#### 1.3.4 3D-QSAR + +### 1.4 Molecular energy minimization + +#### 1.4.1 Molecular mechanics + +#### 1.4.2 Quantum mechanics + +#### 1.4.3 Hybrid quantum mechanics / molecular mechanics (QM/MM) + +#### 1.4.4 Conformational ensemble sampling + +### 1.5 AI-based innovative antibiotic design + +## Chapter 2. Current status of macrocycle design methods + +### 2.1 Evolution of computer-aided macrocycle design + +#### 2.1.1 Early geometric matching and fragment stitching + +#### 2.1.2 Structured fragment-linking algorithms + +#### 2.1.3 Commercial and semi-automated tools + +#### 2.1.4 LigMac: end-to-end structure-guided design + +#### 2.1.5 Molecular-field-based fragment replacement tools + +#### 2.1.6 Web platforms with configurable linker libraries + +### 2.2 Generative deep learning for macrocycle design + +### 2.3 Reinforcement learning optimization for macrocycle design + +## Chapter 3. Specialized models and strategies for macrocycle generation + +### 3.1 Generative models based on fragment linking / cyclization + +### 3.2 Generative models for macrocyclic peptides and special scaffolds + +### 3.3 Generative models and tools for macrolides + +#### 3.3.1 PKS-based generation of macrolides + +## Chapter 4. AI-driven molecular generation techniques + +### 4.1 Sequence-based generative models (SMILES representation) + +### 4.2 Molecular graph-based generative models + +### 4.3 GAN and reinforcement learning methods + +### 4.4 Emerging diffusion models and 3D generation + +## Chapter 5. 
Generative models for macrocyclic molecules + +### 5.1 Generative models for macrocycles + +#### 5.1.1 Challenges in macrocycle design + +#### 5.1.2 Macformer: macrocycle structure generation + +#### 5.1.3 MacroHop: macrocycle scaffold generation + +#### 5.1.4 MacroEvoLution: evolutionary macrocycle design + +#### 5.1.5 HELM-GPT: macrocyclic peptide generation + +### 5.2 Specialized tools for macrolides + +#### 5.2.1 PKS Enumerator + +#### 5.2.2 SIME: biosynthesis-inspired macrocycle design + +#### 5.2.3 Biosynthesis-driven macrocycle design strategies + +### 5.3 Fixed-scaffold structure generation strategies + +#### 5.3.1 Scaffold-constrained generation + +#### 5.3.2 Site-directed generation + +#### 5.3.3 Fragment stitching and side-chain enumeration + +## Conclusion and outlook + +## References