first commit

mm644706215
2026-03-10 14:51:29 +08:00
commit 40651842db
7 changed files with 1956 additions and 0 deletions

19
.gitignore vendored Normal file

@@ -0,0 +1,19 @@
# Large local reference corpora: keep local by default.
paper/
文献/
root/
# Track the English working draft, ignore other docx files by default.
*.docx
!macrolide-review-draft.docx
# Office temporary files
~$*.docx
~$*.doc
*.tmp
*.bak
*.log
# OS files
.DS_Store
Thumbs.db

306
README.md Normal file

@@ -0,0 +1,306 @@
# scholarly-writing-workbench
This directory is a reusable workbench for paper-writing and literature-review workflows.
The goal is not to store every raw reference asset in Git, but to version:
- reusable prompt templates
- Codex multi-agent setup notes
- the working Word draft
- small workflow documents, checklists, and mapping tables
Large local reference corpora should stay outside normal Git tracking by default, including:
- `paper/`
- `文献/` (the localized reference folder; the name means "literature")
- `root/`
Those directories are already ignored in [`.gitignore`](./.gitignore). If you later decide to version part of them, prefer a small curated subset or Git LFS instead of pushing a full local corpus.
---
## 1. What this repository should contain
Recommended for version control:
- `README.md`
- `multi-agent-review-workflow-template.md`
- `codex-multi-agent-setup-notes.md`
- `review-outline.md`
- `pending-calibration-tasks.md`
- `macrolide-review-draft.docx`
- small task lists, mapping tables, and workflow notes
Not recommended for normal Git tracking:
- full PDF corpora
- full Zotero `storage/`
- large GitHub source snapshots
- large Docling parse outputs
- large QMD index databases
---
## 2. Required skills
This workflow depends on these `Deep-Research-skills` skills:
- `research`
- `research-add-fields`
- `research-add-items`
- `research-deep`
- `research-report`
Useful supporting skills:
- `skill-installer`
- `skill-creator`
On a fresh machine, confirm they are installed under:
- Windows:
  - `C:\Users\<user>\.codex\skills`
- macOS:
  - `/Users/<user>/.codex/skills`
- Linux:
  - `/home/<user>/.codex/skills`
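The install check above can be scripted. The following is a minimal sketch, assuming each installed skill appears as a directory named after the skill inside the skills folder; that layout is an assumption, not something this README confirms, and the helper names are hypothetical.

```python
from pathlib import Path

# Skill names are taken from the required list above.
REQUIRED_SKILLS = [
    "research",
    "research-add-fields",
    "research-add-items",
    "research-deep",
    "research-report",
]


def default_skills_dir() -> Path:
    """Return <home>/.codex/skills, which matches all three OS paths listed above."""
    return Path.home() / ".codex" / "skills"


def missing_skills(skills_dir: Path, required=REQUIRED_SKILLS) -> list:
    """Return the required skill names that have no matching directory.

    Assumption: one directory per skill, named exactly like the skill.
    """
    if not skills_dir.is_dir():
        return list(required)
    installed = {entry.name for entry in skills_dir.iterdir() if entry.is_dir()}
    return [name for name in required if name not in installed]
```

Calling `missing_skills(default_skills_dir())` on a fresh machine returns the names that still need to be installed; an empty list means every required skill directory was found.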
---
## 3. Required MCP servers
Core MCP servers:
- `word-mcp`
  - edits `.docx`
- `zotero`
  - manages items, PDFs, and citations
- `docling`
  - parses PDFs
- `qmd`
  - performs local hybrid retrieval
Strongly recommended:
- `chrome-devtools`
  - opens paper pages and GitHub pages for manual downloading
---
## 4. MCP installation notes
### 4.1 `word-mcp`
Recommended approach: clone locally and run with `pixi`.
Example:
```powershell
codex mcp add word-mcp -- pixi run --manifest-path C:/path/to/word-mcp start
```
### 4.2 `zotero`
Recommended approach: clone locally, run with `pixi`, and configure:
- `ZOTERO_API_KEY`
- `ZOTERO_USER_ID`
- `UNPAYWALL_EMAIL`
Example:
```powershell
codex mcp add zotero `
--env ZOTERO_API_KEY=YOUR_KEY `
--env ZOTERO_USER_ID=YOUR_USER_ID `
--env UNPAYWALL_EMAIL=YOUR_EMAIL `
-- pixi run --manifest-path C:/path/to/mcp-zotero start
```
### 4.3 `docling`
Recommended approach: clone locally and run with `uv`.
Example:
```powershell
codex mcp add docling -- uv run --directory C:/path/to/docling-mcp docling-mcp-server --transport stdio
```
### 4.4 `qmd`
Recommended requirements:
- Node.js >= 22
- either a stable local build or a verified global installation
Example:
```powershell
codex mcp add qmd -- node C:/path/to/qmd/dist/qmd.js mcp
```
### 4.5 `chrome-devtools`
Windows example:
```powershell
codex mcp add chrome-devtools -- npx chrome-devtools-mcp@latest
```
On Windows 11, you may also need `SystemRoot`, `PROGRAMFILES`, and a larger `startup_timeout_ms` in `~/.codex/config.toml`.
macOS and Linux usually do not need the Windows-specific environment block.
---
## 5. Codex multi-agent requirements
At minimum, `~/.codex/config.toml` should contain:
```toml
[features]
multi_agent = true
```
Recommended additions:
```toml
[agents]
max_threads = 6
max_depth = 1
```
Recommended agents for this workflow:
- `deep_researcher`
- `zotero_locator`
- `qmd_retriever`
- `github_mapper`
- `writer`
- `citation_checker`
- `citation_archivist`
- `web_researcher`
See:
- [`codex-multi-agent-setup-notes.md`](./codex-multi-agent-setup-notes.md)
---
## 6. Recommended dynamic variables
To keep this workflow portable across Windows, macOS, and Linux, maintain key paths as variables instead of hard-coding them everywhere.
Recommended variables:
- `REVIEW_TITLE`
- `WORK_ROOT`
- `WORD_TARGET_DOC`
- `REVIEW_OUTLINE_FILE`
- `ANALYSIS_DIR`
- `ZOTERO_ROOT`
- `ZOTERO_STORAGE_DIR`
- `PAPER_GITHUB_MAP_CSV`
- `RAG_ROOT`
- `RAG_PARSED_MD_DIR`
- `GITHUB_SOURCE_DIR`
- `GITHUB_MD_DIR`
- `CITATION_EVIDENCE_DIR`
See:
- [`multi-agent-review-workflow-template.md`](./multi-agent-review-workflow-template.md)
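One way to keep these variables in a single place is to resolve them once from the environment. The sketch below covers only a subset of the variables; placing the defaults directly under `WORK_ROOT` is an assumption made for illustration (the file names themselves come from this repository), and `resolve_paths` is a hypothetical helper.

```python
import os
from pathlib import Path


def resolve_paths(env=None) -> dict:
    """Resolve workflow paths from environment variables.

    Assumption: any variable that is not set falls back to a location
    directly under WORK_ROOT. That layout is illustrative only.
    """
    env = os.environ if env is None else env
    work_root = Path(env.get("WORK_ROOT", ".")).expanduser()

    def under_root(name: str, default: str) -> Path:
        # An explicit variable wins; otherwise derive the path from WORK_ROOT.
        return Path(env[name]).expanduser() if name in env else work_root / default

    return {
        "WORK_ROOT": work_root,
        "WORD_TARGET_DOC": under_root("WORD_TARGET_DOC", "macrolide-review-draft.docx"),
        "REVIEW_OUTLINE_FILE": under_root("REVIEW_OUTLINE_FILE", "review-outline.md"),
        "PAPER_GITHUB_MAP_CSV": under_root("PAPER_GITHUB_MAP_CSV", "paper_github_repo_map.csv"),
        "CITATION_EVIDENCE_DIR": under_root("CITATION_EVIDENCE_DIR", "citation-evidence"),
    }
```

Because everything goes through `pathlib`, the same code works on Windows, macOS, and Linux; only the environment values change between machines.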
---
## 7. Core workflow
Recommended execution order:
1. `Deep-Research-skills`
   - identify missing papers, missing GitHub projects, and missing evidence
2. `zotero`
   - check whether items and PDFs already exist
3. `chrome-devtools`
   - open download pages so the user can manually download PDFs, supplements, or GitHub assets
4. `docling`
   - convert PDFs into structured text
5. `qmd`
   - perform hybrid retrieval over parsed papers, GitHub docs, and notes
6. `word-mcp`
   - revise the Word draft only after evidence is ready
7. `zotero`
   - support citation insertion and citation review
---
## 8. Paper-to-GitHub mapping
Maintain both:
- Zotero source traces
- a local structured mapping table
Recommended mapping table:
- `paper_github_repo_map.csv`
Rules:
- one paper to multiple repositories: one row per mapping
- one repository to multiple papers: one row per mapping
- QMD is the primary retrieval layer
- Zotero is the source-trace layer
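The one-row-per-mapping rule can be enforced with a small append helper. This is a sketch only: the column names `paper_id`, `repo_url`, and `notes` are hypothetical, since the README does not list the actual columns of `paper_github_repo_map.csv`.

```python
import csv
from pathlib import Path

# Hypothetical column names; adjust to the real header of the mapping table.
FIELDS = ["paper_id", "repo_url", "notes"]


def add_mapping(csv_path: Path, paper_id: str, repo_url: str, notes: str = "") -> bool:
    """Append one paper <-> repository row; return False if the pair already exists."""
    is_new = not csv_path.exists()
    if not is_new:
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                if row.get("paper_id") == paper_id and row.get("repo_url") == repo_url:
                    return False  # this exact mapping is already recorded
    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"paper_id": paper_id, "repo_url": repo_url, "notes": notes})
    return True
```

A paper linked to two repositories produces two rows with the same `paper_id`, and a repository shared by two papers produces two rows with the same `repo_url`, which keeps both directions easy to filter.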
---
## 9. Minimal checklist for a fresh machine
1. Install Codex CLI
2. Install and verify `Deep-Research-skills`
3. Install and verify:
   - `word-mcp`
   - `zotero`
   - `docling`
   - `qmd`
   - `chrome-devtools`
4. Confirm `~/.codex/config.toml` enables:
   - `multi_agent = true`
5. Confirm `~/.codex/agents/` contains the required agent configs
6. Confirm Zotero local data or Zotero Web API access is available
7. Confirm QMD collections, Docling output directories, and mapping files are configured
8. Confirm the Word draft and outline file paths are updated
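Part of this checklist can be automated by testing whether the launchers used in the installation examples are on `PATH`. The sketch below is an assumption about which commands matter (`codex`, `git`, `pixi`, `uv`, `node`, `npx`, all of which appear in the commands in this README); it does not verify versions or that any MCP server actually starts.

```python
import shutil

# Launchers that appear in the `codex mcp add` and git examples of this README.
REQUIRED_COMMANDS = ["codex", "git", "pixi", "uv", "node", "npx"]


def missing_commands(commands=REQUIRED_COMMANDS, which=shutil.which) -> list:
    """Return the commands that cannot be found on PATH.

    `which` is injectable so the check can be exercised without touching the system.
    """
    return [name for name in commands if which(name) is None]
```

Running `missing_commands()` on a fresh machine returns the tools that still need to be installed before the MCP servers can be registered.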
---
## 10. Git initialization and push
Initialize locally:
```powershell
git init -b main
git add .gitignore *.md macrolide-review-draft.docx
git commit -m "Initialize scholarly writing workbench"
```
After creating the remote repository in Gitea:
```powershell
git remote add origin <YOUR_GITEA_REPO_URL>
git push -u origin main
```
---
## 11. Recommended repository name
Preferred:
- `scholarly-writing-workbench`
Alternatives:
- `paper-review-agent-workbench`
- `scholarly-writing-agent-kit`
- `literature-review-agent-kit`


@@ -0,0 +1,343 @@
# Codex Multi-Agent Setup Notes
This note records how to enable and configure Codex multi-agent support for the literature review workflow.
---
## 1. Current status
Codex multi-agent support is enabled through:
- `C:\Users\pylyz\.codex\config.toml`
Key setting:
```toml
[features]
multi_agent = true
```
The current environment also includes review-specific agent role definitions.
---
## 2. Defined agent roles
Agent config directory:
- `C:\Users\pylyz\.codex\agents`
Current roles:
- `deep_researcher`
  - identifies literature gaps, candidate papers, and candidate GitHub projects
- `zotero_locator`
  - locates Zotero items, DOIs, attachments, and PDF paths
- `qmd_retriever`
  - retrieves evidence from parsed papers, GitHub docs, and notes
- `github_mapper`
  - maintains paper-to-GitHub mappings
- `writer`
  - edits the Word draft only after evidence is ready
- `citation_checker`
  - checks claim-to-paper matching and citation placement
- `citation_archivist`
  - archives auditable evidence files for planned citations
- `web_researcher`
  - performs live web research
Recommended main-thread role:
- `supervisor`
  - defines the task, delegates subtasks, collects evidence, and decides when to edit Word
---
## 3. Relevant configuration files
Global config:
- `C:\Users\pylyz\.codex\config.toml`
Agent configs:
- `C:\Users\pylyz\.codex\agents\deep-researcher.toml`
- `C:\Users\pylyz\.codex\agents\zotero-locator.toml`
- `C:\Users\pylyz\.codex\agents\qmd-retriever.toml`
- `C:\Users\pylyz\.codex\agents\github-mapper.toml`
- `C:\Users\pylyz\.codex\agents\writer.toml`
- `C:\Users\pylyz\.codex\agents\citation-checker.toml`
- `C:\Users\pylyz\.codex\agents\citation-archivist.toml`
- `C:\Users\pylyz\.codex\agents\web-researcher.toml`
Important local workflow files:
- working draft:
  - `macrolide-review-draft.docx`
- paper-to-GitHub mapping table:
  - `paper_github_repo_map.csv`
---
## 4. How to enable multi-agent
### Option A: edit the config file directly
Update:
- `C:\Users\pylyz\.codex\config.toml`
Make sure it includes:
```toml
[features]
multi_agent = true
[agents]
max_threads = 6
max_depth = 1
```
Then declare individual agents:
```toml
[agents.deep_researcher]
description = "..."
config_file = "agents/deep-researcher.toml"
```
Repeat for other roles.
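Writing eight near-identical stanzas by hand invites typos, so they can be generated. This sketch follows the naming pattern visible in this note (role names use underscores, config files use hyphens); the `"..."` description is the same placeholder used above and must be replaced by hand, and no TOML escaping is applied to it.

```python
# Role names as listed in section 2 of this note.
ROLES = [
    "deep_researcher",
    "zotero_locator",
    "qmd_retriever",
    "github_mapper",
    "writer",
    "citation_checker",
    "citation_archivist",
    "web_researcher",
]


def agent_stanza(role: str, description: str = "...") -> str:
    """Build one [agents.<role>] TOML stanza.

    The description is inserted verbatim, so it must not contain double quotes.
    """
    config_file = f"agents/{role.replace('_', '-')}.toml"
    return (
        f"[agents.{role}]\n"
        f'description = "{description}"\n'
        f'config_file = "{config_file}"\n'
    )


print("\n".join(agent_stanza(role) for role in ROLES))
```

The printed text can be pasted below the `[agents]` table in `config.toml` and the descriptions filled in afterwards.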
### Option B: use the CLI experimental toggle if available
Some Codex builds expose a CLI experimental toggle for multi-agent mode. If present, you can enable it there and restart the session.
For this machine, the file-based configuration is already the main source of truth.
---
## 5. Cross-platform notes
This note was prepared on Windows 11. If you move the workflow to macOS or Linux, pay attention to the following differences.
### 5.1 Config paths
- Windows:
  - `C:\Users\<user>\.codex\config.toml`
- macOS:
  - `/Users/<user>/.codex/config.toml`
- Linux:
  - `/home/<user>/.codex/config.toml`
### 5.2 Agent directory
- Windows:
  - `C:\Users\<user>\.codex\agents`
- macOS:
  - `/Users/<user>/.codex/agents`
- Linux:
  - `/home/<user>/.codex/agents`
### 5.3 Path syntax
- Windows paths normally use backslashes, but a backslash is an escape character in TOML basic (double-quoted) strings, so forward slashes are usually safer in TOML and CLI arguments.
- macOS and Linux should use forward slashes.
- Do not copy Windows absolute paths directly into macOS or Linux configs.
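For the separator issue specifically, the standard library can rewrite a Windows path with forward slashes. This is a small sketch; `to_forward_slashes` is a hypothetical helper, and it changes separators only. It does not make a Windows absolute path valid on macOS or Linux.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path: str) -> str:
    """Rewrite a Windows path with forward slashes, keeping the drive letter."""
    # PureWindowsPath parses the path without touching the file system,
    # so this also works when run on macOS or Linux.
    return PureWindowsPath(windows_path).as_posix()


print(to_forward_slashes(r"C:\Users\<user>\.codex\config.toml"))
# prints: C:/Users/<user>/.codex/config.toml
```

The result can be pasted into a TOML string on Windows without doubling any backslashes.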
### 5.4 Shell and command differences
- Windows commonly uses `powershell` or `cmd`
- macOS and Linux typically use `bash` or `zsh`
Do not copy Windows-only patterns like:
- `cmd /c`
- `set`
- `where`
Typical Unix-like alternatives:
- `bash -lc`
- `which`
- `export`
### 5.5 MCP startup command differences
Some MCP servers are cross-platform, but their launch commands differ.
Example: Windows `chrome-devtools` config:
```toml
[mcp_servers.chrome-devtools]
command = "cmd"
args = ["/c", "npx", "-y", "chrome-devtools-mcp@latest"]
env = { SystemRoot = "C:\\Windows", PROGRAMFILES = "C:\\Program Files" }
startup_timeout_ms = 20_000
```
Typical macOS/Linux variant:
```toml
[mcp_servers.chrome-devtools]
command = "npx"
args = ["-y", "chrome-devtools-mcp@latest"]
startup_timeout_ms = 20_000
```
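The two variants differ only in how `npx` is launched, which can be expressed as a single platform switch. The sketch below mirrors the two TOML blocks above as Python dictionaries; the function name is hypothetical, and the result still has to be written into `config.toml` by hand.

```python
import platform


def chrome_devtools_launch(system=None) -> dict:
    """Return the launch settings for chrome-devtools on the given platform.

    `system` uses the values returned by platform.system(), e.g. "Windows",
    "Darwin", or "Linux"; it defaults to the current machine.
    """
    system = platform.system() if system is None else system
    package_args = ["-y", "chrome-devtools-mcp@latest"]
    if system == "Windows":
        # Windows wraps npx in `cmd /c` and passes the extra environment block.
        return {
            "command": "cmd",
            "args": ["/c", "npx", *package_args],
            "env": {"SystemRoot": "C:\\Windows", "PROGRAMFILES": "C:\\Program Files"},
        }
    # macOS and Linux call npx directly and need no extra environment.
    return {"command": "npx", "args": package_args}
```

Keeping the package arguments in one list makes it harder for the Windows and Unix-like variants to drift apart when the package name or version changes.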
### 5.6 Environment variables
Windows-only extras such as `SystemRoot` and `PROGRAMFILES` usually do not belong in macOS/Linux configs.
Cross-platform variables such as API keys and email values are usually portable.
### 5.7 Executable paths
Some Windows setups use explicit executable paths, such as a pinned `node.exe`.
On macOS/Linux, replace them with the correct local paths, or prefer portable commands when possible:
- `uv`
- `uvx`
- `pixi`
- `python`
- `node`
- `npx`
### 5.8 Windows-only sections
Do not copy Windows-only sections such as:
```toml
[windows]
sandbox = "unelevated"
```
### 5.9 Safe migration order
If you reuse this workflow on macOS or Linux:
1. copy the role structure
2. replace local paths
3. verify each MCP startup command
4. start a fresh Codex session and test
---
## 6. How changes take effect
1. save `config.toml`
2. save all relevant `agents/*.toml`
3. close the current Codex session
4. start a new Codex session
5. use the new session for delegation
Multi-agent configuration is typically read at session startup, so a fresh session is the safer default after config changes.
---
## 7. Recommended delegation pattern for this review
Recommended parallel group 1:
- `deep_researcher`
- `zotero_locator`
Recommended parallel group 2:
- `qmd_retriever`
- `github_mapper`
Recommended serial steps:
- `writer`
- `citation_checker`
- `citation_archivist`
Reasoning:
- search, retrieval, and mapping are naturally parallel
- Word editing should remain single-writer
- citation review and citation evidence archiving should happen after the content draft is stable
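The ordering above (two parallel groups, then serial steps) can be illustrated with plain Python. This is only a model of the schedule: in Codex the delegation happens through prompts, so the `agents` callables here are hypothetical stand-ins, not a Codex API.

```python
from concurrent.futures import ThreadPoolExecutor

PARALLEL_GROUPS = [
    ["deep_researcher", "zotero_locator"],
    ["qmd_retriever", "github_mapper"],
]
SERIAL_STEPS = ["writer", "citation_checker", "citation_archivist"]


def run_review_round(agents: dict, task: str) -> list:
    """Run each parallel group concurrently, then the serial steps one by one.

    `agents` maps a role name to a callable taking the task text.
    Returns (role, result) pairs in schedule order.
    """
    results = []
    for group in PARALLEL_GROUPS:
        # Each group finishes completely before the next group starts.
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            results.extend(pool.map(lambda role: (role, agents[role](task)), group))
    for role in SERIAL_STEPS:
        # Write-heavy roles run strictly one after another.
        results.append((role, agents[role](task)))
    return results
```

The design point is that only read-heavy roles share a thread pool, while `writer` and the citation roles never overlap with anything else.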
---
## 8. Good parallel tasks
- literature gap analysis
- Zotero existence checks
- DOI / author / year verification
- attachment path discovery
- QMD retrieval over papers
- QMD retrieval over GitHub docs and notes
- paper-to-GitHub mapping maintenance
- download list preparation
---
## 9. Tasks that should not run in parallel
- multiple agents editing the same Word document
- multiple agents modifying the same mapping file without strict coordination
- drafting prose before evidence has converged
Rule of thumb:
- parallelize read-heavy work
- keep write-heavy work single-owner
---
## 10. Minimal example prompt
```text
Goal: strengthen Chapter 5 on macrolide-specific generation tools and scaffold-constrained optimization evidence.
Use a multi-agent workflow:
1. Run deep_researcher and zotero_locator in parallel.
- deep_researcher: identify missing papers, GitHub projects, and missing evidence
- zotero_locator: check whether those papers already exist in Zotero and whether PDFs are available
2. Then run qmd_retriever and github_mapper in parallel.
- qmd_retriever: retrieve direct evidence from parsed papers, GitHub docs, and notes
- github_mapper: update paper_github_repo_map.csv and check whether Zotero already preserves source traces
3. If evidence is sufficient, let writer propose or apply minimal Word edits.
4. After that, let citation_checker review citation placement.
5. Finally, let citation_archivist create citation evidence files for key claims.
```
---
## 11. Related tools
- `word-mcp`
  - Word `.docx` editing
- `zotero`
  - item metadata, PDFs, and citations
- `docling`
  - PDF parsing
- `qmd`
  - hybrid local retrieval
- `chrome-devtools`
  - browser opening for manual downloads
- `Deep-Research-skills`
  - structured research workflows
---
## 12. Maintenance habits
1. back up `config.toml` before major changes
2. update `paper_github_repo_map.csv` before or alongside GitHub-related evidence work
3. back up the Word draft before important edits
4. use multi-agent mode for larger tasks, then converge on a single `writer`
5. archive evidence for important citations early instead of waiting until the end
---
## 13. Possible future expansions
- add a dedicated `mechanism_checker` for ribosome-binding and resistance corrections
- add a dedicated `download_planner`
- add a dedicated `reporter` for round summaries

BIN
macrolide-review-draft.docx Normal file

Binary file not shown.

File diff suppressed because it is too large


@@ -0,0 +1,144 @@
# Pending Calibration Tasks
This file tracks the remaining “last-mile” items that should be calibrated through real workflow runs.
These items do not block progress, but they should be refined over the next few iterations and then folded back into the main workflow template.
---
## 1. How to practice the first batch of citation evidence files
Start with a small pilot instead of a full rollout.
Recommended pilot scope:
- 1 chapter
- 2 to 3 key claims
- 1 to 2 core supporting papers per claim
Recommended starting targets:
- Chapter 1 claims about resistance mechanisms or ribosome-binding sites
- Chapter 5 entries such as `Macformer`, `PKS Enumerator`, or `SIME`
Minimal pilot flow:
1. `qmd_retriever` retrieves evidence from the `papers` collection.
2. Validate the passage in Docling markdown.
3. If needed, inspect Docling JSON for structured field or block evidence.
4. If the claim concerns implementation details, inspect GitHub `README`, `docs`, or `examples`.
5. `citation_checker` recommends papers and insertion positions.
6. `citation_archivist` writes a markdown evidence file under `citation-evidence/`.
7. A human reviews whether the evidence file is strong enough to show the citation is not fabricated.
The purpose of the pilot is to answer:
- Is the current citation evidence template sufficient?
- Which Docling JSON fields are actually useful?
- How much GitHub evidence is useful in this review workflow?
---
## 2. Items that still need calibration
### 2.1 QMD retrieval parameters
Current defaults:
- `top 8`
- threshold `0.45`
- default order: `papers -> github -> notes`
Still to validate:
- whether `0.45` is too high or too low for this Chinese review corpus
- whether `top 8` is enough
- whether some chapters need stricter or looser retrieval settings
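To compare candidate settings on the same query, the defaults can be applied as a post-processing step over scored hits. This is a sketch under several assumptions: the hit dictionaries and their keys (`collection`, `score`, `text`) are hypothetical rather than QMD's actual output format, the threshold is treated as inclusive, and the collection order is interpreted as a sort priority applied before the top-k cut.

```python
# Default collection priority from the list above.
COLLECTION_ORDER = ["papers", "github", "notes"]


def select_hits(hits: list, top_k: int = 8, threshold: float = 0.45) -> list:
    """Keep hits at or above the threshold, order them, and cut to top_k.

    Ordering: collection priority first, then descending score within a collection.
    """
    kept = [
        hit
        for hit in hits
        if hit["score"] >= threshold and hit["collection"] in COLLECTION_ORDER
    ]
    kept.sort(key=lambda hit: (COLLECTION_ORDER.index(hit["collection"]), -hit["score"]))
    return kept[:top_k]
```

Running the same list of hits through `select_hits` with, say, `threshold=0.35` and `threshold=0.45` shows directly which passages each setting would drop, which is the comparison the calibration questions above call for.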
### 2.2 Docling JSON field whitelist
Already decided:
- keep both Docling outputs: Markdown and JSON
Still to validate:
- which JSON fields are most useful for citation evidence files
- whether page indices, block IDs, heading levels, or paragraph indices should always be preserved
- whether a dedicated JSON field extraction script is needed
### 2.3 GitHub evidence acceptance rules
Already decided:
- do not force GitHub inspection if the paper itself is already clear
- if implementation details are unclear, check `README / docs / examples` first
- only inspect source code or key config files if those higher-level materials remain insufficient
Still to validate:
- when `README` alone is enough
- when `docs/examples` are required
- when source inspection is required to avoid overclaiming
### 2.4 Future granularity of `paper_github_repo_map.csv`
Already decided:
- default granularity is one row per `paper <-> repository` mapping
Still to validate:
- whether module-level or subdirectory-level mapping fields are needed later
- or whether that detail should stay in the `notes` field only
### 2.5 Zotero automatic insertion on Word copies
Current default:
- human final confirmation remains the default
Still to validate:
- whether automatic insertion experiments should be allowed on Word copies
- which sections or sentence types are safe candidates for that experiment
### 2.6 Failure recovery flow for Word edits
Already decided:
- create a backup before editing
- restore from backup if the edit fails or produces clearly bad output
Still to validate:
- whether a dedicated recovery script is needed
- whether backup naming should be standardized with timestamps
- whether recovery should become a mandatory part of the `writer` receipt
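If the recovery flow does get scripted, a timestamped copy is probably the smallest useful version. The sketch below is one possible naming scheme, not a decided convention: the `<stem>.<YYYYMMDD-HHMMSS>.bak.docx` pattern and both function names are assumptions.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_with_timestamp(doc_path: Path, now=None) -> Path:
    """Copy the document next to itself with a timestamp in the file name."""
    now = datetime.now() if now is None else now
    stamp = now.strftime("%Y%m%d-%H%M%S")
    # Example: draft.docx -> draft.20260102-030405.bak.docx
    backup_path = doc_path.with_name(f"{doc_path.stem}.{stamp}.bak{doc_path.suffix}")
    shutil.copy2(doc_path, backup_path)
    return backup_path


def restore_from_backup(backup_path: Path, doc_path: Path) -> None:
    """Overwrite the working document with the backup copy."""
    shutil.copy2(backup_path, doc_path)
```

Because the backup keeps the `.docx` suffix, the repository's `*.docx` ignore rule already excludes it from Git, while the working draft stays tracked through its explicit exception.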
---
## 3. Recommended order for future convergence
Avoid finalizing everything at once. A safer order is:
1. run the first citation-evidence pilot
2. calibrate QMD thresholds and candidate counts
3. refine the Docling JSON field whitelist
4. refine how deeply GitHub evidence should go into source code
5. only then decide whether Zotero automatic insertion should be tested on draft copies
---
## 4. Write-back policy
After each real workflow round, write the stable result back into one of:
- `multi-agent-review-workflow-template.md`
- `codex-multi-agent-setup-notes.md`
- this file
Rule:
- if the rule is stable, write it into the main template
- if it is still experimental, keep it here

127
review-outline.md Normal file

@@ -0,0 +1,127 @@
# AI-Driven Design of 16-Membered Macrolides: From Traditional Antibiotic Optimization to Intelligent Molecular Generation
This file is the default primary outline for the current project.
Each new task round should read this file first to determine the target chapter and subsection.
If this file is missing or clearly outdated, an agent may extract a provisional outline from the Word draft and write:
- `review-outline.generated.md`
Once the generated outline is manually confirmed, it should be folded back into this file.
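Reading the outline to pick a target chapter or subsection amounts to parsing Markdown headings. The following is a minimal sketch, assuming the outline uses only `#`-style headings and contains no fenced code blocks (true for this file); the function names are hypothetical.

```python
import re

# One to six leading '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def parse_outline(markdown_text: str) -> list:
    """Return (level, title) pairs for every heading line, in document order."""
    headings = []
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


def find_sections(headings: list, prefix: str) -> list:
    """Return the titles that start with the given prefix, e.g. '5.2'."""
    return [title for _level, title in headings if title.startswith(prefix)]
```

For example, `find_sections(parse_outline(text), "5.2")` returns the `5.2` subsection together with its numbered children, which is usually enough to scope one task round.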
---
## Abstract
## Chapter 1. Introduction
### 1.1 Development and challenges of antibiotics
#### 1.1.1 Overview
#### 1.1.2 Macrolide antibiotics
#### 1.1.3 Current research status of macrolide antibiotics
### 1.2 Drug resistance in macrolides
#### 1.2.1 Resistance mechanisms of macrolide antibiotics
#### 1.2.2 Binding sites of macrolide antibiotics
### 1.3 Quantitative structure-activity relationships
#### 1.3.1 Molecular representation
#### 1.3.2 1D-QSAR
#### 1.3.3 2D-QSAR
#### 1.3.4 3D-QSAR
### 1.4 Molecular energy minimization
#### 1.4.1 Molecular mechanics
#### 1.4.2 Quantum mechanics
#### 1.4.3 Hybrid quantum mechanics / molecular mechanics (QM/MM)
#### 1.4.4 Conformational ensemble sampling
### 1.5 AI-based innovative antibiotic design
## Chapter 2. Current status of macrocycle design methods
### 2.1 Evolution of computer-aided macrocycle design
#### 2.1.1 Early geometric matching and fragment stitching
#### 2.1.2 Structured fragment-linking algorithms
#### 2.1.3 Commercial and semi-automated tools
#### 2.1.4 LigMac: end-to-end structure-guided design
#### 2.1.5 Molecular-field-based fragment replacement tools
#### 2.1.6 Web platforms with configurable linker libraries
### 2.2 Generative deep learning for macrocycle design
### 2.3 Reinforcement learning optimization for macrocycle design
## Chapter 3. Specialized models and strategies for macrocycle generation
### 3.1 Generative models based on fragment linking / cyclization
### 3.2 Generative models for macrocyclic peptides and special scaffolds
### 3.3 Generative models and tools for macrolides
#### 3.3.1 PKS-based generation of macrolides
## Chapter 4. AI-driven molecular generation techniques
### 4.1 Sequence-based generative models (SMILES representation)
### 4.2 Molecular graph-based generative models
### 4.3 GAN and reinforcement learning methods
### 4.4 Emerging diffusion models and 3D generation
## Chapter 5. Generative models for macrocyclic molecules
### 5.1 Generative models for macrocycles
#### 5.1.1 Challenges in macrocycle design
#### 5.1.2 Macformer: macrocycle structure generation
#### 5.1.3 MacroHop: macrocycle scaffold generation
#### 5.1.4 MacroEvoLution: evolutionary macrocycle design
#### 5.1.5 HELM-GPT: macrocyclic peptide generation
### 5.2 Specialized tools for macrolides
#### 5.2.1 PKS Enumerator
#### 5.2.2 SIME: biosynthesis-inspired macrocycle design
#### 5.2.3 Biosynthesis-driven macrocycle design strategies
### 5.3 Fixed-scaffold structure generation strategies
#### 5.3.1 Scaffold-constrained generation
#### 5.3.2 Site-directed generation
#### 5.3.3 Fragment stitching and side-chain enumeration
## Conclusion and outlook
## References