chore: bootstrap reusable quantization template workspace

2026-03-02 23:07:48 +08:00
commit 1c5822d16b
15 changed files with 167197 additions and 0 deletions

30
.gitignore vendored Normal file

@@ -0,0 +1,30 @@
# Python / env
.venv/
__pycache__/
*.pyc
*.pyo
*.pyd
# Local scratch
.trash/
*.log
# Model weights and large artifacts
*.gguf
*.safetensors
*.safetensors.index.json
*.bin
*.pt
*.pth
*.ckpt
*.onnx
# Hugging Face / cache-like folders
.cache/
# Keep publish metadata, ignore heavy files in publish folder
modelscope_upload/*.gguf
# OS
.DS_Store
Thumbs.db

27
AGENTS.md Normal file

@@ -0,0 +1,27 @@
# AGENTS Guidelines
This file guides future automation agents working in this repository.
## Goals
Maintain a reusable quantization and publishing workflow; never commit large model weights to the Git repository.
## Directory conventions
- Documentation lives in `docs/`
- Scripts live in `scripts/`
- Calibration data lives in `calibration/`
- The publish directory is `modelscope_upload/`
## Hard rules
1. Never commit weight files (`.gguf`, `.safetensors`, `.bin`, `.pt`, etc.)
2. Never commit secrets, tokens, or credentials
3. When the workflow changes, update the `docs/` documentation in the same change
4. Prefer reusing existing scripts over reinventing them
## Working habits
- For long-running tasks (conversion/quantization/upload), state the check command and an estimated duration up front
- Provide upload commands in two variants by default: direct (no proxy) and via proxy
- Before any operation that moves large files, check available disk space
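The last habit can be automated with a small pre-flight helper; this is a minimal sketch using only the standard library, with the threshold left to the caller:

```python
import shutil


def has_free_space(path: str, required_gib: float) -> bool:
    """Return True if the filesystem holding `path` has at least
    `required_gib` GiB free (binary units, 2**30 bytes)."""
    return shutil.disk_usage(path).free / 2**30 >= required_gib
```

For example, call `has_free_space(".", 25)` before quantizing, since each output GGUF here is in the 14-21 GiB range.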

37
README.md Normal file

@@ -0,0 +1,37 @@
# Qwen3.5-27B Quantization Workspace
This repository captures the reusable quantization workflow and publishing scripts for the Qwen3.5-27B model family. It keeps:
- Quantization workflow documentation
- Calibration data and the scripts that build it
- ModelScope publishing template files and the upload script
Model weights are not hosted in this repository (`.gguf` and other large files are excluded via `.gitignore`).
## Directory layout
- `docs/`
  - `QWEN35_QUANTIZATION_MANUAL.md`
  - `MODELSCOPE_UPLOAD_SOP.md`
- `scripts/`
  - `prepare_calib_data.py`
  - `upload_to_modelscope.sh`
- `calibration/`
  - `calibration_data_v5_rc.txt`
  - `calibration_data_v5_rc_code.txt`
  - `sources/`
- `modelscope_upload/`
  - Publish directory for ModelScope (README/configuration/.gitattributes plus the artifacts)
## Typical workflow
1. Prepare/update the calibration data (`scripts/prepare_calib_data.py`)
2. Run imatrix and quantization in Docker (see `docs/QWEN35_QUANTIZATION_MANUAL.md`)
3. Stage the publish directory (`modelscope_upload/`)
4. Upload manually (see `docs/MODELSCOPE_UPLOAD_SOP.md` and `scripts/upload_to_modelscope.sh`)
## Git guidelines
- Commit only scripts, documentation, configuration, and small data files
- Never commit tokens, weights, or environment directories
- Update `docs/` and `AGENTS.md` alongside every workflow change

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,94 @@
# ModelScope Upload SOP (this project)
## 1. Directories and files
Working directory:
`/home/zly/project/modelscope_qwen35_27b_quantized`
Upload directory:
`/home/zly/project/modelscope_qwen35_27b_quantized/modelscope_upload`
The upload directory should contain:
- `README.md` (more than 200 characters, including `tasks` and `license`)
- `configuration.json`
- `.gitattributes`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
- `Qwen3.5-27B.imatrix.dat`
Quick check:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lah modelscope_upload
wc -m modelscope_upload/README.md
```
## 2. Environment setup
Use the local virtual environment:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python -V
./.venv/bin/modelscope --version
```
If not yet installed:
```bash
./.venv/bin/pip install -U modelscope "setuptools<81"
```
## 3. Log in to ModelScope
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope login --token "<YOUR_MODELSCOPE_TOKEN>"
```
## 4. Upload (recommended: direct connection, no proxy)
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
env -u HTTP_PROXY -u HTTPS_PROXY -u ALL_PROXY -u NO_PROXY \
./.venv/bin/modelscope upload \
"jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
"./modelscope_upload" \
. \
--repo-type model \
--commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```
## 5. Upload (via proxy, if needed)
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope upload \
"jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
"./modelscope_upload" \
. \
--repo-type model \
--commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```
## 6. Resuming interrupted uploads
- If an upload is interrupted, simply re-run the step 4 or step 5 command.
- The CLI hash-checks first and reuses chunks that were already uploaded; there is no need to delete local files manually.
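Because the CLI resumes from already-uploaded chunks, a blunt retry loop around the upload command is safe. A minimal sketch (a hypothetical wrapper, not one of this repository's scripts):

```python
import subprocess
import time


def run_with_retries(cmd: list[str], attempts: int = 3, wait_s: float = 30.0) -> int:
    """Re-run `cmd` until it exits 0 or attempts run out.

    Safe for the upload command above because the ModelScope CLI
    hash-checks and reuses chunks that were already uploaded.
    """
    rc = 1
    for i in range(1, attempts + 1):
        rc = subprocess.run(cmd).returncode
        if rc == 0:
            return 0
        if i < attempts:
            time.sleep(wait_s)  # brief pause before retrying
    return rc
```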
## 7. Post-publish checks
Repository URL:
`https://www.modelscope.cn/models/jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF`
Checklist:
- All files are shown (3 GGUF + 1 imatrix + metadata)
- The README renders the task and license correctly
- The page has left the pre-release state (if still pre-release, add more detail and appeal)


@@ -0,0 +1,231 @@
# Qwen3.5-27B Quantization Manual (ik_llama.cpp, Docker edition)
## 1. Goal and scope
This manual covers running imatrix computation and quantization on the Qwen3.5-27B BF16 GGUF with `ik_llama.cpp` in `/home/zly/project/modelscope_qwen35_27b_quantized`, producing:
- `Qwen3.5-27B.imatrix.dat`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
Image: `hotwa/ik:latest`
Core tools: `/llama-imatrix` and `/llama-quantize`
---
## 2. Prerequisites
- Docker is available and you have permission to access the daemon
- An NVIDIA GPU is available (recommended)
- The BF16 input file exists in the current directory:
  - `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- The Python environment and script exist in the current directory:
  - `./.venv/bin/python`
  - `prepare_calib_data.py`
Check commands:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --rm --gpus all --entrypoint sh hotwa/ik:latest -c "ls -la /llama-imatrix /llama-quantize"
ls -lh Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
```
---
## 3. Prepare calibration data
### 3.1 Download the base calibration file (source of the 1152 blocks)
Recommended (widely used community version):
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
wget -O calibration_data_v5_rc.txt \
"https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt"
```
Official fallback source (when the network allows):
```bash
wget -O calibration_data_v5_rc.txt \
"https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt"
```
### 3.2 Build the mixed calibration set
Target composition for the script (strict):
- Base data, 1152 blocks: `calibration_data_v5_rc.txt`
- Code conversations, 2000 blocks: `QuixiAI/Code-74k-ShareGPT-Vicuna`
- Code preferences, 1000 blocks: `alvarobartt/openhermes-preferences-coding`
Run:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python prepare_calib_data.py --force-refresh
```
### 3.3 Verify block counts
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python - <<'PY'
import re
from pathlib import Path
def count_blocks(path):
txt = Path(path).read_text(encoding="utf-8", errors="ignore")
return len([b for b in re.split(r"\n\s*\n", txt) if b.strip()])
print("base =", count_blocks("calibration_data_v5_rc.txt"))
print("mix =", count_blocks("calibration_data_v5_rc_code.txt"))
PY
```
Expected:
- `base = 1152`
- `mix = 4152` (1152 + 2000 + 1000)
---
## 4. Generate the imatrix
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
-v /home/zly/project/modelscope_qwen35_27b_quantized/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
hotwa/ik:latest \
-c "/llama-imatrix \
-m /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
-f /workspace/calib_data.txt \
-o /workspace/models/Qwen3.5-27B.imatrix.dat \
--ctx-size 512 \
-ngl 99 \
--threads 16"
```
Verify completion:
```bash
ls -lh Qwen3.5-27B.imatrix.dat
```
---
## 5. Quantize to the three formats
### 5.1 IQ4_KS
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ4_KS.gguf \
IQ4_KS"
```
### 5.2 IQ5_K
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ5_K.gguf \
IQ5_K"
```
### 5.3 IQ6_K
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ6_K.gguf \
IQ6_K"
```
---
## 6. Verify all outputs at once
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lh Qwen3.5-27B.imatrix.dat Qwen3.5-27B-IQ4_KS.gguf Qwen3.5-27B-IQ5_K.gguf Qwen3.5-27B-IQ6_K.gguf
```
Measured on 2026-03-02:
- `Qwen3.5-27B.imatrix.dat` = `13,582,647` bytes (`12.95 MiB`)
- `Qwen3.5-27B-IQ4_KS.gguf` = `14,705,833,248` bytes (`13.70 GiB`)
- `Qwen3.5-27B-IQ5_K.gguf` = `18,679,612,704` bytes (`17.40 GiB`)
- `Qwen3.5-27B-IQ6_K.gguf` = `22,292,632,864` bytes (`20.76 GiB`)
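The sizes in parentheses are binary units derived from the byte counts. A small conversion helper for reproducing them (illustrative only, not part of the workflow scripts):

```python
def to_mib(n_bytes: int) -> float:
    """Byte count to MiB (2**20 bytes), rounded to 2 decimals."""
    return round(n_bytes / 2**20, 2)


def to_gib(n_bytes: int) -> float:
    """Byte count to GiB (2**30 bytes), rounded to 2 decimals."""
    return round(n_bytes / 2**30, 2)
```

For example, `to_gib(14705833248)` reproduces the IQ4_KS figure above.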
---
## 7. Troubleshooting
### 7.1 `docker.sock` permission error
Symptom: `permission denied while trying to connect to the Docker daemon socket`
Fix:
- Run as a user with Docker permissions
- Or check the `docker` group configuration
### 7.2 DNS failure on the download source
Symptom: `unable to resolve host address`
Fix:
- Prefer the gist source (see 3.1)
- Or configure a working proxy and retry
### 7.3 Output files owned by `root`
Files written from inside the container may be owned by root. Fix ownership if needed:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
sudo chown -R $(id -u):$(id -g) Qwen3.5-27B*.gguf Qwen3.5-27B.imatrix.dat
```
---
## 8. Minimal ModelScope publishing checklist (optional)
- `README.md` (>= 200 characters, including task and license information)
- `configuration.json` (with `framework`, `task`, and `model.type`)
- `.gitattributes` (`*.gguf` tracked via LFS)
- Quantized files:
  - `Qwen3.5-27B-IQ4_KS.gguf`
  - `Qwen3.5-27B-IQ5_K.gguf`
  - `Qwen3.5-27B-IQ6_K.gguf`

5
modelscope_upload/.gitattributes vendored Normal file

@@ -0,0 +1,5 @@
*.gguf filter=lfs diff=lfs merge=lfs -text
*.dat filter=lfs diff=lfs merge=lfs -text
*.md text eol=lf
*.json text eol=lf
.gitattributes text eol=lf
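The two LFS rules above can be sanity-checked against the planned upload filenames with a few lines of Python. A rough sketch: real `.gitattributes` matching has extra path rules, but for these bare extension globs `fnmatch` behaves the same:

```python
from fnmatch import fnmatch

# The LFS patterns from the .gitattributes above.
LFS_PATTERNS = ["*.gguf", "*.dat"]


def is_lfs_tracked(filename: str) -> bool:
    """Rough check: does a bare filename match one of the LFS globs?"""
    return any(fnmatch(filename, pat) for pat in LFS_PATTERNS)
```

With this, every GGUF plus the imatrix file is LFS-tracked, while the metadata files (`README.md`, `configuration.json`) stay as plain text.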

Binary file not shown.


@@ -0,0 +1,76 @@
---
tags:
- text-generation
- qwen
- qwen35
- gguf
- quantization
tasks:
- text-generation
license: Apache License 2.0
---
# Qwen3.5-27B Quantized GGUF (IQ4_KS / IQ5_K / IQ6_K)
## Model description
This repository provides GGUF quantizations of Qwen3.5-27B for the llama.cpp ecosystem, in three variants: IQ4_KS, IQ5_K, and IQ6_K. The weights were quantized from a BF16 GGUF input via an importance matrix (imatrix), balancing file size, inference speed, and accuracy for text generation under different VRAM budgets.
## Weight provenance
- Original BF16 GGUF source: `TeichAI/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF`
- This repository contains the imatrix-based GGUF quantized releases derived from that source (IQ4_KS / IQ5_K / IQ6_K)
## Quantization method
Quantization was performed in two stages with the `ik_llama.cpp` Docker image (`hotwa/ik:latest`):
1. Compute the importance matrix over the calibration corpus with `llama-imatrix` (`Qwen3.5-27B.imatrix.dat`)
2. Export `IQ4_KS`, `IQ5_K`, and `IQ6_K` with `llama-quantize --imatrix ...`
Key parameters:
- imatrix input model: `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- `--ctx-size 512`
- `-ngl 99`
- `--threads 16`
The imatrix models the relative importance of the weights, which reduces information loss in critical layers at a given bit width and improves the stability of the quantized model at inference time.
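The intuition can be illustrated with a toy example: quantization error is weighted by per-weight importance, and the scale that minimizes the weighted error is preferred. This is a conceptual sketch only, not the actual ik_llama.cpp algorithm, and it assumes NumPy is available:

```python
import numpy as np


def weighted_quant_error(weights, importance, scale):
    """Importance-weighted squared error of round-to-nearest quantization."""
    q = np.round(weights / scale) * scale  # quantize, then dequantize
    return float(np.sum(importance * (weights - q) ** 2))


def best_scale(weights, importance, candidates):
    """Pick the candidate scale that minimizes the weighted error."""
    return min(candidates, key=lambda s: weighted_quant_error(weights, importance, s))
```

With uniform importance every weight counts equally; skewing the importance toward activation-critical weights shifts the chosen scale to protect them, which is the idea behind imatrix-guided quantization.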
## Calibration data: sources and rationale
The calibration file is `calibration_data_v5_rc_code.txt`, `4152` blocks in total:
- `1152` blocks: base calibration data `calibration_data_v5_rc.txt`
- `2000` blocks: `QuixiAI/Code-74k-ShareGPT-Vicuna`
- `1000` blocks: `alvarobartt/openhermes-preferences-coding` (`chosen` branch)
Download sources for the base calibration data:
- Widely used community version: `https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt`
- Official fallback source: `https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt`
Why these three components:
- The base data covers general semantics and common text distributions, preventing overfitting to the code domain
- The Code-74k conversation samples improve quantization fidelity for code generation, debugging, and explanation
- The OpenHermes coding preference samples carry a "preferred answer" signal that helps keep code output structured and readable
The mix balances general text against code tasks, matching the practical usage of the Qwen3.5-27B Distill model.
## Files
- `Qwen3.5-27B-IQ4_KS.gguf`: lowest VRAM footprint
- `Qwen3.5-27B-IQ5_K.gguf`: balanced speed and quality
- `Qwen3.5-27B-IQ6_K.gguf`: highest fidelity
- `Qwen3.5-27B.imatrix.dat`: the importance matrix used for quantization
## Usage recommendations
- Prefer IQ4_KS when device resources are tight
- Prefer IQ5_K for general inference
- Use IQ6_K when quality matters most
## Notes
This repository publishes ready-to-run GGUF weights and contains no training artifacts. Use a GGUF-capable inference framework (e.g. a llama.cpp-based implementation).


@@ -0,0 +1,10 @@
{
"framework": "ggml",
"task": "text-generation",
"model": {
"type": "qwen35"
},
"pipeline": {
"type": "text-generation"
}
}


@@ -0,0 +1,187 @@
#!/usr/bin/env python3
"""
Prepare calibration_data_v5_rc_code.txt with exact composition:
- base: 1152 blocks from calibration_data_v5_rc.txt
- code: 2000 blocks from QuixiAI/Code-74k-ShareGPT-Vicuna
- pref: 1000 blocks from alvarobartt/openhermes-preferences-coding
"""
from __future__ import annotations
import argparse
import random
import re
import subprocess
import sys
from pathlib import Path
from datasets import load_dataset
BASE_URL = (
"https://raw.githubusercontent.com/ggerganov/llama.cpp/master/"
"examples/calibration/calibration_data.txt"
)
BLOCK_SPLIT_RE = re.compile(r"\n\s*\n")
def split_blocks(text: str) -> list[str]:
blocks = [b.strip() for b in BLOCK_SPLIT_RE.split(text) if b.strip()]
return blocks
def read_blocks(path: Path) -> list[str]:
return split_blocks(path.read_text(encoding="utf-8", errors="ignore"))
def write_blocks(path: Path, blocks: list[str]) -> None:
path.write_text("\n\n".join(blocks).strip() + "\n", encoding="utf-8")
def ensure_base_file(path: Path) -> None:
if path.exists():
return
cmd = ["wget", BASE_URL, "-O", str(path)]
print("Downloading base calibration file:")
print(" ", " ".join(cmd))
subprocess.run(cmd, check=True)
def pick_blocks(blocks: list[str], target: int, seed: int) -> list[str]:
if len(blocks) < target:
raise ValueError(f"Need {target} blocks but only got {len(blocks)}.")
rng = random.Random(seed)
idxs = list(range(len(blocks)))
rng.shuffle(idxs)
return [blocks[i] for i in idxs[:target]]
def build_code74k_blocks(target: int, seed: int) -> list[str]:
ds = load_dataset("QuixiAI/Code-74k-ShareGPT-Vicuna", split="train")
rows = list(range(len(ds)))
rng = random.Random(seed)
rng.shuffle(rows)
out: list[str] = []
for i in rows:
conv = ds[i].get("conversations") or []
parts = []
for msg in conv:
value = (msg.get("value") or "").strip()
if value:
parts.append(value)
if parts:
out.append("\n".join(parts))
if len(out) >= target:
break
if len(out) < target:
raise RuntimeError(
f"Code-74k yielded only {len(out)} valid blocks, target is {target}."
)
return out
def build_openhermes_blocks(target: int, seed: int) -> list[str]:
ds = load_dataset("alvarobartt/openhermes-preferences-coding", split="train")
rows = list(range(len(ds)))
rng = random.Random(seed + 1)
rng.shuffle(rows)
out: list[str] = []
for i in rows:
chosen = ds[i].get("chosen") or []
parts = []
for msg in chosen:
value = (msg.get("content") or "").strip()
if value:
parts.append(value)
if parts:
out.append("\n".join(parts))
if len(out) >= target:
break
if len(out) < target:
raise RuntimeError(
f"OpenHermes yielded only {len(out)} valid blocks, target is {target}."
)
return out
def ensure_cached_blocks(
cache_path: Path,
target: int,
build_fn,
seed: int,
) -> list[str]:
if cache_path.exists():
cached = read_blocks(cache_path)
if len(cached) >= target:
return cached[:target]
print(
f"{cache_path} has {len(cached)} blocks (< {target}), rebuilding from source."
)
blocks = build_fn(target, seed)
cache_path.parent.mkdir(parents=True, exist_ok=True)
write_blocks(cache_path, blocks)
return blocks
def main() -> int:
parser = argparse.ArgumentParser()
parser.add_argument("--seed", type=int, default=42)
parser.add_argument("--base-file", default="calibration_data_v5_rc.txt")
parser.add_argument("--output", default="calibration_data_v5_rc_code.txt")
parser.add_argument("--data-dir", default="data")
parser.add_argument("--force-refresh", action="store_true")
args = parser.parse_args()
base_file = Path(args.base_file)
output_file = Path(args.output)
data_dir = Path(args.data_dir)
code_cache = data_dir / "code74k_2000.txt"
openhermes_cache = data_dir / "openhermes_coding_chosen_1000.txt"
if args.force_refresh:
for p in [code_cache, openhermes_cache]:
if p.exists():
p.unlink()
ensure_base_file(base_file)
base_blocks_all = read_blocks(base_file)
base_blocks = pick_blocks(base_blocks_all, target=1152, seed=args.seed)
code_blocks = ensure_cached_blocks(
cache_path=code_cache,
target=2000,
build_fn=build_code74k_blocks,
seed=args.seed,
)
openhermes_blocks = ensure_cached_blocks(
cache_path=openhermes_cache,
target=1000,
build_fn=build_openhermes_blocks,
seed=args.seed,
)
merged = base_blocks + code_blocks + openhermes_blocks
write_blocks(output_file, merged)
print("Done.")
print(f"base blocks: {len(base_blocks)} ({base_file})")
print(f"code blocks: {len(code_blocks)} (QuixiAI/Code-74k-ShareGPT-Vicuna)")
print(
"openhermes blocks: "
f"{len(openhermes_blocks)} (alvarobartt/openhermes-preferences-coding)"
)
print(f"total blocks: {len(merged)}")
print(f"output: {output_file}")
return 0
if __name__ == "__main__":
try:
raise SystemExit(main())
except subprocess.CalledProcessError as exc:
print(f"Command failed with exit code {exc.returncode}", file=sys.stderr)
raise

25
scripts/upload_to_modelscope.sh Executable file

@@ -0,0 +1,25 @@
#!/usr/bin/env bash
set -euo pipefail
# Usage:
#   ./upload_to_modelscope.sh <repo_id> <token>
# Example:
#   ./upload_to_modelscope.sh your_username/your_repo_name ms-xxxxxxxx
REPO_ID="${1:-}"
TOKEN="${2:-}"
if [[ -z "${REPO_ID}" || -z "${TOKEN}" ]]; then
echo "Usage: $0 <repo_id> <token>"
exit 1
fi
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ROOT_DIR="$(cd "${SCRIPT_DIR}/.." && pwd)"
"${ROOT_DIR}/.venv/bin/modelscope" login --token "${TOKEN}"
# Upload the staged publish directory (not this scripts directory).
"${ROOT_DIR}/.venv/bin/modelscope" upload "${REPO_ID}" "${ROOT_DIR}/modelscope_upload" . \
--repo-type model \
--commit-message "Upload Qwen3.5-27B quantized GGUF weights"
echo "Upload finished: ${REPO_ID}"