lingyuzeng/llm-gguf-quant-template

hotwa c5ae56c463 first commit

2026-03-02 23:22:33 +08:00

LLM GGUF Quantization Template

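This repository is a reusable template that covers the following end-to-end workflow:

Convert HuggingFace safetensors -> BF16 GGUF
Build mixed calibration data (general + code)
Generate an imatrix with ik_llama.cpp
Export IQ4_KS / IQ5_K / IQ6_K
Organize the ModelScope upload directory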
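
As a rough illustration of the conversion, imatrix, and export steps above, the sketch below only builds the command lines and does not run them. The binary directory, the `convert_hf_to_gguf.py` location, and the output file names are assumptions for illustration, not paths taken from this repository; check the flags against your own ik_llama.cpp build before using them.

```python
from pathlib import Path

# Quantization types listed in the workflow above.
QUANT_TYPES = ["IQ4_KS", "IQ5_K", "IQ6_K"]


def build_commands(hf_dir, out_dir, calib_file,
                   ik_bin="ik_llama.cpp/build/bin",                      # assumed build location
                   convert_script="ik_llama.cpp/convert_hf_to_gguf.py"):  # assumed script path
    """Return the argv lists for convert -> imatrix -> quantize (nothing is executed)."""
    out = Path(out_dir)
    bf16 = out / "model-BF16.gguf"   # hypothetical file name
    imatrix = out / "imatrix.dat"    # hypothetical file name

    cmds = [
        # Step 1: HuggingFace safetensors -> BF16 GGUF
        ["python", convert_script, str(hf_dir),
         "--outtype", "bf16", "--outfile", str(bf16)],
        # Step 2: importance matrix computed over the calibration text
        [f"{ik_bin}/llama-imatrix", "-m", str(bf16),
         "-f", str(calib_file), "-o", str(imatrix)],
    ]
    # Step 3: one quantized export per target type
    for qtype in QUANT_TYPES:
        cmds.append([f"{ik_bin}/llama-quantize", "--imatrix", str(imatrix),
                     str(bf16), str(out / f"model-{qtype}.gguf"), qtype])
    return cmds


if __name__ == "__main__":
    for cmd in build_commands("hf_model", "artifacts",
                              "calibration/calibration_data_v5_rc_code.txt"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to review them before passing each list to `subprocess.run(cmd, check=True)`.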

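Directory structure

docs/: template-level workflow documentation and checklists
scripts/: reusable scripts
templates/: ModelScope metadata templates
examples/: cases that have been run end to end (reference for parameters and run records)
calibration/: calibration data and data-source cache
modelscope_upload/: working directory for the current pending upload (only metadata is committed)
artifacts/: local directory for large build outputs (git-ignored)

See docs/REPO_STRUCTURE.md for the detailed structure.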

Quick start

Read docs/WORKFLOW_TEMPLATE.md
Follow docs/NEW_MODEL_CHECKLIST.md to run and verify each step
Use examples/qwen35_27b/ as a reference for parameters and release copy

Standard calibration data composition

Target output file: calibration/calibration_data_v5_rc_code.txt

Base data: 1152 blocks (calibration_data_v5_rc.txt)
Code conversations: 2000 blocks (QuixiAI/Code-74k-ShareGPT-Vicuna)
Code preferences: 1000 blocks (alvarobartt/openhermes-preferences-coding)

Run the script:

./.venv/bin/python scripts/prepare_calib_data.py --force-refresh
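
The composition above adds up to 1152 + 2000 + 1000 = 4152 blocks. The following is a minimal sketch of how such a mix could be assembled; it is not the actual logic of scripts/prepare_calib_data.py, and the source keys (`base`, `code_chat`, `code_pref`) and the take-the-first-N selection are assumptions made for illustration.

```python
from typing import Dict, List

# Per-source block quotas from the composition listed above.
QUOTAS: Dict[str, int] = {
    "base": 1152,       # calibration_data_v5_rc.txt
    "code_chat": 2000,  # QuixiAI/Code-74k-ShareGPT-Vicuna
    "code_pref": 1000,  # alvarobartt/openhermes-preferences-coding
}


def mix_blocks(sources: Dict[str, List[str]],
               quotas: Dict[str, int] = QUOTAS) -> List[str]:
    """Take the first `quota` blocks from each source, in quota order."""
    mixed: List[str] = []
    for name, quota in quotas.items():
        blocks = sources[name]
        if len(blocks) < quota:
            raise ValueError(f"{name}: need {quota} blocks, got {len(blocks)}")
        mixed.extend(blocks[:quota])
    return mixed


if __name__ == "__main__":
    # Dummy blocks stand in for real text so the sketch runs without downloads.
    dummy = {name: [f"{name}-{i}" for i in range(quota + 10)]
             for name, quota in QUOTAS.items()}
    print(len(mix_blocks(dummy)))
```

Failing loudly when a source is short of its quota keeps the output file from silently drifting away from the documented composition.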

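Git rules

Do not commit: large weight files such as *.gguf, *.safetensors, *.bin, *.pt
Do not commit: tokens, keys, or account credentials
Whenever the workflow or scripts change, docs/ and the example documentation must be updated in sync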
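
One way to enforce the weight-file rule before committing is a small path check like the sketch below. It is a hypothetical helper, not a script shipped in this repository, and the pattern list covers only the four extensions named above, so extend it for any other large formats you use.

```python
from fnmatch import fnmatch
from typing import Iterable, List

# Extensions named in the rules above; the list there is not exhaustive.
FORBIDDEN_PATTERNS = ["*.gguf", "*.safetensors", "*.bin", "*.pt"]


def forbidden_paths(paths: Iterable[str]) -> List[str]:
    """Return the paths whose names match a forbidden weight-file pattern."""
    return [
        p for p in paths
        # Lowercase the path so that e.g. "model.GGUF" is also caught.
        if any(fnmatch(p.lower(), pattern) for pattern in FORBIDDEN_PATTERNS)
    ]


if __name__ == "__main__":
    staged = ["docs/WORKFLOW_TEMPLATE.md", "artifacts/model-IQ4_KS.gguf"]
    for path in forbidden_paths(staged):
        print(f"refusing to commit: {path}")
```

The list of staged files can be obtained from `git diff --cached --name-only` and passed to `forbidden_paths`.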