AGENTS

Purpose

This repository provides a reusable LLM quantization release workflow: HF safetensors -> BF16 GGUF -> imatrix -> IQ4/IQ5/IQ6 -> ModelScope upload directory.

Repository Contract

Template docs: docs/
Scripts: scripts/
Template files: templates/
Examples: examples/<model_name>/
Upload workspace: modelscope_upload/
Large artifacts: artifacts/ (ignored)

Hard Rules

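Never commit weight files of any kind (*.gguf, *.safetensors, *.bin, *.pt, etc.).
Never commit tokens, keys, or account credentials.
Every new model must add examples/<model_name>/README.md recording its key parameters.
Any script or workflow change must be accompanied by a matching update to docs/.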
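
The first rule lends itself to an automated check. The sketch below is one possible way to flag weight files in a list of paths; the function name find_forbidden is hypothetical, and the suffix set covers only the four extensions named above (the rule ends with "etc.", so the set is not exhaustive).

```python
from pathlib import PurePosixPath

# Extensions named in the Hard Rules. The rule ends with "etc.",
# so this set is a starting point, not a complete list.
FORBIDDEN_SUFFIXES = {".gguf", ".safetensors", ".bin", ".pt"}


def find_forbidden(paths):
    """Return the paths whose final extension marks them as weight files."""
    return [
        p for p in paths
        if PurePosixPath(p).suffix.lower() in FORBIDDEN_SUFFIXES
    ]
```

The list of paths could come, for example, from the output of git diff --cached --name-only in a pre-commit hook.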

Standard Quantization Skill

0) Prerequisites

Python venv: ./.venv
Docker + GPU (recommended)
An available HF model directory (safetensors)
The hotwa/ik:latest image can be pulled

1) HF -> BF16 GGUF

Run inside ik_llama.cpp:

python convert_hf_to_gguf.py \
  <hf_model_dir> \
  --outtype bf16 \
  --outfile <output_bf16_gguf>

Place the BF16 GGUF in artifacts/<model_name>/base_gguf/.
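
As a minimal sketch, the conversion command and the target directory can be derived together so that the output lands in the expected place. The helper name convert_command and the "<model_name>-BF16.gguf" file name are assumptions for illustration; only the flags and the artifacts/<model_name>/base_gguf/ layout come from this document.

```python
from pathlib import PurePosixPath


def convert_command(hf_model_dir, model_name, repo_root="."):
    """Return (output_dir, argv) for the HF -> BF16 GGUF conversion."""
    out_dir = PurePosixPath(repo_root) / "artifacts" / model_name / "base_gguf"
    # Hypothetical file naming; the document does not prescribe one.
    outfile = out_dir / f"{model_name}-BF16.gguf"
    argv = [
        "python", "convert_hf_to_gguf.py",
        str(hf_model_dir),
        "--outtype", "bf16",
        "--outfile", str(outfile),
    ]
    return out_dir, argv
```

The returned argv is meant to be run from the ik_llama.cpp checkout, for example with subprocess.run(argv, check=True), after creating the output directory.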

2) Build Calibration Dataset

Run:

./.venv/bin/python scripts/prepare_calib_data.py --force-refresh

Target output: calibration/calibration_data_v5_rc_code.txt, composed strictly of:

1152 blocks: calibration_data_v5_rc.txt
2000 blocks: QuixiAI/Code-74k-ShareGPT-Vicuna
1000 blocks: alvarobartt/openhermes-preferences-coding
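
The composition above adds up to 4152 blocks. The following sketch shows one way to compare observed per-source counts against it. How a "block" is delimited in the output file is not specified here, so counting is left to the caller; composition_errors is a hypothetical helper, not part of scripts/prepare_calib_data.py.

```python
# Expected block counts per source, copied from the list above.
EXPECTED_BLOCKS = {
    "calibration_data_v5_rc.txt": 1152,
    "QuixiAI/Code-74k-ShareGPT-Vicuna": 2000,
    "alvarobartt/openhermes-preferences-coding": 1000,
}

TOTAL_BLOCKS = sum(EXPECTED_BLOCKS.values())  # 1152 + 2000 + 1000 = 4152


def composition_errors(actual):
    """Compare observed per-source block counts with the expected ones."""
    errors = []
    for source, expected in EXPECTED_BLOCKS.items():
        got = actual.get(source, 0)
        if got != expected:
            errors.append(f"{source}: expected {expected} blocks, got {got}")
    for source in actual:
        if source not in EXPECTED_BLOCKS:
            errors.append(f"{source}: unexpected source")
    return errors
```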

3) Generate imatrix

docker run --gpus all --rm \
  --entrypoint sh \
  -v <repo_root>:/workspace/models \
  -v <repo_root>/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix -m <bf16_gguf> -f /workspace/calib_data.txt -o <imatrix_out> --ctx-size 512 -ngl 99 --threads 16"

4) Quantize

Export IQ4_KS, IQ5_K, and IQ6_K separately:

docker run --gpus all --rm \
  --entrypoint sh \
  -v <repo_root>:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize --imatrix <imatrix_out> <bf16_gguf> <out_gguf> IQ4_KS"
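
Because the same command is repeated for each of the three types, it may help to generate the invocations in a loop. The sketch below builds one docker run argument list per type, mirroring the command above. The function name and the "<out_prefix>-<TYPE>.gguf" output naming are assumptions; the model and imatrix paths are expected to be paths as seen inside the container.

```python
import shlex

QUANT_TYPES = ["IQ4_KS", "IQ5_K", "IQ6_K"]


def quantize_commands(repo_root, imatrix, bf16_gguf, out_prefix):
    """Build one `docker run` argument list per quantization type."""
    commands = []
    for qtype in QUANT_TYPES:
        # Hypothetical output naming; adjust to the repository's convention.
        out_gguf = f"{out_prefix}-{qtype}.gguf"
        # shlex.join quotes any argument that contains shell metacharacters.
        inner = shlex.join(
            ["/llama-quantize", "--imatrix", imatrix, bf16_gguf, out_gguf, qtype]
        )
        commands.append([
            "docker", "run", "--gpus", "all", "--rm",
            "--entrypoint", "sh",
            "-v", f"{repo_root}:/workspace/models",
            "hotwa/ik:latest",
            "-c", inner,
        ])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) so that a failed quantization stops the run.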

5) Prepare ModelScope Folder

Copy templates/modelscope/* into modelscope_upload/
Fill in README.md and configuration.json
Add the quantized artifacts (GGUF + imatrix)

6) Upload

Use the script:

./scripts/upload_to_modelscope.sh <repo_id> <token> modelscope_upload direct "Upload quantized GGUF"

direct disables the proxy automatically; proxy keeps the proxy in place.
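
To illustrate what the two modes could mean in practice, the sketch below derives a process environment for each mode. It assumes that "disabling the proxy" amounts to dropping the conventional proxy environment variables; the actual behavior of scripts/upload_to_modelscope.sh is not shown in this document and may differ.

```python
# Conventional proxy variable names (an assumption about what "proxy" covers).
PROXY_VARS = ("http_proxy", "https_proxy", "all_proxy")


def env_for_mode(mode, env):
    """Return a copy of env suited to the given upload mode."""
    if mode not in ("direct", "proxy"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "proxy":
        return dict(env)  # keep proxy settings untouched
    # direct: drop both lowercase and uppercase spellings of each variable
    blocked = {name for var in PROXY_VARS for name in (var, var.upper())}
    return {key: value for key, value in env.items() if key not in blocked}
```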

Definition Of Done

BF16 GGUF exists
imatrix.dat exists
IQ4/IQ5/IQ6 all exist
modelscope_upload/README.md is longer than 200 characters and includes tasks and license
modelscope_upload/configuration.json has all fields filled in
examples/<model_name>/ has been updated with a record
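
The README criterion is mechanical enough to check in code. The sketch below is one possible check, assuming that "includes tasks and license" means those words appear somewhere in the README text; readme_problems is a hypothetical helper. The configuration.json check is omitted because the required fields are not listed here.

```python
def readme_problems(text):
    """Return a list of unmet README criteria (empty when all are met)."""
    problems = []
    # The criterion is "longer than 200", so exactly 200 characters fails.
    if len(text) <= 200:
        problems.append("README.md must be longer than 200 characters")
    for key in ("tasks", "license"):
        if key not in text:
            problems.append(f"README.md does not mention {key}")
    return problems
```

A caller would read modelscope_upload/README.md as UTF-8 text and treat a non-empty result as a failed check.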
