llm-gguf-quant-template/docs/QWEN35_QUANTIZATION_MANUAL.md

Qwen3.5-27B Quantization Manual (ik_llama.cpp, Docker Edition)

1. Scope and Goals

This manual describes how to compute an imatrix and quantize the Qwen3.5-27B BF16 GGUF with ik_llama.cpp in the directory /home/zly/project/modelscope_qwen35_27b_quantized, producing:

  • Qwen3.5-27B.imatrix.dat
  • Qwen3.5-27B-IQ4_KS.gguf
  • Qwen3.5-27B-IQ5_K.gguf
  • Qwen3.5-27B-IQ6_K.gguf

Image: hotwa/ik:latest
Core tools: /llama-imatrix and /llama-quantize


2. Prerequisites

  • Docker is installed and you have permission to access the daemon
  • An NVIDIA GPU is available (recommended)
  • The BF16 input file is present in the current directory:
    • Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
  • The Python environment and script are present in the current directory:
    • ./.venv/bin/python
    • prepare_calib_data.py

Verification commands:

cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --rm --gpus all --entrypoint sh hotwa/ik:latest -c "ls -la /llama-imatrix /llama-quantize"
ls -lh Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf

3. Prepare the Calibration Data

3.1 Download the base calibration file (the source of the 1152 blocks)

Recommended (commonly used community version):

cd /home/zly/project/modelscope_qwen35_27b_quantized
wget -O calibration_data_v5_rc.txt \
  "https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt"

Official fallback source (when the network is reachable):

wget -O calibration_data_v5_rc.txt \
  "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt"

3.2 Generate the mixed calibration set

Target composition for the script (strict):

  • Base data: 1152 blocks from calibration_data_v5_rc.txt
  • Code conversations: 2000 blocks from QuixiAI/Code-74k-ShareGPT-Vicuna
  • Code preferences: 1000 blocks from alvarobartt/openhermes-preferences-coding

Run:

cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python prepare_calib_data.py --force-refresh

3.3 Verify the block counts

cd /home/zly/project/modelscope_qwen35_27b_quantized

./.venv/bin/python - <<'PY'
import re
from pathlib import Path

def count_blocks(path):
    txt = Path(path).read_text(encoding="utf-8", errors="ignore")
    return len([b for b in re.split(r"\n\s*\n", txt) if b.strip()])

print("base =", count_blocks("calibration_data_v5_rc.txt"))
print("mix  =", count_blocks("calibration_data_v5_rc_code.txt"))
PY

Expected:

  • base = 1152
  • mix = 4152 (1152 + 2000 + 1000)
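If either count deviates from the expectation, regenerate the mix before proceeding. A minimal sketch of that composition check (`mix_ok` is an illustrative helper, not part of prepare_calib_data.py):

```python
# Expected composition from section 3.2 (strict).
EXPECTED_BASE = 1152          # calibration_data_v5_rc.txt
EXPECTED_CODE_BLOCKS = 2000   # QuixiAI/Code-74k-ShareGPT-Vicuna
EXPECTED_PREF_BLOCKS = 1000   # alvarobartt/openhermes-preferences-coding

def mix_ok(base: int, mix: int) -> bool:
    """True only when the base and mixed counts match the target exactly."""
    return (base == EXPECTED_BASE
            and mix == EXPECTED_BASE + EXPECTED_CODE_BLOCKS + EXPECTED_PREF_BLOCKS)

print(mix_ok(1152, 4152))  # True: expected composition
print(mix_ok(1152, 4000))  # False: mix is short, regenerate it
```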

4. Generate the imatrix

cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  -v /home/zly/project/modelscope_qwen35_27b_quantized/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix \
    -m /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    -f /workspace/calib_data.txt \
    -o /workspace/models/Qwen3.5-27B.imatrix.dat \
    --ctx-size 512 \
    -ngl 99 \
    --threads 16"

Verify completion:

ls -lh Qwen3.5-27B.imatrix.dat
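Beyond checking that the file exists, a small sketch of a pre-quantization sanity check (the 1 MB floor is an assumption based on the ~13 MB measured size in section 6; `imatrix_ok` is an illustrative name):

```python
from pathlib import Path

def imatrix_ok(path: str, min_bytes: int = 1_000_000) -> bool:
    """Treat the imatrix as usable if it exists and is not suspiciously small."""
    p = Path(path)
    return p.is_file() and p.stat().st_size >= min_bytes

print(imatrix_ok("Qwen3.5-27B.imatrix.dat"))
```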

5. Quantize to Three Formats

5.1 IQ4_KS

cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ4_KS.gguf \
    IQ4_KS"

5.2 IQ5_K

cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ5_K.gguf \
    IQ5_K"

5.3 IQ6_K

cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ6_K.gguf \
    IQ6_K"
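The three runs in 5.1-5.3 differ only in the target type and output name. A sketch that derives each /llama-quantize invocation from one template (illustrative helper; each command would still be wrapped in the same docker run as above):

```python
# Container-side paths copied from the steps above.
BF16_GGUF = "/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf"
IMATRIX = "/workspace/models/Qwen3.5-27B.imatrix.dat"

def quantize_cmd(qtype: str) -> str:
    """Build the llama-quantize command line for one quant type."""
    out = f"/workspace/models/Qwen3.5-27B-{qtype}.gguf"
    return f"/llama-quantize --imatrix {IMATRIX} {BF16_GGUF} {out} {qtype}"

for q in ("IQ4_KS", "IQ5_K", "IQ6_K"):
    print(quantize_cmd(q))
```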

6. Verify All Outputs at Once

cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lh Qwen3.5-27B.imatrix.dat Qwen3.5-27B-IQ4_KS.gguf Qwen3.5-27B-IQ5_K.gguf Qwen3.5-27B-IQ6_K.gguf

Measured on 2026-03-02:

  • Qwen3.5-27B.imatrix.dat = 13,582,647 bytes (12.95 MiB)
  • Qwen3.5-27B-IQ4_KS.gguf = 14,705,833,248 bytes (13.70 GiB)
  • Qwen3.5-27B-IQ5_K.gguf = 18,679,612,704 bytes (17.40 GiB)
  • Qwen3.5-27B-IQ6_K.gguf = 22,292,632,864 bytes (20.76 GiB)
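A quick arithmetic check that the byte counts and the human-readable sizes above agree, treating the sizes as binary units (1 MiB = 2**20 bytes, 1 GiB = 2**30 bytes):

```python
def to_mib(n: int) -> float:
    """Convert a byte count to MiB, rounded to 2 decimal places."""
    return round(n / 2**20, 2)

def to_gib(n: int) -> float:
    """Convert a byte count to GiB, rounded to 2 decimal places."""
    return round(n / 2**30, 2)

print(to_mib(13_582_647))      # 12.95
print(to_gib(14_705_833_248))  # 13.7
print(to_gib(18_679_612_704))  # 17.4
print(to_gib(22_292_632_864))  # 20.76
```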

7. Troubleshooting

7.1 docker.sock permission error

Symptom: permission denied while trying to connect to the Docker daemon socket

Fix:

  • Run the commands as a user with Docker permissions
  • Or check the docker group configuration

7.2 DNS failure on the download source

Symptom: unable to resolve host address

Fix:

  • Prefer the gist source (see 3.1)
  • Or configure a working proxy and retry

7.3 Output files owned by root

Files written by the container may end up owned by root. Fix ownership as needed (alternatively, pass --user $(id -u):$(id -g) to docker run so files are created with your UID/GID in the first place):

cd /home/zly/project/modelscope_qwen35_27b_quantized
sudo chown -R $(id -u):$(id -g) Qwen3.5-27B*.gguf Qwen3.5-27B.imatrix.dat

8. Minimal ModelScope Publishing Checklist (Optional)

  • README.md (at least 200 characters, including task and license information)
  • configuration.json (containing framework, task, and model.type)
  • .gitattributes (*.gguf tracked via LFS)
  • Quantized files:
    • Qwen3.5-27B-IQ4_KS.gguf
    • Qwen3.5-27B-IQ5_K.gguf
    • Qwen3.5-27B-IQ6_K.gguf