llm-gguf-quant-template/examples/qwen35_27b/docs/QWEN35_QUANTIZATION_MANUAL.md
2026-03-02 23:22:33 +08:00


Qwen3.5-27B Quantization Manual (Example Archive)

This file is the historical hands-on record for examples/qwen35_27b. It has been reorganized to match the current repository layout.

1. Inputs and Outputs

Input (BF16 GGUF; the path below is the first of two shards):

  • artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf

Outputs:

  • artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ4_KS.gguf
  • artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ5_K.gguf
  • artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ6_K.gguf
  • examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat
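After a full run, you can confirm that all four artifacts listed above were produced with a small shell check. This is a sketch. `check_outputs` is a hypothetical helper and is not part of the repository. It assumes it receives the repository root and that the relative paths are exactly as listed above.

```shell
#!/bin/sh
# Sketch: confirm the output artifacts of this example exist and are non-empty.
# check_outputs is a hypothetical helper; pass the repository root as $1.
check_outputs() {
  root="${1:-.}"
  missing=0
  for f in \
    artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ4_KS.gguf \
    artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ5_K.gguf \
    artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ6_K.gguf \
    examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat
  do
    # -s is true when the path exists and has a size greater than zero
    if [ -s "$root/$f" ]; then
      echo "ok      $f"
    else
      echo "MISSING $f"
      missing=$((missing + 1))
    fi
  done
  # Exit status = number of missing or empty artifacts (0 means all present)
  return "$missing"
}
```

For example, running `check_outputs .` from the repository root prints one line per artifact. Its exit status is the number of missing or empty files, so it can gate a later upload step.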

2. Calibration Data

Run:

./.venv/bin/python scripts/prepare_calib_data.py --force-refresh

Resulting mixed calibration set: calibration/calibration_data_v5_rc_code.txt (4152 blocks)
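Before the imatrix step, check that the calibration file actually exists. If the host path given to `docker run -v` does not exist, Docker creates it as an empty directory, and the container then sees a directory instead of text. The following is a minimal sketch that assumes the default path used in this manual. `check_calib` is a hypothetical helper and is not part of `scripts/prepare_calib_data.py`.

```shell
#!/bin/sh
# Sketch: pre-flight check of the calibration file before mounting it into Docker.
# check_calib is a hypothetical helper; the default path is the one used in this manual.
check_calib() {
  calib="${1:-calibration/calibration_data_v5_rc_code.txt}"
  # A missing host path given to `docker run -v` is created as an empty directory,
  # so require an existing regular file that is not empty.
  if [ ! -f "$calib" ] || [ ! -s "$calib" ]; then
    echo "calibration file missing or empty: $calib" >&2
    return 1
  fi
  # Print line and byte counts so an unexpectedly small file stands out.
  echo "$calib: $(wc -l < "$calib") lines, $(wc -c < "$calib") bytes"
}
```

Run `check_calib` from the repository root. A non-zero exit status means the file should be regenerated with `prepare_calib_data.py --force-refresh` before the container is started.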

3. imatrix

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  -v /home/zly/project/modelscope_qwen35_27b_quantized/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix \
    -m /workspace/models/artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    -f /workspace/calib_data.txt \
    -o /workspace/models/examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat \
    --ctx-size 512 -ngl 99 --threads 16"
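You can keep the same imatrix invocation in a small script so the host directory and image tag are set in one place. This is a sketch that reproduces the flags above unchanged. `HOST_DIR`, `IMAGE` and `DRY_RUN` are hypothetical variable names introduced here. With the default `DRY_RUN=1`, the script only prints the assembled arguments and does not start the container.

```shell
#!/bin/sh
# Sketch: the imatrix step with host-specific values pulled into variables.
# HOST_DIR, IMAGE and DRY_RUN are names introduced for this sketch.
set -eu

HOST_DIR="${HOST_DIR:-/home/zly/project/modelscope_qwen35_27b_quantized}"
IMAGE="${IMAGE:-hotwa/ik:latest}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command only, 0 = run it

# Paths inside the container; HOST_DIR is mounted at /workspace/models.
M=/workspace/models
BASE_GGUF="$M/artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf"
IMATRIX_OUT="$M/examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat"
CALIB_HOST="$HOST_DIR/calibration/calibration_data_v5_rc_code.txt"

# Command string run by `sh -c` inside the container (same flags as above).
inner="/llama-imatrix -m $BASE_GGUF -f /workspace/calib_data.txt -o $IMATRIX_OUT --ctx-size 512 -ngl 99 --threads 16"

# Collect the docker argv in the positional parameters (POSIX sh has no arrays).
set -- docker run --gpus all --rm --entrypoint sh \
  -v "$HOST_DIR:$M" \
  -v "$CALIB_HOST:/workspace/calib_data.txt" \
  "$IMAGE" -c "$inner"

if [ "$DRY_RUN" = "1" ]; then
  echo "$@"
else
  "$@"
fi
```

Run it once with the default to review the command, then run it with `DRY_RUN=0` to execute it.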

4. Quantization

Example (IQ4_KS):

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat \
    /workspace/models/artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ4_KS.gguf \
    IQ4_KS"

For IQ5_K and IQ6_K, change only the output filename and the quantization type.
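Because the three quantizations differ only in output name and type, one loop can drive all of them. This sketch uses the same paths and image as the IQ4_KS example. `quantize_one`, `HOST_DIR`, `IMAGE` and `DRY_RUN` are hypothetical names introduced here. By default, each command is printed rather than run.

```shell
#!/bin/sh
# Sketch: run llama-quantize for each target type; only the output name and type change.
# HOST_DIR, IMAGE, DRY_RUN and quantize_one are names introduced for this sketch.
set -eu

HOST_DIR="${HOST_DIR:-/home/zly/project/modelscope_qwen35_27b_quantized}"
IMAGE="${IMAGE:-hotwa/ik:latest}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print each command only, 0 = run it

M=/workspace/models
IMATRIX="$M/examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat"
BASE_GGUF="$M/artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf"
OUT_DIR="$M/artifacts/qwen35_27b/quantized_gguf"

# quantize_one TYPE: quantize the BF16 model to TYPE using the shared imatrix.
quantize_one() {
  qtype="$1"
  inner="/llama-quantize --imatrix $IMATRIX $BASE_GGUF $OUT_DIR/Qwen3.5-27B-$qtype.gguf $qtype"
  if [ "$DRY_RUN" = "1" ]; then
    echo "docker run --gpus all --rm --entrypoint sh -v $HOST_DIR:$M $IMAGE -c \"$inner\""
  else
    docker run --gpus all --rm --entrypoint sh -v "$HOST_DIR:$M" "$IMAGE" -c "$inner"
  fi
}

for qtype in IQ4_KS IQ5_K IQ6_K; do
  quantize_one "$qtype"
done
```

Printing by default is a deliberate choice, so an accidental run does not start three long GPU jobs. Set `DRY_RUN=0` once the printed commands look right.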