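Qwen3.5-27B Quantization Handbook (Example Archive)
This file is the historical hands-on record for examples/qwen35_27b, reorganized to match the current repository layout.
1. Inputs and Outputs
Input BF16 GGUF: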
artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
Outputs:
artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ4_KS.gguf
artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ5_K.gguf
artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ6_K.gguf
examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat
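To see at a glance which of the files listed above are already present, a small status check can help. The sketch below is not part of the original workflow: `REPO_DIR` and `check_artifacts` are hypothetical names introduced here, and only the first BF16 shard is checked because that is the only input path this handbook names.

```shell
#!/bin/sh
# Sketch: report which artifacts from this section exist under a repo root.
# REPO_DIR is a hypothetical variable; it defaults to the current directory.
REPO_DIR="${REPO_DIR:-.}"

# Print one "ok"/"missing" line per expected artifact, relative to root $1.
check_artifacts() {
  root=$1
  for rel in \
    artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ4_KS.gguf \
    artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ5_K.gguf \
    artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ6_K.gguf \
    examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat
  do
    # -s is true only when the file exists and is non-empty
    if [ -s "$root/$rel" ]; then
      printf 'ok       %s\n' "$rel"
    else
      printf 'missing  %s\n' "$rel"
    fi
  done
}

check_artifacts "$REPO_DIR"
```

Run it from the repository root, or set `REPO_DIR` to the checkout path. Before steps 3 and 4 have been run, the imatrix and quantized files are expected to show as missing.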
2. Calibration Data
Run:
./.venv/bin/python scripts/prepare_calib_data.py --force-refresh
Mixed dataset: calibration/calibration_data_v5_rc_code.txt (4152 blocks).
3. imatrix
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
-v /home/zly/project/modelscope_qwen35_27b_quantized/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
hotwa/ik:latest \
-c "/llama-imatrix \
-m /workspace/models/artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
-f /workspace/calib_data.txt \
-o /workspace/models/examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat \
--ctx-size 512 -ngl 99 --threads 16"
4. Quantization
Example (IQ4_KS):
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat \
/workspace/models/artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ4_KS.gguf \
IQ4_KS"
For IQ5_K and IQ6_K, change only the output filename and the quantization type.
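Since the three runs differ only in the output name and the type argument, they can be driven from one loop. The following is a minimal sketch under stated assumptions, not the script used for the archived run: `REPO_DIR`, `DRY_RUN`, `quant_output`, and `quantize_one` are hypothetical names introduced here, and the docker invocation simply mirrors the IQ4_KS example above. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: quantize the same BF16 GGUF to all three types listed in this handbook.
# REPO_DIR and DRY_RUN are hypothetical variables introduced for illustration.
REPO_DIR="${REPO_DIR:-/home/zly/project/modelscope_qwen35_27b_quantized}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = actually run docker

# Container-side paths, copied from the IQ4_KS example.
IMATRIX=/workspace/models/examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat
BASE=/workspace/models/artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
OUT_DIR=/workspace/models/artifacts/qwen35_27b/quantized_gguf

# Build the container-side output path for one quantization type.
quant_output() {
  printf '%s/Qwen3.5-27B-%s.gguf\n' "$OUT_DIR" "$1"
}

# Print (DRY_RUN=1) or run (DRY_RUN=0) the docker command for one type.
quantize_one() {
  qtype=$1
  inner="/llama-quantize --imatrix $IMATRIX $BASE $(quant_output "$qtype") $qtype"
  if [ "$DRY_RUN" = "1" ]; then
    printf 'docker run --gpus all --rm --entrypoint sh -v %s:/workspace/models hotwa/ik:latest -c "%s"\n' \
      "$REPO_DIR" "$inner"
  else
    docker run --gpus all --rm --entrypoint sh \
      -v "$REPO_DIR":/workspace/models \
      hotwa/ik:latest -c "$inner"
  fi
}

# Stop at the first failing type instead of continuing with the rest.
for qtype in IQ4_KS IQ5_K IQ6_K; do
  quantize_one "$qtype" || exit 1
done
```

With the defaults this prints three `docker run` lines; setting `DRY_RUN=0` executes them in sequence. The loop assumes the imatrix file from step 3 already exists, since all three types reuse it.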