Qwen3.5-27B Quantization Runbook (ik_llama.cpp, Docker Edition)
1. Goal and Scope
This runbook covers computing an imatrix for the Qwen3.5-27B BF16 GGUF and quantizing it with ik_llama.cpp, working in the directory /home/zly/project/modelscope_qwen35_27b_quantized. It produces:
- Qwen3.5-27B.imatrix.dat
- Qwen3.5-27B-IQ4_KS.gguf
- Qwen3.5-27B-IQ5_K.gguf
- Qwen3.5-27B-IQ6_K.gguf
Image: hotwa/ik:latest
Core tools: /llama-imatrix, /llama-quantize
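2. Prerequisites
- Docker is installed and the current user can reach the daemon
- An NVIDIA GPU is available (recommended)
- The BF16 input file exists in the current directory:
Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
- The Python environment and script exist in the current directory:
./.venv/bin/python
prepare_calib_data.py
Check commands: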
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --rm --gpus all --entrypoint sh hotwa/ik:latest -c "ls -la /llama-imatrix /llama-quantize"
ls -lh Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
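The file checks above can also be scripted. The following is a minimal sketch, not part of the original workflow; the function name missing_prerequisites and the constant REQUIRED_FILES are hypothetical, and only the file names come from this section.

```python
import shutil
from pathlib import Path

# File names taken from section 2; the constant itself is a hypothetical helper.
REQUIRED_FILES = [
    "Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf",
    ".venv/bin/python",
    "prepare_calib_data.py",
]


def missing_prerequisites(workdir, required=REQUIRED_FILES):
    """Return the required paths that do not exist under workdir."""
    root = Path(workdir)
    return [name for name in required if not (root / name).exists()]


def docker_cli_available():
    """True when a docker executable is on PATH (does not test daemon access)."""
    return shutil.which("docker") is not None
```

Calling missing_prerequisites("/home/zly/project/modelscope_qwen35_27b_quantized") should return an empty list when the directory is ready. Daemon permissions and the binaries inside the image still need the docker run check shown above.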
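3. Prepare Calibration Data
3.1 Download the base calibration file (source of the 1152 blocks)
Recommended (the version commonly used in the community):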
cd /home/zly/project/modelscope_qwen35_27b_quantized
wget -O calibration_data_v5_rc.txt \
"https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt"
Official fallback source (when it is reachable from your network):
wget -O calibration_data_v5_rc.txt \
"https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt"
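3.2 Generate the mixed calibration set
Target composition produced by the script (strict):
- Base data: 1152 blocks (calibration_data_v5_rc.txt)
- Code conversations: 2000 blocks (QuixiAI/Code-74k-ShareGPT-Vicuna)
- Code preferences: 1000 blocks (alvarobartt/openhermes-preferences-coding)
Run: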
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python prepare_calib_data.py --force-refresh
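The contents of prepare_calib_data.py are not shown in this manual, so the following is only a sketch of how a strict 1152 + 2000 + 1000 mix could be assembled. The names TARGETS, normalize, build_mix, and render are hypothetical, and the sampling strategy (seeded random sampling per source) is an assumption, not the script's documented behavior.

```python
import random
import re

# Target composition from section 3.2; the source keys are hypothetical labels.
TARGETS = {"base": 1152, "code_chat": 2000, "code_pref": 1000}


def normalize(block):
    """Collapse internal blank lines so one block stays one block when counted."""
    return re.sub(r"\n\s*\n", "\n", block.strip())


def build_mix(sources, targets=TARGETS, seed=0):
    """Take exactly targets[name] blocks from each source, in target order.

    sources maps a source label to a list of text blocks. A ValueError is
    raised when a source holds fewer blocks than its target.
    """
    rng = random.Random(seed)
    mixed = []
    for name, want in targets.items():
        blocks = [normalize(b) for b in sources[name] if b.strip()]
        if len(blocks) < want:
            raise ValueError(f"{name}: need {want} blocks, found {len(blocks)}")
        mixed.extend(blocks if len(blocks) == want else rng.sample(blocks, want))
    return mixed


def render(blocks):
    """Join blocks with a single blank line between them."""
    return "\n\n".join(blocks) + "\n"
```

Collapsing blank lines inside each block matters because the block count in this manual is defined by splitting on blank lines; a conversation containing an empty line would otherwise be counted as two blocks.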
3.3 Verify the block counts
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python - <<'PY'
import re
from pathlib import Path
def count_blocks(path):
    txt = Path(path).read_text(encoding="utf-8", errors="ignore")
    return len([b for b in re.split(r"\n\s*\n", txt) if b.strip()])
print("base =", count_blocks("calibration_data_v5_rc.txt"))
print("mix =", count_blocks("calibration_data_v5_rc_code.txt"))
PY
Expected:
base = 1152
mix = 4152 (1152 + 2000 + 1000)
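To turn the expected counts into a pass/fail check, a small helper along these lines could be used. It is a sketch under the same blank-line block definition as the snippet in this section; EXPECTED, count_blocks_text, mismatches, and check_files are hypothetical names.

```python
import re
from pathlib import Path

# Expected block counts from this section.
EXPECTED = {
    "calibration_data_v5_rc.txt": 1152,
    "calibration_data_v5_rc_code.txt": 1152 + 2000 + 1000,
}


def count_blocks_text(text):
    """Count non-empty blocks separated by blank lines."""
    return len([b for b in re.split(r"\n\s*\n", text) if b.strip()])


def mismatches(counts, expected=EXPECTED):
    """Return {name: (actual, wanted)} for every file whose count is off."""
    return {
        name: (counts.get(name), want)
        for name, want in expected.items()
        if counts.get(name) != want
    }


def check_files(expected=EXPECTED, root="."):
    """Count blocks in each expected file under root; a missing file counts as None."""
    counts = {}
    for name in expected:
        path = Path(root) / name
        if path.exists():
            text = path.read_text(encoding="utf-8", errors="ignore")
            counts[name] = count_blocks_text(text)
        else:
            counts[name] = None
    return mismatches(counts, expected)
```

An empty dictionary from check_files() means both files match; any entry shows the actual count next to the wanted one, which makes it clear whether the base file or the mix is off before the imatrix step is started.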
4. Generate the imatrix
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
-v /home/zly/project/modelscope_qwen35_27b_quantized/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
hotwa/ik:latest \
-c "/llama-imatrix \
-m /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
-f /workspace/calib_data.txt \
-o /workspace/models/Qwen3.5-27B.imatrix.dat \
--ctx-size 512 \
-ngl 99 \
--threads 16"
Verify that the file was written:
ls -lh Qwen3.5-27B.imatrix.dat
5. Quantize to Three Formats
5.1 IQ4_KS
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ4_KS.gguf \
IQ4_KS"
5.2 IQ5_K
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ5_K.gguf \
IQ5_K"
5.3 IQ6_K
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ6_K.gguf \
IQ6_K"
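The three commands in this section differ only in the quantization type and the output file name. As an illustration, the sketch below builds the same docker argument list for each type; it only prints the commands and runs nothing. The function name quantize_command and the constants are hypothetical, while every flag and path is copied from the commands above.

```python
import shlex

WORKDIR = "/home/zly/project/modelscope_qwen35_27b_quantized"
IMAGE = "hotwa/ik:latest"
BF16 = "Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf"
QUANT_TYPES = ["IQ4_KS", "IQ5_K", "IQ6_K"]


def quantize_command(quant_type, workdir=WORKDIR, image=IMAGE):
    """Build the docker argv for one quantization type, mirroring section 5."""
    inner = [
        "/llama-quantize",
        "--imatrix", "/workspace/models/Qwen3.5-27B.imatrix.dat",
        f"/workspace/models/{BF16}",
        f"/workspace/models/Qwen3.5-27B-{quant_type}.gguf",
        quant_type,
    ]
    return [
        "docker", "run", "--gpus", "all", "--rm",
        "--entrypoint", "sh",
        "-v", f"{workdir}:/workspace/models",
        image,
        "-c", shlex.join(inner),  # the inner command is passed to sh as one string
    ]


for qt in QUANT_TYPES:
    print(shlex.join(quantize_command(qt)))
```

To actually run a command, the list can be handed to subprocess.run(cmd, check=True). Running the three types one after another is the conservative choice, since each run reads the full BF16 input.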
6. Verify All Results at Once
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lh Qwen3.5-27B.imatrix.dat Qwen3.5-27B-IQ4_KS.gguf Qwen3.5-27B-IQ5_K.gguf Qwen3.5-27B-IQ6_K.gguf
Measured in this run (2026-03-02):
- Qwen3.5-27B.imatrix.dat = 13,582,647 bytes (about 12.95 MiB)
- Qwen3.5-27B-IQ4_KS.gguf = 14,705,833,248 bytes (about 13.70 GiB)
- Qwen3.5-27B-IQ5_K.gguf = 18,679,612,704 bytes (about 17.40 GiB)
- Qwen3.5-27B-IQ6_K.gguf = 22,292,632,864 bytes (about 20.76 GiB)
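The rounded sizes listed above correspond to binary units (1 MiB = 2^20 bytes, 1 GiB = 2^30 bytes). The short sketch below reproduces the arithmetic from the byte counts; the helper names to_mib and to_gib are hypothetical.

```python
# Byte counts copied from the measurements in this section.
SIZES_BYTES = {
    "Qwen3.5-27B.imatrix.dat": 13_582_647,
    "Qwen3.5-27B-IQ4_KS.gguf": 14_705_833_248,
    "Qwen3.5-27B-IQ5_K.gguf": 18_679_612_704,
    "Qwen3.5-27B-IQ6_K.gguf": 22_292_632_864,
}


def to_mib(n_bytes):
    """Convert bytes to mebibytes (2**20 bytes)."""
    return n_bytes / 2**20


def to_gib(n_bytes):
    """Convert bytes to gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


for name, size in SIZES_BYTES.items():
    if name.endswith(".gguf"):
        print(f"{name}: {to_gib(size):.2f} GiB")
    else:
        print(f"{name}: {to_mib(size):.2f} MiB")
```

In decimal units (1 GB = 10^9 bytes) the same files come out larger numerically, for example about 14.71 GB for the IQ4_KS file, which is worth keeping in mind when comparing against disk quotas or upload limits.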
7. Troubleshooting
7.1 docker.sock permission error
Symptom: permission denied while trying to connect to the Docker daemon socket
Fix:
- Run the commands as a user that has Docker permissions
- Or check the docker user group configuration
7.2 DNS failure for the download source
Symptom: unable to resolve host address
Fix:
- Prefer the gist source (see 3.1)
- Or configure a working proxy and retry
7.3 Output files owned by root
Files written by the container may end up owned by root. Fix the ownership as needed:
cd /home/zly/project/modelscope_qwen35_27b_quantized
sudo chown -R $(id -u):$(id -g) Qwen3.5-27B*.gguf Qwen3.5-27B.imatrix.dat
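8. Minimal ModelScope Release Checklist (optional)
- README.md (at least 200 characters, including task and license information)
- configuration.json (containing framework, task, model.type)
- .gitattributes (*.gguf tracked through LFS)
- Quantized files:
Qwen3.5-27B-IQ4_KS.gguf
Qwen3.5-27B-IQ5_K.gguf
Qwen3.5-27B-IQ6_K.gguf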
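The checklist can be verified mechanically before uploading. The sketch below is one possible check and makes two assumptions that the checklist does not spell out: that model.type means a "type" key nested under a "model" object in configuration.json, and that a line mentioning *.gguf in .gitattributes is enough to indicate LFS tracking. The function name check_release is hypothetical.

```python
import json
from pathlib import Path

QUANT_FILES = [
    "Qwen3.5-27B-IQ4_KS.gguf",
    "Qwen3.5-27B-IQ5_K.gguf",
    "Qwen3.5-27B-IQ6_K.gguf",
]


def check_release(repo_dir):
    """Return a list of problems found against the section 8 checklist."""
    root = Path(repo_dir)
    problems = []

    readme = root / "README.md"
    if not readme.exists() or len(readme.read_text(encoding="utf-8")) < 200:
        problems.append("README.md missing or shorter than 200 characters")

    config = root / "configuration.json"
    if not config.exists():
        problems.append("configuration.json missing")
    else:
        data = json.loads(config.read_text(encoding="utf-8"))
        for key in ("framework", "task"):
            if key not in data:
                problems.append(f"configuration.json lacks '{key}'")
        model = data.get("model")
        # Assumption: model.type is a nested key, {"model": {"type": ...}}.
        if not isinstance(model, dict) or "type" not in model:
            problems.append("configuration.json lacks 'model.type'")

    attrs = root / ".gitattributes"
    if not attrs.exists() or "*.gguf" not in attrs.read_text(encoding="utf-8"):
        problems.append(".gitattributes missing or has no *.gguf rule")

    for name in QUANT_FILES:
        if not (root / name).exists():
            problems.append(f"{name} missing")
    return problems
```

An empty list means every item on the checklist was found. The check does not judge whether the README actually states the task and license, so that part still needs a manual read.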