# Qwen3.5-27B Quantization Runbook (ik_llama.cpp, Docker)

## 1. Goal and Scope

This runbook covers computing an imatrix and quantizing the Qwen3.5-27B BF16 GGUF with `ik_llama.cpp` in the directory `/home/zly/project/modelscope_qwen35_27b_quantized`, producing:

- `Qwen3.5-27B.imatrix.dat`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`

Image: `hotwa/ik:latest`
Core tools: `/llama-imatrix`, `/llama-quantize`

---

## 2. Prerequisites

- Docker installed, with permission to access the daemon
- An NVIDIA GPU available (recommended)
- The BF16 input file present in the current directory:
  - `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
  - (the naming indicates a two-shard split; the second shard should sit in the same directory, and llama.cpp-based tools load the remaining shards automatically when given the first)
- A Python environment and script present in the current directory:
  - `./.venv/bin/python`
  - `prepare_calib_data.py`

Sanity checks:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --rm --gpus all --entrypoint sh hotwa/ik:latest -c "ls -la /llama-imatrix /llama-quantize"
ls -lh Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
```

---

## 3. Preparing Calibration Data

### 3.1 Download the base calibration file (source of the 1152 blocks)

Recommended (widely used community version):

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
wget -O calibration_data_v5_rc.txt \
  "https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt"
```

Fallback source (if reachable on your network):

```bash
wget -O calibration_data_v5_rc.txt \
  "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt"
```

### 3.2 Build the mixed calibration set

Target composition of the script output (exact):

- Base data: 1152 blocks (`calibration_data_v5_rc.txt`)
- Code conversations: 2000 blocks (`QuixiAI/Code-74k-ShareGPT-Vicuna`)
- Code preferences: 1000 blocks (`alvarobartt/openhermes-preferences-coding`)

Run:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python prepare_calib_data.py --force-refresh
```

### 3.3 Verify the block counts

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python - <<'PY'
import re
from pathlib import Path

def count_blocks(path):
    txt = Path(path).read_text(encoding="utf-8", errors="ignore")
    return len([b for b in re.split(r"\n\s*\n", txt) if b.strip()])

print("base =", count_blocks("calibration_data_v5_rc.txt"))
print("mix  =", count_blocks("calibration_data_v5_rc_code.txt"))
PY
```

Expected:

- `base = 1152`
- `mix = 4152` (1152 + 2000 + 1000)

---

## 4. Generating the imatrix

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  -v /home/zly/project/modelscope_qwen35_27b_quantized/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix \
    -m /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    -f /workspace/calib_data.txt \
    -o /workspace/models/Qwen3.5-27B.imatrix.dat \
    --ctx-size 512 \
    -ngl 99 \
    --threads 16"
```

Verify completion:

```bash
ls -lh Qwen3.5-27B.imatrix.dat
```

---

## 5. Quantizing to the Three Formats

### 5.1 IQ4_KS

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ4_KS.gguf \
    IQ4_KS"
```

### 5.2 IQ5_K

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ5_K.gguf \
    IQ5_K"
```

### 5.3 IQ6_K

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ6_K.gguf \
    IQ6_K"
```

---

## 6. One-Shot Verification of Outputs
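The three quantization runs above differ only in the target type, so they can be driven from a single loop. A minimal sketch, assuming the same directory, image, and file names as in the commands above:

```shell
#!/bin/sh
# Sketch: run all three quantization passes in one loop instead of
# three copy-pasted commands. Paths and image match the runbook above.
MODEL_DIR=/home/zly/project/modelscope_qwen35_27b_quantized
BF16=Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf

for QTYPE in IQ4_KS IQ5_K IQ6_K; do
  echo ">> quantizing to ${QTYPE}"
  docker run --gpus all --rm \
    --entrypoint sh \
    -v "${MODEL_DIR}:/workspace/models" \
    hotwa/ik:latest \
    -c "/llama-quantize \
      --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
      /workspace/models/${BF16} \
      /workspace/models/Qwen3.5-27B-${QTYPE}.gguf \
      ${QTYPE}"
done
```

Each pass writes its output before the next starts, so a failure in one type does not clobber the others; re-run the loop with a reduced type list to retry a single format.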
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lh Qwen3.5-27B.imatrix.dat Qwen3.5-27B-IQ4_KS.gguf Qwen3.5-27B-IQ5_K.gguf Qwen3.5-27B-IQ6_K.gguf
```

Measured in this run (2026-03-02):

- `Qwen3.5-27B.imatrix.dat` = `13,582,647` bytes (≈ `12.95 MiB`)
- `Qwen3.5-27B-IQ4_KS.gguf` = `14,705,833,248` bytes (≈ `13.70 GiB`)
- `Qwen3.5-27B-IQ5_K.gguf` = `18,679,612,704` bytes (≈ `17.40 GiB`)
- `Qwen3.5-27B-IQ6_K.gguf` = `22,292,632,864` bytes (≈ `20.76 GiB`)

---

## 7. Troubleshooting

### 7.1 `docker.sock` permission error

Symptom: `permission denied while trying to connect to the Docker daemon socket`

Fix:

- Run the commands as a user with Docker permissions
- Or check the `docker` group configuration

### 7.2 DNS failure on the download sources

Symptom: `unable to resolve host address`

Fix:

- Prefer the gist source (see 3.1)
- Or configure a working proxy and retry

### 7.3 Output files owned by `root`

Files written from inside the container may end up owned by root. Fix ownership as needed:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
sudo chown -R $(id -u):$(id -g) Qwen3.5-27B*.gguf Qwen3.5-27B.imatrix.dat
```

---

## 8. Minimal ModelScope Publishing Checklist (optional)

- `README.md` (>= 200 characters, including task and license information)
- `configuration.json` (with `framework`, `task`, `model.type`)
- `.gitattributes` (route `*.gguf` through LFS)
- Quantized files:
  - `Qwen3.5-27B-IQ4_KS.gguf`
  - `Qwen3.5-27B-IQ5_K.gguf`
  - `Qwen3.5-27B-IQ6_K.gguf`
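A `configuration.json` with the three keys named in the checklist can be as small as the fragment below. The field names come from the checklist above; the *values* are illustrative assumptions for a text-generation GGUF repository and should be checked against ModelScope's own documentation before publishing:

```json
{
  "framework": "gguf",
  "task": "text-generation",
  "model": {
    "type": "qwen3"
  }
}
```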