# Qwen3.5-27B Quantization Manual (ik_llama.cpp Docker Edition)
## 1. Goals and Scope
This manual describes how to compute an imatrix for, and quantize, the Qwen3.5-27B BF16 GGUF with `ik_llama.cpp` in the directory `/home/zly/project/modelscope_qwen35_27b_quantized`, producing:
- `Qwen3.5-27B.imatrix.dat`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
Image: `hotwa/ik:latest`
Core tools: `/llama-imatrix` and `/llama-quantize`
---
## 2. Prerequisites
- Docker is installed and the current user can access the daemon
- An NVIDIA GPU is available (recommended)
- The BF16 input file exists in the current directory:
  - `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- A Python environment and the preparation script exist in the current directory:
  - `./.venv/bin/python`
  - `prepare_calib_data.py`
Verification commands:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --rm --gpus all --entrypoint sh hotwa/ik:latest -c "ls -la /llama-imatrix /llama-quantize"
ls -lh Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
```
---
## 3. Prepare Calibration Data
### 3.1 Download the Base Calibration File (source of the 1152 blocks)
Recommended (widely used community version):
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
wget -O calibration_data_v5_rc.txt \
"https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt"
```
Official fallback source (when the network allows):
```bash
wget -O calibration_data_v5_rc.txt \
"https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt"
```
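After either download, a quick non-emptiness check catches silent failures (e.g. an HTML error page saved as the file would still pass a bare `ls`). The `check_nonempty` helper below is a small convenience sketch, not part of the original workflow:

```shell
# Hypothetical helper (not part of the original workflow): report whether
# a file exists and is non-empty before proceeding.
check_nonempty() {
  if [ -s "$1" ]; then
    echo "ok: $1"
  else
    echo "missing or empty: $1"
  fi
}
check_nonempty calibration_data_v5_rc.txt
```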
### 3.2 Generate the Mixed Calibration Set
The script targets exactly this composition:
- Base data (1152 blocks): `calibration_data_v5_rc.txt`
- Code conversations (2000 blocks): `QuixiAI/Code-74k-ShareGPT-Vicuna`
- Code preferences (1000 blocks): `alvarobartt/openhermes-preferences-coding`
Run:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python prepare_calib_data.py --force-refresh
```
### 3.3 Verify the Block Counts
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python - <<'PY'
import re
from pathlib import Path
def count_blocks(path):
    txt = Path(path).read_text(encoding="utf-8", errors="ignore")
    return len([b for b in re.split(r"\n\s*\n", txt) if b.strip()])
print("base =", count_blocks("calibration_data_v5_rc.txt"))
print("mix =", count_blocks("calibration_data_v5_rc_code.txt"))
PY
```
Expected:
- `base = 1152`
- `mix = 4152` (1152 + 2000 + 1000)
---
## 4. Generate the imatrix
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
-v /home/zly/project/modelscope_qwen35_27b_quantized/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
hotwa/ik:latest \
-c "/llama-imatrix \
-m /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
-f /workspace/calib_data.txt \
-o /workspace/models/Qwen3.5-27B.imatrix.dat \
--ctx-size 512 \
-ngl 99 \
--threads 16"
```
Verify completion:
```bash
ls -lh Qwen3.5-27B.imatrix.dat
```
---
## 5. Quantize to Three Formats
### 5.1 IQ4_KS
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ4_KS.gguf \
IQ4_KS"
```
### 5.2 IQ5_K
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ5_K.gguf \
IQ5_K"
```
### 5.3 IQ6_K
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ6_K.gguf \
IQ6_K"
```
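The three invocations in sections 5.1-5.3 differ only in the output filename and quantization type, so they can be generated by a loop. The sketch below is a dry run: it only prints each command; pipe its output to `sh` (or remove the `echo`) to execute. Paths and image name are the same ones used above.

```shell
# Dry run: print the three quantization commands from sections 5.1-5.3.
MODEL_DIR=/home/zly/project/modelscope_qwen35_27b_quantized
BF16=Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
CMDS=""
for QTYPE in IQ4_KS IQ5_K IQ6_K; do
  CMD="docker run --gpus all --rm --entrypoint sh -v $MODEL_DIR:/workspace/models hotwa/ik:latest -c '/llama-quantize --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat /workspace/models/$BF16 /workspace/models/Qwen3.5-27B-$QTYPE.gguf $QTYPE'"
  CMDS="$CMDS$CMD
"
  echo "$CMD"
done
```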
---
## 6. Verify All Outputs at Once
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lh Qwen3.5-27B.imatrix.dat Qwen3.5-27B-IQ4_KS.gguf Qwen3.5-27B-IQ5_K.gguf Qwen3.5-27B-IQ6_K.gguf
```
Measured on 2026-03-02:
- `Qwen3.5-27B.imatrix.dat` = `13,582,647` bytes (`12.95 MiB`)
- `Qwen3.5-27B-IQ4_KS.gguf` = `14,705,833,248` bytes (`13.70 GiB`)
- `Qwen3.5-27B-IQ5_K.gguf` = `18,679,612,704` bytes (`17.40 GiB`)
- `Qwen3.5-27B-IQ6_K.gguf` = `22,292,632,864` bytes (`20.76 GiB`)
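The human-readable sizes use binary units (1 GiB = 1024^3 bytes), matching what `ls -lh` reports. The conversion can be double-checked with a small awk helper (a convenience sketch, not part of the original workflow):

```shell
# Convert a byte count to GiB with two decimal places.
bytes_to_gib() { awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 ^ 3) }'; }
bytes_to_gib 14705833248   # IQ4_KS -> 13.70
bytes_to_gib 18679612704   # IQ5_K  -> 17.40
bytes_to_gib 22292632864   # IQ6_K  -> 20.76
```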
---
## 7. Troubleshooting
### 7.1 `docker.sock` Permission Error
Symptom: `permission denied while trying to connect to the Docker daemon socket`
Fix:
- Run the commands as a user with Docker permissions
- Or check the `docker` group configuration
### 7.2 DNS Failure on Download Sources
Symptom: `unable to resolve host address`
Fix:
- Prefer the gist source (see 3.1)
- Or configure a working proxy and retry
### 7.3 Output Files Owned by `root`
Files written from inside the container may end up owned by root. Fix ownership as needed:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
sudo chown -R $(id -u):$(id -g) Qwen3.5-27B*.gguf Qwen3.5-27B.imatrix.dat
```
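Alternatively, the ownership problem can be avoided up front by running the container as the current user via Docker's `--user` flag, assuming the image works unprivileged (not verified for `hotwa/ik:latest`):

```shell
# Build a --user flag from the current UID/GID; insert it into any
# `docker run` command in this manual, e.g.:
#   docker run --gpus all --rm --user "$(id -u):$(id -g)" ...
USER_FLAG="--user $(id -u):$(id -g)"
echo "$USER_FLAG"
```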
---
## 8. Minimal ModelScope Publishing Checklist (Optional)
- `README.md` (at least 200 characters, including task and license information)
- `configuration.json` (contains `framework`, `task`, and `model.type`)
- `.gitattributes` (route `*.gguf` through Git LFS)
- Quantized files:
  - `Qwen3.5-27B-IQ4_KS.gguf`
  - `Qwen3.5-27B-IQ5_K.gguf`
  - `Qwen3.5-27B-IQ6_K.gguf`
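A minimal `configuration.json` consistent with the fields listed above might look like the sketch below; the values are placeholders and must be adjusted to the actual model and task before publishing:

```json
{
  "framework": "gguf",
  "task": "text-generation",
  "model": {
    "type": "qwen3_5"
  }
}
```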