# Qwen3.5-27B Quantization Manual (ik_llama.cpp Docker Edition)

## 1. Goals and Scope

This manual describes how to run imatrix computation and quantization for the Qwen3.5-27B BF16 GGUF with `ik_llama.cpp`, working in the directory `/home/zly/project/modelscope_qwen35_27b_quantized`. The outputs are:

- `Qwen3.5-27B.imatrix.dat`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`

Image: `hotwa/ik:latest`
Core tools: `/llama-imatrix`, `/llama-quantize`

---
## 2. Prerequisites

- Docker is available and the current user has permission to access the daemon
- An NVIDIA GPU is available (recommended)
- The BF16 input file exists in the current directory:
  - `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- The Python environment and preparation script exist in the current directory:
  - `./.venv/bin/python`
  - `prepare_calib_data.py`

Check commands:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --rm --gpus all --entrypoint sh hotwa/ik:latest -c "ls -la /llama-imatrix /llama-quantize"
ls -lh Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
```

---
## 3. Prepare Calibration Data

### 3.1 Download the Base Calibration File (source of the 1152 blocks)

Recommended (widely used community version):

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
wget -O calibration_data_v5_rc.txt \
  "https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt"
```

Fallback from the upstream repo (if reachable):

```bash
wget -O calibration_data_v5_rc.txt \
  "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt"
```
### 3.2 Generate the Mixed Calibration Set

Target composition of the script output (strict):

- Base data: 1152 blocks (`calibration_data_v5_rc.txt`)
- Code conversations: 2000 blocks (`QuixiAI/Code-74k-ShareGPT-Vicuna`)
- Code preferences: 1000 blocks (`alvarobartt/openhermes-preferences-coding`)

Run:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python prepare_calib_data.py --force-refresh
```
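The actual mixing is done by `prepare_calib_data.py` (not reproduced here). As a rough illustration of the target composition, a minimal sketch of the block-mixing logic might look like the following; the function names and toy inputs are hypothetical, and the only assumption taken from this manual is the block definition (non-empty chunks separated by blank lines, the same rule used for counting in section 3.3).

```python
import re

def split_blocks(text):
    # A "block" is a non-empty chunk separated by blank lines,
    # matching the counting rule used in section 3.3.
    return [b for b in re.split(r"\n\s*\n", text) if b.strip()]

def mix_calibration(base_text, code_chat_blocks, code_pref_blocks):
    # Concatenate the base blocks with the extra code blocks,
    # keeping exactly one blank line between consecutive blocks.
    blocks = split_blocks(base_text) + list(code_chat_blocks) + list(code_pref_blocks)
    return "\n\n".join(blocks) + "\n"

# Toy example: 2 base blocks + 1 code-chat block + 1 code-preference block.
mixed = mix_calibration("base block A\n\nbase block B", ["chat block"], ["pref block"])
```

With the real inputs (1152 + 2000 + 1000 blocks), the same assembly yields the 4152-block `calibration_data_v5_rc_code.txt` verified below.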
### 3.3 Verify the Block Counts

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

./.venv/bin/python - <<'PY'
import re
from pathlib import Path

def count_blocks(path):
    txt = Path(path).read_text(encoding="utf-8", errors="ignore")
    return len([b for b in re.split(r"\n\s*\n", txt) if b.strip()])

print("base =", count_blocks("calibration_data_v5_rc.txt"))
print("mix =", count_blocks("calibration_data_v5_rc_code.txt"))
PY
```

Expected:

- `base = 1152`
- `mix = 4152` (1152 + 2000 + 1000)

---
## 4. Generate the imatrix

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  -v /home/zly/project/modelscope_qwen35_27b_quantized/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix \
    -m /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    -f /workspace/calib_data.txt \
    -o /workspace/models/Qwen3.5-27B.imatrix.dat \
    --ctx-size 512 \
    -ngl 99 \
    --threads 16"
```

Verify completion:

```bash
ls -lh Qwen3.5-27B.imatrix.dat
```

---
## 5. Quantize the Three Formats

### 5.1 IQ4_KS

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ4_KS.gguf \
    IQ4_KS"
```

### 5.2 IQ5_K

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ5_K.gguf \
    IQ5_K"
```

### 5.3 IQ6_K

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ6_K.gguf \
    IQ6_K"
```

---
## 6. Verify All Results at Once

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lh Qwen3.5-27B.imatrix.dat Qwen3.5-27B-IQ4_KS.gguf Qwen3.5-27B-IQ5_K.gguf Qwen3.5-27B-IQ6_K.gguf
```

Measured on this run (2026-03-02):

- `Qwen3.5-27B.imatrix.dat` = `13,582,647` bytes (≈ `12.95 MiB`)
- `Qwen3.5-27B-IQ4_KS.gguf` = `14,705,833,248` bytes (≈ `13.70 GiB`)
- `Qwen3.5-27B-IQ5_K.gguf` = `18,679,612,704` bytes (≈ `17.40 GiB`)
- `Qwen3.5-27B-IQ6_K.gguf` = `22,292,632,864` bytes (≈ `20.76 GiB`)
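As a sanity check, the human-readable figures above follow from the raw byte counts using binary units (1 MiB = 2^20 bytes, 1 GiB = 2^30 bytes), which is what `ls -lh` reports:

```python
# Convert the byte counts reported by `ls -l` into binary units (MiB/GiB).
# Byte counts are the measured values listed in this section.
def to_unit(num_bytes, unit):
    scale = {"MiB": 2**20, "GiB": 2**30}[unit]
    return round(num_bytes / scale, 2)

sizes = {
    "Qwen3.5-27B.imatrix.dat": to_unit(13_582_647, "MiB"),      # 12.95
    "Qwen3.5-27B-IQ4_KS.gguf": to_unit(14_705_833_248, "GiB"),  # 13.7
    "Qwen3.5-27B-IQ5_K.gguf": to_unit(18_679_612_704, "GiB"),   # 17.4
    "Qwen3.5-27B-IQ6_K.gguf": to_unit(22_292_632_864, "GiB"),   # 20.76
}
```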
---
## 7. Troubleshooting

### 7.1 `docker.sock` permission error

Symptom: `permission denied while trying to connect to the Docker daemon socket`

Fix:

- Run the commands as a user that has Docker access
- Or check the `docker` group configuration (e.g. `sudo usermod -aG docker "$USER"`, then log in again)

### 7.2 DNS failure on the download source

Symptom: `unable to resolve host address`

Fix:

- Prefer the gist source (see 3.1)
- Or configure a working proxy and retry

### 7.3 Output files owned by `root`

Files written from inside the container may end up owned by root. Fix ownership if needed:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
sudo chown -R $(id -u):$(id -g) Qwen3.5-27B*.gguf Qwen3.5-27B.imatrix.dat
```

---
## 8. Minimal ModelScope Publishing Checklist (optional)

- `README.md` (>= 200 characters, including task and license information)
- `configuration.json` (with `framework`, `task`, `model.type`)
- `.gitattributes` (`*.gguf` tracked via LFS)
- Quantized files:
  - `Qwen3.5-27B-IQ4_KS.gguf`
  - `Qwen3.5-27B-IQ5_K.gguf`
  - `Qwen3.5-27B-IQ6_K.gguf`