# Qwen3.5-27B Quantization Handbook (Example Archive)

This file is the historical hands-on record for `examples/qwen35_27b`, reorganized to match the current repository layout.

## 1. Inputs and Outputs

Input BF16 GGUF (first of two shards):

- `artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`

Outputs:

- `artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ4_KS.gguf`
- `artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ5_K.gguf`
- `artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ6_K.gguf`
- `examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat`

## 2. Calibration Data

Run:

```bash
./.venv/bin/python scripts/prepare_calib_data.py --force-refresh
```

Mixed dataset: `calibration/calibration_data_v5_rc_code.txt` (4152 blocks).

## 3. imatrix

```bash
docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  -v /home/zly/project/modelscope_qwen35_27b_quantized/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix \
    -m /workspace/models/artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    -f /workspace/calib_data.txt \
    -o /workspace/models/examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat \
    --ctx-size 512 -ngl 99 --threads 16"
```

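The imatrix command bind-mounts the calibration file and the model directory from the host. With `-v`, Docker creates a missing host path as an empty directory instead of failing, so a mistyped path can surface only later as a confusing error inside the container. A minimal pre-flight sketch, assuming the same host checkout as in the command; `HOST_ROOT`, `require_file`, and `preflight` are hypothetical names introduced here for illustration, not part of the repository:

```bash
# Assumption: same host checkout as in the imatrix command; override with HOST_ROOT.
HOST_ROOT="${HOST_ROOT:-/home/zly/project/modelscope_qwen35_27b_quantized}"

# Hypothetical helper: succeeds only for an existing, non-empty regular file.
require_file() {
  if [ -f "$1" ] && [ -s "$1" ]; then
    echo "ok: $1"
  else
    echo "missing or empty: $1" >&2
    return 1
  fi
}

# Check every host file the imatrix step reads; report all problems, not just the first.
preflight() {
  status=0
  require_file "$HOST_ROOT/calibration/calibration_data_v5_rc_code.txt" || status=1
  require_file "$HOST_ROOT/artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf" || status=1
  return "$status"
}

preflight || echo "pre-flight failed; fix the paths before running docker" >&2
```

The sketch checks only the first shard, because that is the only input path this document names.
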
## 4. Quantization

Example (IQ4_KS):

```bash
docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat \
    /workspace/models/artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ4_KS.gguf \
    IQ4_KS"
```

For `IQ5_K` and `IQ6_K`, change only the output filename and the quantization type.
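
Since the three quantization runs differ only in the output filename and the type, they can be driven from one loop. The following is a sketch rather than the procedure that was actually used. It assumes the same image, mount, and container paths as the IQ4_KS example. `HOST_ROOT`, `DRY_RUN`, `output_path`, `inner_cmd`, and `quantize` are hypothetical names introduced for illustration. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```bash
# Assumption: same host checkout as in the IQ4_KS example; override with HOST_ROOT.
HOST_ROOT="${HOST_ROOT:-/home/zly/project/modelscope_qwen35_27b_quantized}"
# Hypothetical safety switch: DRY_RUN=1 (default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Container-side paths, matching the mount used in the IQ4_KS example.
MODELS=/workspace/models
IMATRIX="$MODELS/examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat"
BASE="$MODELS/artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf"

# Container-side output path for one quantization type.
output_path() {
  printf '%s/artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-%s.gguf\n' "$MODELS" "$1"
}

# The llama-quantize invocation that runs inside the container.
inner_cmd() {
  printf '/llama-quantize --imatrix %s %s %s %s\n' "$IMATRIX" "$BASE" "$(output_path "$1")" "$1"
}

# Print (dry run) or execute the docker command for one quantization type.
quantize() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "docker run --gpus all --rm --entrypoint sh -v $HOST_ROOT:$MODELS hotwa/ik:latest -c \"$(inner_cmd "$1")\""
  else
    docker run --gpus all --rm --entrypoint sh \
      -v "$HOST_ROOT:$MODELS" \
      hotwa/ik:latest \
      -c "$(inner_cmd "$1")"
  fi
}

for qtype in IQ4_KS IQ5_K IQ6_K; do
  quantize "$qtype"
done
```

Reviewing the dry-run output first makes it easy to confirm that each output filename matches its quantization type before committing GPU time.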