chore: bootstrap reusable quantization template workspace

Commit `1c5822d16b` · 2026-03-02 23:07:48 +08:00 · 15 changed files, 167197 additions, 0 deletions

# ModelScope Upload SOP (Current Project)
## 1. Directories and Files
Working directory:
`/home/zly/project/modelscope_qwen35_27b_quantized`
Upload directory:
`/home/zly/project/modelscope_qwen35_27b_quantized/modelscope_upload`
The upload directory should contain:
- `README.md` (more than 200 characters, including `tasks` and `license`)
- `configuration.json`
- `.gitattributes`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
- `Qwen3.5-27B.imatrix.dat`
Quick check:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lah modelscope_upload
wc -m modelscope_upload/README.md
```
## 2. Environment Setup
Use the local virtual environment:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python -V
./.venv/bin/modelscope --version
```
If not yet installed:
```bash
./.venv/bin/pip install -U modelscope "setuptools<81"
```
## 3. Log in to ModelScope
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope login --token "<YOUR_MODELSCOPE_TOKEN>"
```
## 4. Upload (recommended: direct connection, no proxy)
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
env -u HTTP_PROXY -u HTTPS_PROXY -u ALL_PROXY -u NO_PROXY \
./.venv/bin/modelscope upload \
"jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
"./modelscope_upload" \
. \
--repo-type model \
--commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```
## 5. Upload (if a proxy is required)
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope upload \
"jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
"./modelscope_upload" \
. \
--repo-type model \
--commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```
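The command above inherits whatever proxy variables are already exported. If none are set yet, a minimal sketch (the address below is a placeholder assumption; substitute your actual proxy endpoint):

```shell
# Placeholder proxy endpoint -- replace with your real proxy before use.
export HTTP_PROXY=http://127.0.0.1:7890
export HTTPS_PROXY=http://127.0.0.1:7890
```

Then re-run the upload command above in the same shell.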
## 6. Resume / Retry Notes
- If the upload is interrupted, simply re-run the command from step 4 or step 5.
- The CLI first verifies hashes and reuses already-uploaded chunks; there is no need to delete local files manually.
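Re-running by hand works; for unattended runs, a generic retry wrapper can be sketched as below (`true` is a stand-in command, substitute the full step-4 `modelscope upload` invocation):

```shell
# Generic retry helper: run a command up to N times, pausing between attempts.
retry() {
  local n=$1; shift
  local i
  for i in $(seq 1 "$n"); do
    "$@" && return 0          # success: stop retrying
    echo "attempt $i/$n failed" >&2
    sleep 1                   # increase for real uploads, e.g. 30
  done
  return 1
}

# Stand-in command; in practice: retry 3 ./.venv/bin/modelscope upload ...
retry 3 true && echo "ok"
```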
## 7. Post-publish Checks
Repository URL:
`https://www.modelscope.cn/models/jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF`
Checkpoints:
- All files are displayed (3 GGUF + 1 imatrix + metadata)
- The README correctly shows the task and license
- The page has left the pre-release state (if it is still pre-release, add more detail and appeal)

---
# Qwen3.5-27B Quantization Manual (ik_llama.cpp, Docker edition)
## 1. Goal and Scope
This manual covers computing an imatrix and quantizing the Qwen3.5-27B BF16 GGUF with `ik_llama.cpp` in the directory `/home/zly/project/modelscope_qwen35_27b_quantized`, producing:
- `Qwen3.5-27B.imatrix.dat`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
Image: `hotwa/ik:latest`
Core tools: `/llama-imatrix` and `/llama-quantize`
---
## 2. Prerequisites
- Docker is available and you have permission to access the daemon
- An NVIDIA GPU is available (recommended)
- The BF16 input file exists in the current directory:
  - `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- The Python environment and script exist in the current directory:
  - `./.venv/bin/python`
  - `prepare_calib_data.py`
Check commands:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --rm --gpus all --entrypoint sh hotwa/ik:latest -c "ls -la /llama-imatrix /llama-quantize"
ls -lh Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
```
---
## 3. Prepare Calibration Data
### 3.1 Download the base calibration file (source of the 1152 blocks)
Recommended (widely used community version):
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
wget -O calibration_data_v5_rc.txt \
"https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt"
```
Fallback source (when the network is reachable):
```bash
wget -O calibration_data_v5_rc.txt \
"https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt"
```
### 3.2 Build the mixed calibration set
Target composition produced by the script (strict):
- Base data (1152 blocks): `calibration_data_v5_rc.txt`
- Code conversations (2000 blocks): `QuixiAI/Code-74k-ShareGPT-Vicuna`
- Code preferences (1000 blocks): `alvarobartt/openhermes-preferences-coding`
Run:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python prepare_calib_data.py --force-refresh
```
### 3.3 Verify block counts
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python - <<'PY'
import re
from pathlib import Path
def count_blocks(path):
txt = Path(path).read_text(encoding="utf-8", errors="ignore")
return len([b for b in re.split(r"\n\s*\n", txt) if b.strip()])
print("base =", count_blocks("calibration_data_v5_rc.txt"))
print("mix =", count_blocks("calibration_data_v5_rc_code.txt"))
PY
```
Expected:
- `base = 1152`
- `mix = 4152` (1152 + 2000 + 1000)
---
## 4. Generate the imatrix
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
-v /home/zly/project/modelscope_qwen35_27b_quantized/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
hotwa/ik:latest \
-c "/llama-imatrix \
-m /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
-f /workspace/calib_data.txt \
-o /workspace/models/Qwen3.5-27B.imatrix.dat \
--ctx-size 512 \
-ngl 99 \
--threads 16"
```
Completion check:
```bash
ls -lh Qwen3.5-27B.imatrix.dat
```
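Beyond `ls`, a defensive size check can catch truncated runs early. A sketch (the 1 MiB threshold is an assumption, not an official minimum):

```shell
# Warn when the imatrix output is missing or implausibly small (< 1 MiB).
check_imatrix() {
  local f=$1
  if [ -f "$f" ] && [ "$(stat -c%s "$f")" -ge 1048576 ]; then
    echo "OK: $f is $(stat -c%s "$f") bytes"
  else
    echo "WARN: $f missing or smaller than 1 MiB" >&2
    return 1
  fi
}
check_imatrix Qwen3.5-27B.imatrix.dat || echo "re-run section 4 before quantizing"
```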
---
## 5. Quantize to Three Formats
### 5.1 IQ4_KS
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ4_KS.gguf \
IQ4_KS"
```
### 5.2 IQ5_K
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ5_K.gguf \
IQ5_K"
```
### 5.3 IQ6_K
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ6_K.gguf \
IQ6_K"
```
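Sections 5.1-5.3 differ only in the target type and output filename, so they can be driven from one loop. A dry-run sketch that only prints the commands (drop the leading `echo` to execute):

```shell
# Emit the three quantize invocations; remove "echo" to actually run them.
SRC=Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
for q in IQ4_KS IQ5_K IQ6_K; do
  echo docker run --gpus all --rm --entrypoint sh \
    -v "$PWD:/workspace/models" hotwa/ik:latest \
    -c "/llama-quantize --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat /workspace/models/$SRC /workspace/models/Qwen3.5-27B-$q.gguf $q"
done
```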
---
## 6. Verify All Outputs at Once
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lh Qwen3.5-27B.imatrix.dat Qwen3.5-27B-IQ4_KS.gguf Qwen3.5-27B-IQ5_K.gguf Qwen3.5-27B-IQ6_K.gguf
```
Measured on this run (2026-03-02):
- `Qwen3.5-27B.imatrix.dat` = `13,582,647` bytes (`12.95 MB`)
- `Qwen3.5-27B-IQ4_KS.gguf` = `14,705,833,248` bytes (`13.70 GB`)
- `Qwen3.5-27B-IQ5_K.gguf` = `18,679,612,704` bytes (`17.40 GB`)
- `Qwen3.5-27B-IQ6_K.gguf` = `22,292,632,864` bytes (`20.76 GB`)
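The human-readable sizes follow from the byte counts using binary units (MB = 2^20 bytes, GB = 2^30 bytes). A small Python cross-check:

```python
# Convert the measured byte counts to binary MB/GB, matching the list above.
sizes = {
    "Qwen3.5-27B.imatrix.dat": 13_582_647,
    "Qwen3.5-27B-IQ4_KS.gguf": 14_705_833_248,
    "Qwen3.5-27B-IQ5_K.gguf": 18_679_612_704,
    "Qwen3.5-27B-IQ6_K.gguf": 22_292_632_864,
}
for name, nbytes in sizes.items():
    unit, div = ("MB", 2**20) if nbytes < 2**30 else ("GB", 2**30)
    print(f"{name}: {nbytes / div:.2f} {unit}")
```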
---
## 7. Troubleshooting
### 7.1 `docker.sock` permission error
Symptom: `permission denied while trying to connect to the Docker daemon socket`
Fix:
- Run as a user that has Docker permissions
- Or check the `docker` group configuration
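To inspect the group configuration from the shell (assuming the conventional group name `docker`):

```shell
# Report whether the current user is in the "docker" group.
if id -nG | grep -qw docker; then
  echo "user is in the docker group"
else
  echo "user is NOT in the docker group; an admin can add it with:"
  echo '  sudo usermod -aG docker "$USER"   # then log out and back in'
fi
```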
### 7.2 DNS failure for the download source
Symptom: `unable to resolve host address`
Fix:
- Prefer the gist source (see 3.1)
- Or configure a working proxy and retry
### 7.3 Output files owned by `root`
Files written from the container may end up owned by root. Fix ownership as needed:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
sudo chown -R $(id -u):$(id -g) Qwen3.5-27B*.gguf Qwen3.5-27B.imatrix.dat
```
---
## 8. Minimal ModelScope Publishing Checklist (optional)
- `README.md` (>= 200 characters, including task and license information)
- `configuration.json` (including `framework`, `task`, and `model.type`)
- `.gitattributes` (`*.gguf` tracked via LFS)
- Quantized files:
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
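For the `.gitattributes` item in the checklist, a minimal file routing the large artifacts through LFS might look like this (the imatrix line is an optional addition beyond the checklist):

```
*.gguf filter=lfs diff=lfs merge=lfs -text
*.imatrix.dat filter=lfs diff=lfs merge=lfs -text
```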