chore: bootstrap reusable quantization template workspace
This commit is contained in:

94  docs/MODELSCOPE_UPLOAD_SOP.md  Normal file

@@ -0,0 +1,94 @@
# ModelScope Upload SOP (current project)

## 1. Directories and files

Working directory:

`/home/zly/project/modelscope_qwen35_27b_quantized`

Upload directory:

`/home/zly/project/modelscope_qwen35_27b_quantized/modelscope_upload`
The upload directory should contain:

- `README.md` (more than 200 characters, with `tasks` and `license` set)
- `configuration.json`
- `.gitattributes`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
- `Qwen3.5-27B.imatrix.dat`

Quick check:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lah modelscope_upload
wc -m modelscope_upload/README.md
```
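The same quick check can be scripted so it returns a pass/fail result instead of output to eyeball. A minimal sketch (the `check_upload_dir` helper is hypothetical, not part of the repo):

```python
# Verify the upload directory holds every required file and that README.md
# exceeds 200 characters, mirroring the `ls` + `wc -m` check above.
from pathlib import Path

REQUIRED = [
    "README.md",
    "configuration.json",
    ".gitattributes",
    "Qwen3.5-27B-IQ4_KS.gguf",
    "Qwen3.5-27B-IQ5_K.gguf",
    "Qwen3.5-27B-IQ6_K.gguf",
    "Qwen3.5-27B.imatrix.dat",
]

def check_upload_dir(root):
    root = Path(root)
    missing = [name for name in REQUIRED if not (root / name).exists()]
    readme = root / "README.md"
    # `wc -m` counts characters; len() of the decoded text matches that.
    readme_chars = len(readme.read_text(encoding="utf-8")) if readme.exists() else 0
    return {"missing": missing, "readme_chars": readme_chars}
```

A clean run should report no missing files and `readme_chars` above 200.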

## 2. Environment setup

Use the local virtual environment:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python -V
./.venv/bin/modelscope --version
```
If not installed yet:

```bash
./.venv/bin/pip install -U modelscope "setuptools<81"
```
## 3. Log in to ModelScope

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope login --token "<YOUR_MODELSCOPE_TOKEN>"
```
## 4. Upload (recommended: direct connection, no proxy)

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
env -u HTTP_PROXY -u HTTPS_PROXY -u ALL_PROXY -u NO_PROXY \
  ./.venv/bin/modelscope upload \
  "jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
  "./modelscope_upload" \
  . \
  --repo-type model \
  --commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```
## 5. Upload (via a proxy, if needed)

Same command as step 4, but without `env -u …`, so any proxy variables already set in the shell take effect:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope upload \
  "jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
  "./modelscope_upload" \
  . \
  --repo-type model \
  --commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```
## 6. Resuming interrupted uploads

- If an upload is interrupted, simply re-run the command from step 4 or step 5.
- The CLI hashes files first and reuses chunks that were already uploaded; there is no need to delete local files manually.
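Before re-running, it can help to snapshot sizes and checksums of the local files, so a resumed upload can later be compared against what the repo page shows. A local convenience sketch only (the ModelScope CLI does its own hashing; the `manifest` helper is hypothetical):

```python
# Record each file's size and SHA-256 in the upload directory.
import hashlib
from pathlib import Path

def manifest(root):
    rows = {}
    for p in sorted(Path(root).iterdir()):
        if not p.is_file():
            continue
        h = hashlib.sha256()
        with p.open("rb") as f:
            # Stream in 1 MiB chunks so multi-GB GGUF files fit in memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        rows[p.name] = (p.stat().st_size, h.hexdigest())
    return rows
```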

## 7. Post-publish checks

Repository URL:

`https://www.modelscope.cn/models/jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF`

Checklist:

- All files are visible (3 GGUF + 1 imatrix + metadata)
- The README shows the task and license correctly
- The page has left the pre-release state (if still pre-release, add details and appeal again)
231  docs/QWEN35_QUANTIZATION_MANUAL.md  Normal file

@@ -0,0 +1,231 @@
# Qwen3.5-27B quantization manual (ik_llama.cpp, Docker edition)

## 1. Goal and scope

This manual covers computing an imatrix and quantizing the Qwen3.5-27B BF16 GGUF with `ik_llama.cpp` in the directory `/home/zly/project/modelscope_qwen35_27b_quantized`, producing:

- `Qwen3.5-27B.imatrix.dat`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`

Image: `hotwa/ik:latest`
Core tools: `/llama-imatrix`, `/llama-quantize`

---

## 2. Prerequisites

- Docker is available and you have permission to access the daemon
- An NVIDIA GPU is available (recommended)
- The BF16 input file exists in the current directory:
  - `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- The Python environment and script exist in the current directory:
  - `./.venv/bin/python`
  - `prepare_calib_data.py`

Check commands:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --rm --gpus all --entrypoint sh hotwa/ik:latest -c "ls -la /llama-imatrix /llama-quantize"
ls -lh Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
```

---

## 3. Prepare calibration data

### 3.1 Download the base calibration file (source of the 1152 blocks)

Recommended (commonly used community version):

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
wget -O calibration_data_v5_rc.txt \
  "https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt"
```

Official fallback source (when the network can reach it):

```bash
wget -O calibration_data_v5_rc.txt \
  "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt"
```

### 3.2 Generate the mixed calibration set

Target composition of the script (strict):

- Base data: 1152 blocks (`calibration_data_v5_rc.txt`)
- Code conversations: 2000 blocks (`QuixiAI/Code-74k-ShareGPT-Vicuna`)
- Code preferences: 1000 blocks (`alvarobartt/openhermes-preferences-coding`)

Run:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python prepare_calib_data.py --force-refresh
```
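The mixed file is plain text in which each calibration block is separated by a blank line, so combining sources reduces to concatenation with `\n\n` separators. An illustration of that format only (the actual merge logic lives in `prepare_calib_data.py`, whose internals are not shown here):

```python
# Merge block lists from several sources into one blank-line-separated file,
# dropping empty blocks so the downstream block count stays exact.
def merge_blocks(*sources):
    blocks = []
    for src in sources:
        blocks.extend(b for b in src if b.strip())
    return "\n\n".join(blocks) + "\n"
```

Counting blocks of the merged text with the splitter from section 3.3 gives the sum of the per-source counts.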

### 3.3 Verify the block counts

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

./.venv/bin/python - <<'PY'
import re
from pathlib import Path

def count_blocks(path):
    txt = Path(path).read_text(encoding="utf-8", errors="ignore")
    return len([b for b in re.split(r"\n\s*\n", txt) if b.strip()])

print("base =", count_blocks("calibration_data_v5_rc.txt"))
print("mix  =", count_blocks("calibration_data_v5_rc_code.txt"))
PY
```

Expected:

- `base = 1152`
- `mix = 4152` (1152 + 2000 + 1000)

---
## 4. 生成 imatrix
|
||||
|
||||
```bash
|
||||
cd /home/zly/project/modelscope_qwen35_27b_quantized
|
||||
|
||||
docker run --gpus all --rm \
|
||||
--entrypoint sh \
|
||||
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
|
||||
-v /home/zly/project/modelscope_qwen35_27b_quantized/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
|
||||
hotwa/ik:latest \
|
||||
-c "/llama-imatrix \
|
||||
-m /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
|
||||
-f /workspace/calib_data.txt \
|
||||
-o /workspace/models/Qwen3.5-27B.imatrix.dat \
|
||||
--ctx-size 512 \
|
||||
-ngl 99 \
|
||||
--threads 16"
|
||||
```
|
||||
|
||||
完成校验:
|
||||
|
||||
```bash
|
||||
ls -lh Qwen3.5-27B.imatrix.dat
|
||||
```

---

## 5. Quantize the three formats

### 5.1 IQ4_KS

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ4_KS.gguf \
    IQ4_KS"
```

### 5.2 IQ5_K

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ5_K.gguf \
    IQ5_K"
```

### 5.3 IQ6_K

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ6_K.gguf \
    IQ6_K"
```
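The three runs above differ only in the quantization type string, so they can be driven from one loop. A sketch that builds each docker argv without executing it (running would require the image and GPU; `subprocess.run(cmd)` would launch each one):

```python
# Build the docker command for each quantization type from one template.
WORKDIR = "/home/zly/project/modelscope_qwen35_27b_quantized"
BF16 = "Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf"

def quantize_cmd(quant_type):
    inner = (
        f"/llama-quantize "
        f"--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat "
        f"/workspace/models/{BF16} "
        f"/workspace/models/Qwen3.5-27B-{quant_type}.gguf "
        f"{quant_type}"
    )
    return [
        "docker", "run", "--gpus", "all", "--rm",
        "--entrypoint", "sh",
        "-v", f"{WORKDIR}:/workspace/models",
        "hotwa/ik:latest",
        "-c", inner,
    ]

commands = [quantize_cmd(t) for t in ("IQ4_KS", "IQ5_K", "IQ6_K")]
```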

---

## 6. Verify all outputs at once

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lh Qwen3.5-27B.imatrix.dat Qwen3.5-27B-IQ4_KS.gguf Qwen3.5-27B-IQ5_K.gguf Qwen3.5-27B-IQ6_K.gguf
```

Measured in this run (2026-03-02):

- `Qwen3.5-27B.imatrix.dat` = `13,582,647` bytes (about `12.95 MB`)
- `Qwen3.5-27B-IQ4_KS.gguf` = `14,705,833,248` bytes (about `13.70 GB`)
- `Qwen3.5-27B-IQ5_K.gguf` = `18,679,612,704` bytes (about `17.40 GB`)
- `Qwen3.5-27B-IQ6_K.gguf` = `22,292,632,864` bytes (about `20.76 GB`)
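The "about" figures above are 1024-based (i.e. MiB/GiB, matching `ls -lh`). A small helper that reproduces them from the raw byte counts:

```python
# Convert a byte count to the 1024-based MB/GB value shown above.
def human(n, unit):
    scale = {"MB": 1 << 20, "GB": 1 << 30}[unit]
    return round(n / scale, 2)
```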

---

## 7. Common issues

### 7.1 `docker.sock` permission error

Symptom: `permission denied while trying to connect to the Docker daemon socket`

Fix:

- Run as a user with Docker permissions
- Or check the `docker` group configuration

### 7.2 DNS failure on download sources

Symptom: `unable to resolve host address`

Fix:

- Prefer the gist source (see 3.1)
- Or configure a working proxy and retry

### 7.3 Output files owned by `root`

Files written from the container may end up owned by root. Fix as needed:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
sudo chown -R $(id -u):$(id -g) Qwen3.5-27B*.gguf Qwen3.5-27B.imatrix.dat
```

---

## 8. Minimal ModelScope release checklist (optional)

- `README.md` (>= 200 characters, including task and license information)
- `configuration.json` (with `framework`, `task`, `model.type`)
- `.gitattributes` (`*.gguf` tracked via LFS)
- Quantized files:
  - `Qwen3.5-27B-IQ4_KS.gguf`
  - `Qwen3.5-27B-IQ5_K.gguf`
  - `Qwen3.5-27B-IQ6_K.gguf`
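The metadata items in the checklist can be verified mechanically. A sketch that checks the field names listed above (`framework`, `task`, `model.type`) and the LFS rule; the `check_metadata` helper is hypothetical and only validates presence, not values:

```python
# Check configuration.json content and the .gitattributes LFS rule against
# the release checklist; returns a list of problems (empty = OK).
import json

def check_metadata(config_text, gitattributes_text):
    cfg = json.loads(config_text)
    problems = []
    for key in ("framework", "task"):
        if key not in cfg:
            problems.append(f"configuration.json missing '{key}'")
    if "type" not in cfg.get("model", {}):
        problems.append("configuration.json missing 'model.type'")
    if "*.gguf" not in gitattributes_text:
        problems.append(".gitattributes has no rule for *.gguf")
    return problems
```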