chore: bootstrap reusable quantization template workspace
.gitignore (new file, 30 lines)
@@ -0,0 +1,30 @@
# Python / env
.venv/
__pycache__/
*.pyc
*.pyo
*.pyd

# Local scratch
.trash/
*.log

# Model weights and large artifacts
*.gguf
*.safetensors
*.safetensors.index.json
*.bin
*.pt
*.pth
*.ckpt
*.onnx

# Hugging Face / cache-like folders
.cache/

# Keep publish metadata, ignore heavy files in publish folder
modelscope_upload/*.gguf

# OS
.DS_Store
Thumbs.db
AGENTS.md (new file, 27 lines)
@@ -0,0 +1,27 @@
# AGENTS Guidelines

This file guides future automation agents working in this repository.

## Goals

Maintain a reusable quantization and publishing workflow; never commit large model weights to the Git repository.

## Directory conventions

- Documentation lives in `docs/`
- Scripts live in `scripts/`
- Calibration data lives in `calibration/`
- The publish directory is `modelscope_upload/`

## Hard rules

1. Never commit weight files (`.gguf`, `.safetensors`, `.bin`, `.pt`, etc.)
2. Never commit secrets, tokens, or credentials
3. Whenever the workflow changes, update the `docs/` documentation in the same change
4. Prefer reusing existing scripts over reinventing them

## Working habits

- For long-running tasks (conversion/quantization/upload), state the check commands and an estimated duration up front
- Provide upload commands in two variants by default: direct (no proxy) and via proxy
- Before any operation that moves large files, check available disk space
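The disk-space habit can be sketched as a small check before moving large files; the 25 GiB threshold below is a hypothetical value chosen for illustration, not a number from this repository:

```python
import shutil

GIB = 1024 ** 3

def enough_space(path: str, need_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `need_bytes` free."""
    return shutil.disk_usage(path).free >= need_bytes

# Hypothetical threshold: roughly one IQ6_K output plus headroom.
if not enough_space(".", 25 * GIB):
    print("not enough free space; aborting before moving large files")
else:
    print("disk space ok")
```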
README.md (new file, 37 lines)
@@ -0,0 +1,37 @@
# Qwen3.5-27B Quantization Workspace

This repository captures the reusable quantization workflow and publishing scripts for the Qwen3.5-27B model family. It primarily stores:

- Quantization workflow documentation
- Calibration data and the scripts that build it
- ModelScope release template files and upload scripts

Weight files are not hosted in this repository (`.gguf` and other large files are ignored via `.gitignore`).

## Directory layout

- `docs/`
  - `QWEN35_QUANTIZATION_MANUAL.md`
  - `MODELSCOPE_UPLOAD_SOP.md`
- `scripts/`
  - `prepare_calib_data.py`
  - `upload_to_modelscope.sh`
- `calibration/`
  - `calibration_data_v5_rc.txt`
  - `calibration_data_v5_rc_code.txt`
  - `sources/`
- `modelscope_upload/`
  - The ModelScope-facing publish directory (README/configuration/.gitattributes plus artifacts)

## Typical workflow

1. Prepare or refresh the calibration data (`scripts/prepare_calib_data.py`)
2. Run imatrix computation and quantization in Docker (see `docs/QWEN35_QUANTIZATION_MANUAL.md`)
3. Assemble the publish directory (`modelscope_upload/`)
4. Run the upload manually (see `docs/MODELSCOPE_UPLOAD_SOP.md` or `scripts/upload_to_modelscope.sh`)

## Git guidelines

- Commit only scripts, documentation, configuration, and small data files
- Never commit tokens, weights, or environment directories
- Update `docs/` and `AGENTS.md` alongside every workflow change
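The "no weights in Git" guideline can be spot-checked with a sketch like the following (assumes it is run from the repository root; prints any tracked weight files):

```shell
# List tracked files whose extensions mark them as model weights.
tracked_weights="$(git ls-files 2>/dev/null | grep -E '\.(gguf|safetensors|bin|pt|pth|ckpt|onnx)$' || true)"
if [ -n "${tracked_weights}" ]; then
  echo "WARNING: weight files tracked by Git:"
  echo "${tracked_weights}"
else
  echo "no weight files tracked"
fi
```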
calibration/calibration_data_v5_rc.txt (new file, 4802 lines; diff suppressed: lines too long)
calibration/calibration_data_v5_rc_code.txt (new file, 17757 lines; diff suppressed: lines too long)
calibration/sources/code74k_2000.txt (new file, 85821 lines; diff suppressed: too large)
calibration/sources/openhermes_coding_chosen_1000.txt (new file, 58095 lines; diff suppressed: too large)
docs/MODELSCOPE_UPLOAD_SOP.md (new file, 94 lines)
@@ -0,0 +1,94 @@
# ModelScope Upload SOP (this project)

## 1. Directories and files

Working directory:

`/home/zly/project/modelscope_qwen35_27b_quantized`

Upload directory:

`/home/zly/project/modelscope_qwen35_27b_quantized/modelscope_upload`

The upload directory should contain:

- `README.md` (over 200 characters, with `tasks` and `license`)
- `configuration.json`
- `.gitattributes`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
- `Qwen3.5-27B.imatrix.dat`

Quick check:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lah modelscope_upload
wc -m modelscope_upload/README.md
```
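The README requirements above (over 200 characters, `tasks` and `license` present in the front matter) can also be checked in Python; the snippet below runs against a synthetic example rather than the real publish README:

```python
def readme_issues(text: str) -> list[str]:
    """Check the ModelScope README preconditions from section 1."""
    issues = []
    if len(text) <= 200:
        issues.append("README must exceed 200 characters")
    for key in ("tasks:", "license:"):
        if key not in text:
            issues.append(f"front matter is missing `{key}`")
    return issues

# Synthetic example standing in for modelscope_upload/README.md:
demo = (
    "---\ntasks:\n- text-generation\nlicense: Apache License 2.0\n---\n"
    + "model description " * 20
)
print(readme_issues(demo))  # → []
```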

## 2. Environment setup

Use the local virtual environment:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python -V
./.venv/bin/modelscope --version
```

If not yet installed:

```bash
./.venv/bin/pip install -U modelscope "setuptools<81"
```

## 3. Log in to ModelScope

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope login --token "<YOUR_MODELSCOPE_TOKEN>"
```

## 4. Upload (recommended: direct connection, no proxy)

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
env -u HTTP_PROXY -u HTTPS_PROXY -u ALL_PROXY -u NO_PROXY \
  ./.venv/bin/modelscope upload \
  "jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
  "./modelscope_upload" \
  . \
  --repo-type model \
  --commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```

## 5. Upload (if a proxy is required)

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope upload \
  "jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
  "./modelscope_upload" \
  . \
  --repo-type model \
  --commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```

## 6. Interrupted / resumed uploads

- If an upload is interrupted, simply re-run the command from step 4 or 5.
- The CLI verifies hashes first and reuses already-uploaded chunks; no local files need to be deleted manually.

## 7. Post-release checks

Repository URL:

`https://www.modelscope.cn/models/jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF`

Checklist:

- All files visible and complete (3 GGUF + 1 imatrix + metadata)
- README renders the task and license correctly
- The page has left the pre-release state (if still pre-release, add supporting details and appeal)
docs/QWEN35_QUANTIZATION_MANUAL.md (new file, 231 lines)
@@ -0,0 +1,231 @@
# Qwen3.5-27B Quantization Manual (ik_llama.cpp, Docker edition)

## 1. Goal and scope

This manual covers computing an imatrix and quantizing the Qwen3.5-27B BF16 GGUF with `ik_llama.cpp` in the directory `/home/zly/project/modelscope_qwen35_27b_quantized`, producing:

- `Qwen3.5-27B.imatrix.dat`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`

Image: `hotwa/ik:latest`
Core tools: `/llama-imatrix`, `/llama-quantize`

---

## 2. Prerequisites

- Docker available, with permission to access the daemon
- An NVIDIA GPU available (recommended)
- The BF16 input file present in the current directory:
  - `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- A Python environment and the script present in the current directory:
  - `./.venv/bin/python`
  - `prepare_calib_data.py`

Check commands:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --rm --gpus all --entrypoint sh hotwa/ik:latest -c "ls -la /llama-imatrix /llama-quantize"
ls -lh Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
```

---

## 3. Prepare calibration data

### 3.1 Download the base calibration file (source of the 1152 blocks)

Recommended (the widely used community version):

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
wget -O calibration_data_v5_rc.txt \
  "https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt"
```

Official fallback source (when the network is reachable):

```bash
wget -O calibration_data_v5_rc.txt \
  "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt"
```

### 3.2 Build the mixed calibration set

Target composition of the script (strict):

- Base data: 1152 blocks (`calibration_data_v5_rc.txt`)
- Code conversations: 2000 blocks (`QuixiAI/Code-74k-ShareGPT-Vicuna`)
- Code preferences: 1000 blocks (`alvarobartt/openhermes-preferences-coding`)

Run:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python prepare_calib_data.py --force-refresh
```

### 3.3 Verify block counts

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

./.venv/bin/python - <<'PY'
import re
from pathlib import Path

def count_blocks(path):
    txt = Path(path).read_text(encoding="utf-8", errors="ignore")
    return len([b for b in re.split(r"\n\s*\n", txt) if b.strip()])

print("base =", count_blocks("calibration_data_v5_rc.txt"))
print("mix =", count_blocks("calibration_data_v5_rc_code.txt"))
PY
```

Expected:

- `base = 1152`
- `mix = 4152` (1152 + 2000 + 1000)

---

## 4. Generate the imatrix

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  -v /home/zly/project/modelscope_qwen35_27b_quantized/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix \
    -m /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    -f /workspace/calib_data.txt \
    -o /workspace/models/Qwen3.5-27B.imatrix.dat \
    --ctx-size 512 \
    -ngl 99 \
    --threads 16"
```

Verify completion:

```bash
ls -lh Qwen3.5-27B.imatrix.dat
```

---

## 5. Quantize the three formats

### 5.1 IQ4_KS

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ4_KS.gguf \
    IQ4_KS"
```

### 5.2 IQ5_K

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ5_K.gguf \
    IQ5_K"
```

### 5.3 IQ6_K

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ6_K.gguf \
    IQ6_K"
```
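The three invocations differ only in the quantization type, so they can also be driven from one loop. This sketch only prints each docker command (drop the `echo` to execute); paths and image match the manual:

```shell
BF16=Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
WORK=/home/zly/project/modelscope_qwen35_27b_quantized

for Q in IQ4_KS IQ5_K IQ6_K; do
  # Dry run: print the command instead of running it.
  echo docker run --gpus all --rm --entrypoint sh \
    -v "${WORK}:/workspace/models" \
    hotwa/ik:latest \
    -c "/llama-quantize --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat /workspace/models/${BF16} /workspace/models/Qwen3.5-27B-${Q}.gguf ${Q}"
done
```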

---

## 6. Verify all outputs at once

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lh Qwen3.5-27B.imatrix.dat Qwen3.5-27B-IQ4_KS.gguf Qwen3.5-27B-IQ5_K.gguf Qwen3.5-27B-IQ6_K.gguf
```

Measured on this run (2026-03-02):

- `Qwen3.5-27B.imatrix.dat` = `13,582,647` bytes (about `12.95 MiB`)
- `Qwen3.5-27B-IQ4_KS.gguf` = `14,705,833,248` bytes (about `13.70 GiB`)
- `Qwen3.5-27B-IQ5_K.gguf` = `18,679,612,704` bytes (about `17.40 GiB`)
- `Qwen3.5-27B-IQ6_K.gguf` = `22,292,632,864` bytes (about `20.76 GiB`)
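The byte counts and the human-readable figures can be cross-checked; note that the quoted figures correspond to binary units (MiB/GiB):

```python
# Byte counts from the run above, each with its quoted size and binary unit.
sizes = {
    "Qwen3.5-27B.imatrix.dat": (13_582_647, 12.95, 2**20),
    "Qwen3.5-27B-IQ4_KS.gguf": (14_705_833_248, 13.70, 2**30),
    "Qwen3.5-27B-IQ5_K.gguf": (18_679_612_704, 17.40, 2**30),
    "Qwen3.5-27B-IQ6_K.gguf": (22_292_632_864, 20.76, 2**30),
}
for name, (nbytes, quoted, unit) in sizes.items():
    assert abs(nbytes / unit - quoted) < 0.01, name
print("all quoted sizes consistent with byte counts")
```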

---

## 7. Troubleshooting

### 7.1 `docker.sock` permission error

Symptom: `permission denied while trying to connect to the Docker daemon socket`

Fix:

- Run as a user that has Docker permissions
- Or check the `docker` group configuration

### 7.2 DNS failure on the download source

Symptom: `unable to resolve host address`

Fix:

- Prefer the gist source (see 3.1)
- Or configure a working proxy and retry

### 7.3 Output files owned by `root`

Files written from inside the container may end up owned by root. Fix ownership as needed:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
sudo chown -R $(id -u):$(id -g) Qwen3.5-27B*.gguf Qwen3.5-27B.imatrix.dat
```

---

## 8. Minimal ModelScope release checklist (optional)

- `README.md` (>= 200 characters, with task and license information)
- `configuration.json` (with `framework`, `task`, `model.type`)
- `.gitattributes` (`*.gguf` tracked via LFS)
- Quantized files:
  - `Qwen3.5-27B-IQ4_KS.gguf`
  - `Qwen3.5-27B-IQ5_K.gguf`
  - `Qwen3.5-27B-IQ6_K.gguf`
modelscope_upload/.gitattributes (new file, 5 lines)
@@ -0,0 +1,5 @@
*.gguf filter=lfs diff=lfs merge=lfs -text
*.dat filter=lfs diff=lfs merge=lfs -text
*.md text eol=lf
*.json text eol=lf
.gitattributes text eol=lf
modelscope_upload/Qwen3.5-27B.imatrix.dat (new binary file, not shown)
modelscope_upload/README.md (new file, 76 lines)
@@ -0,0 +1,76 @@
---
tags:
- text-generation
- qwen
- qwen35
- gguf
- quantization
tasks:
- text-generation
license: Apache License 2.0
---

# Qwen3.5-27B Quantized GGUF (IQ4_KS / IQ5_K / IQ6_K)

## Model description

This repository provides GGUF quantizations of Qwen3.5-27B for the llama.cpp ecosystem, in three variants: IQ4_KS, IQ5_K, and IQ6_K. The weights were quantized from a BF16 GGUF input using an importance matrix (imatrix), balancing size, inference speed, and accuracy for text-generation workloads under different VRAM budgets.

## Weight provenance

- Original BF16 GGUF source: `TeichAI/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF`
- This repository contains the imatrix + GGUF quantized releases derived from that source (IQ4_KS / IQ5_K / IQ6_K)

## Quantization method

Quantization was done in two stages with the `ik_llama.cpp` Docker image (`hotwa/ik:latest`):

1. Compute an importance matrix from the calibration corpus with `llama-imatrix` (`Qwen3.5-27B.imatrix.dat`)
2. Export `IQ4_KS`, `IQ5_K`, and `IQ6_K` with `llama-quantize --imatrix ...`

Key quantization parameters:

- imatrix input model: `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- `--ctx-size 512`
- `-ngl 99`
- `--threads 16`

The imatrix models the relative importance of individual weights, which reduces information loss in critical layers at the same bit width and improves post-quantization inference stability.

## Calibration data: sources and rationale

The quantization calibration file is `calibration_data_v5_rc_code.txt`, `4152` blocks in total, composed of:

- `1152` blocks: base calibration data `calibration_data_v5_rc.txt`
- `2000` blocks: `QuixiAI/Code-74k-ShareGPT-Vicuna`
- `1000` blocks: `alvarobartt/openhermes-preferences-coding` (`chosen` responses)

Download sources for the base calibration data:

- Widely used community version: `https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt`
- Official fallback: `https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt`

Why these three components:

- The base data covers general semantics and common text distributions, so calibration is not overfitted to the code domain
- The Code-74k conversation samples improve quantization fidelity for code generation, debugging, and explanation
- The OpenHermes coding-preference samples carry a "preferred answer" signal that helps preserve structure and readability in code output

The mix balances general text against code tasks, matching the intended usage of the Qwen3.5-27B Distill model.

## Files

- `Qwen3.5-27B-IQ4_KS.gguf`: lowest VRAM footprint
- `Qwen3.5-27B-IQ5_K.gguf`: balance of speed and quality
- `Qwen3.5-27B-IQ6_K.gguf`: highest fidelity
- `Qwen3.5-27B.imatrix.dat`: the importance matrix used for quantization

## Usage recommendations

- Tight on resources: prefer IQ4_KS
- General inference: prefer IQ5_K
- Quality-sensitive workloads: use IQ6_K

## Notes

This repository publishes ready-to-run GGUF weights and contains no training artifacts. Use a GGUF-capable inference framework (such as llama.cpp and related implementations).
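As a reference for the note above, a typical llama.cpp invocation could look like the following. `llama-cli` with `-m`/`-ngl`/`-c`/`-p` are standard llama.cpp options, but the context size and prompt are illustrative placeholders; this sketch only assembles and prints the command:

```shell
MODEL=Qwen3.5-27B-IQ5_K.gguf   # pick IQ4_KS / IQ5_K / IQ6_K per VRAM budget
CMD="llama-cli -m ${MODEL} -ngl 99 -c 4096 -p 'Write a Python function that reverses a string.'"
echo "${CMD}"
```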
modelscope_upload/configuration.json (new file, 10 lines)
@@ -0,0 +1,10 @@
{
  "framework": "ggml",
  "task": "text-generation",
  "model": {
    "type": "qwen35"
  },
  "pipeline": {
    "type": "text-generation"
  }
}
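The fields that the release checklist in the quantization manual requires (`framework`, `task`, `model.type`) can be validated with a short sketch; the JSON literal mirrors the file above, while in practice one would `json.load` the real `configuration.json`:

```python
import json

cfg = json.loads("""
{
  "framework": "ggml",
  "task": "text-generation",
  "model": {"type": "qwen35"},
  "pipeline": {"type": "text-generation"}
}
""")

# Required by the minimal ModelScope release checklist.
assert cfg["framework"] == "ggml"
assert cfg["task"] == "text-generation"
assert cfg["model"]["type"] == "qwen35"
print("configuration.json fields look valid")
```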
scripts/prepare_calib_data.py (new file, 187 lines)
@@ -0,0 +1,187 @@
#!/usr/bin/env python3
"""
Prepare calibration_data_v5_rc_code.txt with exact composition:
- base: 1152 blocks from calibration_data_v5_rc.txt
- code: 2000 blocks from QuixiAI/Code-74k-ShareGPT-Vicuna
- pref: 1000 blocks from alvarobartt/openhermes-preferences-coding
"""

from __future__ import annotations

import argparse
import random
import re
import subprocess
import sys
from pathlib import Path

from datasets import load_dataset

BASE_URL = (
    "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/"
    "examples/calibration/calibration_data.txt"
)
BLOCK_SPLIT_RE = re.compile(r"\n\s*\n")


def split_blocks(text: str) -> list[str]:
    blocks = [b.strip() for b in BLOCK_SPLIT_RE.split(text) if b.strip()]
    return blocks


def read_blocks(path: Path) -> list[str]:
    return split_blocks(path.read_text(encoding="utf-8", errors="ignore"))


def write_blocks(path: Path, blocks: list[str]) -> None:
    path.write_text("\n\n".join(blocks).strip() + "\n", encoding="utf-8")


def ensure_base_file(path: Path) -> None:
    if path.exists():
        return
    cmd = ["wget", BASE_URL, "-O", str(path)]
    print("Downloading base calibration file:")
    print("  ", " ".join(cmd))
    subprocess.run(cmd, check=True)


def pick_blocks(blocks: list[str], target: int, seed: int) -> list[str]:
    if len(blocks) < target:
        raise ValueError(f"Need {target} blocks but only got {len(blocks)}.")
    rng = random.Random(seed)
    idxs = list(range(len(blocks)))
    rng.shuffle(idxs)
    return [blocks[i] for i in idxs[:target]]


def build_code74k_blocks(target: int, seed: int) -> list[str]:
    ds = load_dataset("QuixiAI/Code-74k-ShareGPT-Vicuna", split="train")
    rows = list(range(len(ds)))
    rng = random.Random(seed)
    rng.shuffle(rows)

    out: list[str] = []
    for i in rows:
        conv = ds[i].get("conversations") or []
        parts = []
        for msg in conv:
            value = (msg.get("value") or "").strip()
            if value:
                parts.append(value)
        if parts:
            out.append("\n".join(parts))
        if len(out) >= target:
            break

    if len(out) < target:
        raise RuntimeError(
            f"Code-74k yielded only {len(out)} valid blocks, target is {target}."
        )
    return out


def build_openhermes_blocks(target: int, seed: int) -> list[str]:
    ds = load_dataset("alvarobartt/openhermes-preferences-coding", split="train")
    rows = list(range(len(ds)))
    rng = random.Random(seed + 1)
    rng.shuffle(rows)

    out: list[str] = []
    for i in rows:
        chosen = ds[i].get("chosen") or []
        parts = []
        for msg in chosen:
            value = (msg.get("content") or "").strip()
            if value:
                parts.append(value)
        if parts:
            out.append("\n".join(parts))
        if len(out) >= target:
            break

    if len(out) < target:
        raise RuntimeError(
            f"OpenHermes yielded only {len(out)} valid blocks, target is {target}."
        )
    return out


def ensure_cached_blocks(
    cache_path: Path,
    target: int,
    build_fn,
    seed: int,
) -> list[str]:
    if cache_path.exists():
        cached = read_blocks(cache_path)
        if len(cached) >= target:
            return cached[:target]
        print(
            f"{cache_path} has {len(cached)} blocks (< {target}), rebuilding from source."
        )

    blocks = build_fn(target, seed)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    write_blocks(cache_path, blocks)
    return blocks


def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--base-file", default="calibration_data_v5_rc.txt")
    parser.add_argument("--output", default="calibration_data_v5_rc_code.txt")
    parser.add_argument("--data-dir", default="data")
    parser.add_argument("--force-refresh", action="store_true")
    args = parser.parse_args()

    base_file = Path(args.base_file)
    output_file = Path(args.output)
    data_dir = Path(args.data_dir)
    code_cache = data_dir / "code74k_2000.txt"
    openhermes_cache = data_dir / "openhermes_coding_chosen_1000.txt"

    if args.force_refresh:
        for p in [code_cache, openhermes_cache]:
            if p.exists():
                p.unlink()

    ensure_base_file(base_file)
    base_blocks_all = read_blocks(base_file)
    base_blocks = pick_blocks(base_blocks_all, target=1152, seed=args.seed)

    code_blocks = ensure_cached_blocks(
        cache_path=code_cache,
        target=2000,
        build_fn=build_code74k_blocks,
        seed=args.seed,
    )
    openhermes_blocks = ensure_cached_blocks(
        cache_path=openhermes_cache,
        target=1000,
        build_fn=build_openhermes_blocks,
        seed=args.seed,
    )

    merged = base_blocks + code_blocks + openhermes_blocks
    write_blocks(output_file, merged)

    print("Done.")
    print(f"base blocks: {len(base_blocks)} ({base_file})")
    print(f"code blocks: {len(code_blocks)} (QuixiAI/Code-74k-ShareGPT-Vicuna)")
    print(
        "openhermes blocks: "
        f"{len(openhermes_blocks)} (alvarobartt/openhermes-preferences-coding)"
    )
    print(f"total blocks: {len(merged)}")
    print(f"output: {output_file}")
    return 0


if __name__ == "__main__":
    try:
        raise SystemExit(main())
    except subprocess.CalledProcessError as exc:
        print(f"Command failed with exit code {exc.returncode}", file=sys.stderr)
        raise
scripts/upload_to_modelscope.sh (new executable file, 25 lines)
@@ -0,0 +1,25 @@
#!/usr/bin/env bash
set -euo pipefail

# Usage:
#   ./upload_to_modelscope.sh <repo_id> <token>
# Example:
#   ./upload_to_modelscope.sh your_username/your_repo_name ms-xxxxxxxx

REPO_ID="${1:-}"
TOKEN="${2:-}"

if [[ -z "${REPO_ID}" || -z "${TOKEN}" ]]; then
  echo "Usage: $0 <repo_id> <token>"
  exit 1
fi

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ROOT_DIR="$(cd "${SCRIPT_DIR}/.." && pwd)"

"${ROOT_DIR}/.venv/bin/modelscope" login --token "${TOKEN}"
# Upload the contents of the publish directory.
"${ROOT_DIR}/.venv/bin/modelscope" upload "${REPO_ID}" "${ROOT_DIR}/modelscope_upload" . \
  --repo-type model \
  --commit-message "Upload Qwen3.5-27B quantized GGUF weights"

echo "Upload finished: ${REPO_ID}"