chore: bootstrap reusable quantization template workspace

2026-03-02 23:07:48 +08:00
commit 1c5822d16b
15 changed files with 167197 additions and 0 deletions

30
.gitignore vendored Normal file

@@ -0,0 +1,30 @@
# Python / env
.venv/
__pycache__/
*.pyc
*.pyo
*.pyd
# Local scratch
.trash/
*.log
# Model weights and large artifacts
*.gguf
*.safetensors
*.safetensors.index.json
*.bin
*.pt
*.pth
*.ckpt
*.onnx
# Hugging Face / cache-like folders
.cache/
# Keep publish metadata, ignore heavy files in publish folder
modelscope_upload/*.gguf
# OS
.DS_Store
Thumbs.db

27
AGENTS.md Normal file

@@ -0,0 +1,27 @@
# AGENTS Guidelines
This file guides future automation agents working in this repository.
## Goals
Maintain a reusable quantization and publishing workflow; never commit large model weights to the Git repository.
## Directory conventions
- Documentation lives in `docs/`
- Scripts live in `scripts/`
- Calibration data lives in `calibration/`
- The publish directory is `modelscope_upload/`
## Hard rules
1. Never commit weight files (`.gguf`, `.safetensors`, `.bin`, `.pt`, etc.)
2. Never commit secrets, tokens, or credentials
3. When the workflow changes, update the `docs/` documentation in the same change
4. Prefer reusing existing scripts over reinventing them
## Working habits
- For long-running tasks (conversion/quantization/upload), state the check command and an estimated duration up front
- Provide upload commands in two variants by default: direct (no proxy) and via proxy
- Before any operation that moves large files, check available disk space
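The last habit can be automated with a small pre-flight helper; this is a minimal sketch using only the standard library, with the threshold left to the caller:

```python
import shutil


def has_free_space(path: str, required_gib: float) -> bool:
    """Return True if the filesystem holding `path` has at least
    `required_gib` GiB free (binary units, 2**30 bytes)."""
    return shutil.disk_usage(path).free / 2**30 >= required_gib
```

For example, call `has_free_space(".", 25)` before quantizing, since each output GGUF here is in the 14-21 GiB range.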

37
README.md Normal file

@@ -0,0 +1,37 @@
# Qwen3.5-27B Quantization Workspace
This repository captures the reusable quantization workflow and publishing scripts for the Qwen3.5-27B model family. It keeps:
- Quantization workflow documentation
- Calibration data and the scripts that build it
- ModelScope publishing template files and the upload script
Model weights are not hosted in this repository (`.gguf` and other large files are excluded via `.gitignore`).
## Directory layout
- `docs/`
  - `QWEN35_QUANTIZATION_MANUAL.md`
  - `MODELSCOPE_UPLOAD_SOP.md`
- `scripts/`
  - `prepare_calib_data.py`
  - `upload_to_modelscope.sh`
- `calibration/`
  - `calibration_data_v5_rc.txt`
  - `calibration_data_v5_rc_code.txt`
  - `sources/`
- `modelscope_upload/`
  - Publish directory for ModelScope (README/configuration/.gitattributes plus the artifacts)
## Typical workflow
1. Prepare/update the calibration data (`scripts/prepare_calib_data.py`)
2. Run imatrix and quantization in Docker (see `docs/QWEN35_QUANTIZATION_MANUAL.md`)
3. Stage the publish directory (`modelscope_upload/`)
4. Upload manually (see `docs/MODELSCOPE_UPLOAD_SOP.md` and `scripts/upload_to_modelscope.sh`)
## Git guidelines
- Commit only scripts, documentation, configuration, and small data files
- Never commit tokens, weights, or environment directories
- Update `docs/` and `AGENTS.md` alongside every workflow change

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,94 @@
# ModelScope Upload SOP (this project)
## 1. Directories and files
Working directory:
`/home/zly/project/modelscope_qwen35_27b_quantized`
Upload directory:
`/home/zly/project/modelscope_qwen35_27b_quantized/modelscope_upload`
The upload directory should contain:
- `README.md` (more than 200 characters, including `tasks` and `license`)
- `configuration.json`
- `.gitattributes`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
- `Qwen3.5-27B.imatrix.dat`
Quick check:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lah modelscope_upload
wc -m modelscope_upload/README.md
```
## 2. Environment setup
Use the local virtual environment:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python -V
./.venv/bin/modelscope --version
```
If not yet installed:
```bash
./.venv/bin/pip install -U modelscope "setuptools<81"
```
## 3. Log in to ModelScope
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope login --token "<YOUR_MODELSCOPE_TOKEN>"
```
## 4. Upload (recommended: direct connection, no proxy)
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
env -u HTTP_PROXY -u HTTPS_PROXY -u ALL_PROXY -u NO_PROXY \
./.venv/bin/modelscope upload \
"jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
"./modelscope_upload" \
. \
--repo-type model \
--commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```
## 5. Upload (via proxy, if needed)
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope upload \
"jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
"./modelscope_upload" \
. \
--repo-type model \
--commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```
## 6. Resuming interrupted uploads
- If an upload is interrupted, simply re-run the step 4 or step 5 command.
- The CLI hash-checks first and reuses chunks that were already uploaded; there is no need to delete local files manually.
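Because the CLI resumes from already-uploaded chunks, a blunt retry loop around the upload command is safe. A minimal sketch (a hypothetical wrapper, not one of this repository's scripts):

```python
import subprocess
import time


def run_with_retries(cmd: list[str], attempts: int = 3, wait_s: float = 30.0) -> int:
    """Re-run `cmd` until it exits 0 or attempts run out.

    Safe for the upload command above because the ModelScope CLI
    hash-checks and reuses chunks that were already uploaded.
    """
    rc = 1
    for i in range(1, attempts + 1):
        rc = subprocess.run(cmd).returncode
        if rc == 0:
            return 0
        if i < attempts:
            time.sleep(wait_s)  # brief pause before retrying
    return rc
```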
## 7. Post-publish checks
Repository URL:
`https://www.modelscope.cn/models/jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF`
Checklist:
- All files are shown (3 GGUF + 1 imatrix + metadata)
- The README renders the task and license correctly
- The page has left the pre-release state (if still pre-release, add more detail and appeal)


@@ -0,0 +1,231 @@
# Qwen3.5-27B Quantization Manual (ik_llama.cpp, Docker edition)
## 1. Goal and scope
This manual covers running imatrix computation and quantization on the Qwen3.5-27B BF16 GGUF with `ik_llama.cpp` in `/home/zly/project/modelscope_qwen35_27b_quantized`, producing:
- `Qwen3.5-27B.imatrix.dat`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
Image: `hotwa/ik:latest`
Core tools: `/llama-imatrix` and `/llama-quantize`
---
## 2. Prerequisites
- Docker is available and you have permission to access the daemon
- An NVIDIA GPU is available (recommended)
- The BF16 input file exists in the current directory:
  - `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- The Python environment and script exist in the current directory:
  - `./.venv/bin/python`
  - `prepare_calib_data.py`
Check commands:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --rm --gpus all --entrypoint sh hotwa/ik:latest -c "ls -la /llama-imatrix /llama-quantize"
ls -lh Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
```
---
## 3. Prepare calibration data
### 3.1 Download the base calibration file (source of the 1152 blocks)
Recommended (widely used community version):
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
wget -O calibration_data_v5_rc.txt \
"https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt"
```
Official fallback source (when the network allows):
```bash
wget -O calibration_data_v5_rc.txt \
"https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt"
```
### 3.2 Build the mixed calibration set
Target composition for the script (strict):
- Base data, 1152 blocks: `calibration_data_v5_rc.txt`
- Code conversations, 2000 blocks: `QuixiAI/Code-74k-ShareGPT-Vicuna`
- Code preferences, 1000 blocks: `alvarobartt/openhermes-preferences-coding`
Run:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python prepare_calib_data.py --force-refresh
```
### 3.3 Verify block counts
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python - <<'PY'
import re
from pathlib import Path
def count_blocks(path):
txt = Path(path).read_text(encoding="utf-8", errors="ignore")
return len([b for b in re.split(r"\n\s*\n", txt) if b.strip()])
print("base =", count_blocks("calibration_data_v5_rc.txt"))
print("mix =", count_blocks("calibration_data_v5_rc_code.txt"))
PY
```
Expected:
- `base = 1152`
- `mix = 4152` (1152 + 2000 + 1000)
---
## 4. Generate the imatrix
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
-v /home/zly/project/modelscope_qwen35_27b_quantized/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
hotwa/ik:latest \
-c "/llama-imatrix \
-m /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
-f /workspace/calib_data.txt \
-o /workspace/models/Qwen3.5-27B.imatrix.dat \
--ctx-size 512 \
-ngl 99 \
--threads 16"
```
Verify completion:
```bash
ls -lh Qwen3.5-27B.imatrix.dat
```
---
## 5. Quantize to the three formats
### 5.1 IQ4_KS
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ4_KS.gguf \
IQ4_KS"
```
### 5.2 IQ5_K
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ5_K.gguf \
IQ5_K"
```
### 5.3 IQ6_K
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ6_K.gguf \
IQ6_K"
```
---
## 6. Verify all outputs at once
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lh Qwen3.5-27B.imatrix.dat Qwen3.5-27B-IQ4_KS.gguf Qwen3.5-27B-IQ5_K.gguf Qwen3.5-27B-IQ6_K.gguf
```
Measured on 2026-03-02:
- `Qwen3.5-27B.imatrix.dat` = `13,582,647` bytes (`12.95 MiB`)
- `Qwen3.5-27B-IQ4_KS.gguf` = `14,705,833,248` bytes (`13.70 GiB`)
- `Qwen3.5-27B-IQ5_K.gguf` = `18,679,612,704` bytes (`17.40 GiB`)
- `Qwen3.5-27B-IQ6_K.gguf` = `22,292,632,864` bytes (`20.76 GiB`)
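The sizes in parentheses are binary units derived from the byte counts. A small conversion helper for reproducing them (illustrative only, not part of the workflow scripts):

```python
def to_mib(n_bytes: int) -> float:
    """Byte count to MiB (2**20 bytes), rounded to 2 decimals."""
    return round(n_bytes / 2**20, 2)


def to_gib(n_bytes: int) -> float:
    """Byte count to GiB (2**30 bytes), rounded to 2 decimals."""
    return round(n_bytes / 2**30, 2)
```

For example, `to_gib(14705833248)` reproduces the IQ4_KS figure above.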
---
## 7. Troubleshooting
### 7.1 `docker.sock` permission error
Symptom: `permission denied while trying to connect to the Docker daemon socket`
Fix:
- Run as a user with Docker permissions
- Or check the `docker` group configuration
### 7.2 DNS failure on the download source
Symptom: `unable to resolve host address`
Fix:
- Prefer the gist source (see 3.1)
- Or configure a working proxy and retry
### 7.3 Output files owned by `root`
Files written from inside the container may be owned by root. Fix ownership if needed:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
sudo chown -R $(id -u):$(id -g) Qwen3.5-27B*.gguf Qwen3.5-27B.imatrix.dat
```
---
## 8. Minimal ModelScope publishing checklist (optional)
- `README.md` (>= 200 characters, including task and license information)
- `configuration.json` (with `framework`, `task`, and `model.type`)
- `.gitattributes` (`*.gguf` tracked via LFS)
- Quantized files:
  - `Qwen3.5-27B-IQ4_KS.gguf`
  - `Qwen3.5-27B-IQ5_K.gguf`
  - `Qwen3.5-27B-IQ6_K.gguf`

5
modelscope_upload/.gitattributes vendored Normal file

@@ -0,0 +1,5 @@
*.gguf filter=lfs diff=lfs merge=lfs -text
*.dat filter=lfs diff=lfs merge=lfs -text
*.md text eol=lf
*.json text eol=lf
.gitattributes text eol=lf
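The two LFS rules above can be sanity-checked against the planned upload filenames with a few lines of Python. A rough sketch: real `.gitattributes` matching has extra path rules, but for these bare extension globs `fnmatch` behaves the same:

```python
from fnmatch import fnmatch

# The LFS patterns from the .gitattributes above.
LFS_PATTERNS = ["*.gguf", "*.dat"]


def is_lfs_tracked(filename: str) -> bool:
    """Rough check: does a bare filename match one of the LFS globs?"""
    return any(fnmatch(filename, pat) for pat in LFS_PATTERNS)
```

With this, every GGUF plus the imatrix file is LFS-tracked, while the metadata files (`README.md`, `configuration.json`) stay as plain text.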

Binary file not shown.


@@ -0,0 +1,76 @@
---
tags:
- text-generation
- qwen
- qwen35
- gguf
- quantization
tasks:
- text-generation
license: Apache License 2.0
---
# Qwen3.5-27B Quantized GGUF (IQ4_KS / IQ5_K / IQ6_K)
## Model description
This repository provides GGUF quantizations of Qwen3.5-27B for the llama.cpp ecosystem, in three variants: IQ4_KS, IQ5_K, and IQ6_K. The weights were quantized from a BF16 GGUF input via an importance matrix (imatrix), balancing file size, inference speed, and accuracy for text generation under different VRAM budgets.
## Weight provenance
- Original BF16 GGUF source: `TeichAI/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF`
- This repository contains the imatrix-based GGUF quantized releases derived from that source (IQ4_KS / IQ5_K / IQ6_K)
## Quantization method
Quantization was performed in two stages with the `ik_llama.cpp` Docker image (`hotwa/ik:latest`):
1. Compute the importance matrix over the calibration corpus with `llama-imatrix` (`Qwen3.5-27B.imatrix.dat`)
2. Export `IQ4_KS`, `IQ5_K`, and `IQ6_K` with `llama-quantize --imatrix ...`
Key parameters:
- imatrix input model: `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- `--ctx-size 512`
- `-ngl 99`
- `--threads 16`
The imatrix models the relative importance of the weights, which reduces information loss in critical layers at a given bit width and improves the stability of the quantized model at inference time.
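The intuition can be illustrated with a toy example: quantization error is weighted by per-weight importance, and the scale that minimizes the weighted error is preferred. This is a conceptual sketch only, not the actual ik_llama.cpp algorithm, and it assumes NumPy is available:

```python
import numpy as np


def weighted_quant_error(weights, importance, scale):
    """Importance-weighted squared error of round-to-nearest quantization."""
    q = np.round(weights / scale) * scale  # quantize, then dequantize
    return float(np.sum(importance * (weights - q) ** 2))


def best_scale(weights, importance, candidates):
    """Pick the candidate scale that minimizes the weighted error."""
    return min(candidates, key=lambda s: weighted_quant_error(weights, importance, s))
```

With uniform importance every weight counts equally; skewing the importance toward activation-critical weights shifts the chosen scale to protect them, which is the idea behind imatrix-guided quantization.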
## Calibration data: sources and rationale
The calibration file is `calibration_data_v5_rc_code.txt`, `4152` blocks in total:
- `1152` blocks: base calibration data `calibration_data_v5_rc.txt`
- `2000` blocks: `QuixiAI/Code-74k-ShareGPT-Vicuna`
- `1000` blocks: `alvarobartt/openhermes-preferences-coding` (`chosen` branch)
Download sources for the base calibration data:
- Widely used community version: `https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt`
- Official fallback source: `https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt`
Why these three components:
- The base data covers general semantics and common text distributions, preventing overfitting to the code domain
- The Code-74k conversation samples improve quantization fidelity for code generation, debugging, and explanation
- The OpenHermes coding preference samples carry a "preferred answer" signal that helps keep code output structured and readable
The mix balances general text against code tasks, matching the practical usage of the Qwen3.5-27B Distill model.
## Files
- `Qwen3.5-27B-IQ4_KS.gguf`: lowest VRAM footprint
- `Qwen3.5-27B-IQ5_K.gguf`: balanced speed and quality
- `Qwen3.5-27B-IQ6_K.gguf`: highest fidelity
- `Qwen3.5-27B.imatrix.dat`: the importance matrix used for quantization
## Usage recommendations
- Prefer IQ4_KS when device resources are tight
- Prefer IQ5_K for general inference
- Use IQ6_K when quality matters most
## Notes
This repository publishes ready-to-run GGUF weights and contains no training artifacts. Use a GGUF-capable inference framework (e.g. a llama.cpp-based implementation).


@@ -0,0 +1,10 @@
{
"framework": "ggml",
"task": "text-generation",
"model": {
"type": "qwen35"
},
"pipeline": {
"type": "text-generation"
}
}


@@ -0,0 +1,187 @@
#!/usr/bin/env python3
"""
Prepare calibration_data_v5_rc_code.txt with exact composition:
- base: 1152 blocks from calibration_data_v5_rc.txt
- code: 2000 blocks from QuixiAI/Code-74k-ShareGPT-Vicuna
- pref: 1000 blocks from alvarobartt/openhermes-preferences-coding
"""
from __future__ import annotations
import argparse
import random
import re
import subprocess
import sys
from pathlib import Path
from datasets import load_dataset
BASE_URL = (
"https://raw.githubusercontent.com/ggerganov/llama.cpp/master/"
"examples/calibration/calibration_data.txt"
)
BLOCK_SPLIT_RE = re.compile(r"\n\s*\n")
def split_blocks(text: str) -> list[str]:
blocks = [b.strip() for b in BLOCK_SPLIT_RE.split(text) if b.strip()]
return blocks
def read_blocks(path: Path) -> list[str]:
return split_blocks(path.read_text(encoding="utf-8", errors="ignore"))
def write_blocks(path: Path, blocks: list[str]) -> None:
path.write_text("\n\n".join(blocks).strip() + "\n", encoding="utf-8")
def ensure_base_file(path: Path) -> None:
if path.exists():
return
cmd = ["wget", BASE_URL, "-O", str(path)]
print("Downloading base calibration file:")
print(" ", " ".join(cmd))
subprocess.run(cmd, check=True)
def pick_blocks(blocks: list[str], target: int, seed: int) -> list[str]:
if len(blocks) < target:
raise ValueError(f"Need {target} blocks but only got {len(blocks)}.")
rng = random.Random(seed)
idxs = list(range(len(blocks)))
rng.shuffle(idxs)
return [blocks[i] for i in idxs[:target]]
def build_code74k_blocks(target: int, seed: int) -> list[str]:
ds = load_dataset("QuixiAI/Code-74k-ShareGPT-Vicuna", split="train")
rows = list(range(len(ds)))
rng = random.Random(seed)
rng.shuffle(rows)
out: list[str] = []
for i in rows:
conv = ds[i].get("conversations") or []
parts = []
for msg in conv:
value = (msg.get("value") or "").strip()
if value:
parts.append(value)
if parts:
out.append("\n".join(parts))
if len(out) >= target:
break
if len(out) < target:
raise RuntimeError(
f"Code-74k yielded only {len(out)} valid blocks, target is {target}."
)
return out
def build_openhermes_blocks(target: int, seed: int) -> list[str]:
ds = load_dataset("alvarobartt/openhermes-preferences-coding", split="train")
rows = list(range(len(ds)))
rng = random.Random(seed + 1)
rng.shuffle(rows)
out: list[str] = []
for i in rows:
chosen = ds[i].get("chosen") or []
parts = []
for msg in chosen:
value = (msg.get("content") or "").strip()
if value:
parts.append(value)
if parts:
out.append("\n".join(parts))
if len(out) >= target:
break
if len(out) < target:
raise RuntimeError(
f"OpenHermes yielded only {len(out)} valid blocks, target is {target}."
)
return out
def ensure_cached_blocks(
cache_path: Path,
target: int,
build_fn,
seed: int,
) -> list[str]:
if cache_path.exists():
cached = read_blocks(cache_path)
if len(cached) >= target:
return cached[:target]
print(
f"{cache_path} has {len(cached)} blocks (< {target}), rebuilding from source."
)
blocks = build_fn(target, seed)
cache_path.parent.mkdir(parents=True, exist_ok=True)
write_blocks(cache_path, blocks)
return blocks
def main() -> int:
parser = argparse.ArgumentParser()
parser.add_argument("--seed", type=int, default=42)
parser.add_argument("--base-file", default="calibration_data_v5_rc.txt")
parser.add_argument("--output", default="calibration_data_v5_rc_code.txt")
parser.add_argument("--data-dir", default="data")
parser.add_argument("--force-refresh", action="store_true")
args = parser.parse_args()
base_file = Path(args.base_file)
output_file = Path(args.output)
data_dir = Path(args.data_dir)
code_cache = data_dir / "code74k_2000.txt"
openhermes_cache = data_dir / "openhermes_coding_chosen_1000.txt"
if args.force_refresh:
for p in [code_cache, openhermes_cache]:
if p.exists():
p.unlink()
ensure_base_file(base_file)
base_blocks_all = read_blocks(base_file)
base_blocks = pick_blocks(base_blocks_all, target=1152, seed=args.seed)
code_blocks = ensure_cached_blocks(
cache_path=code_cache,
target=2000,
build_fn=build_code74k_blocks,
seed=args.seed,
)
openhermes_blocks = ensure_cached_blocks(
cache_path=openhermes_cache,
target=1000,
build_fn=build_openhermes_blocks,
seed=args.seed,
)
merged = base_blocks + code_blocks + openhermes_blocks
write_blocks(output_file, merged)
print("Done.")
print(f"base blocks: {len(base_blocks)} ({base_file})")
print(f"code blocks: {len(code_blocks)} (QuixiAI/Code-74k-ShareGPT-Vicuna)")
print(
"openhermes blocks: "
f"{len(openhermes_blocks)} (alvarobartt/openhermes-preferences-coding)"
)
print(f"total blocks: {len(merged)}")
print(f"output: {output_file}")
return 0
if __name__ == "__main__":
try:
raise SystemExit(main())
except subprocess.CalledProcessError as exc:
print(f"Command failed with exit code {exc.returncode}", file=sys.stderr)
raise

25
scripts/upload_to_modelscope.sh Executable file

@@ -0,0 +1,25 @@
#!/usr/bin/env bash
set -euo pipefail
# Usage:
#   ./upload_to_modelscope.sh <repo_id> <token>
# Example:
#   ./upload_to_modelscope.sh your_username/your_repo_name ms-xxxxxxxx
REPO_ID="${1:-}"
TOKEN="${2:-}"
if [[ -z "${REPO_ID}" || -z "${TOKEN}" ]]; then
echo "Usage: $0 <repo_id> <token>"
exit 1
fi
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ROOT_DIR="$(cd "${SCRIPT_DIR}/.." && pwd)"
"${ROOT_DIR}/.venv/bin/modelscope" login --token "${TOKEN}"
# Upload the staged publish directory (not this scripts directory).
"${ROOT_DIR}/.venv/bin/modelscope" upload "${REPO_ID}" "${ROOT_DIR}/modelscope_upload" . \
--repo-type model \
--commit-message "Upload Qwen3.5-27B quantized GGUF weights"
echo "Upload finished: ${REPO_ID}"