first commit
25
.gitignore
vendored
@@ -5,11 +5,12 @@ __pycache__/
*.pyo
*.pyd

# Local scratch
# Caches / temp
.cache/
.trash/
*.log

# Model weights and large artifacts
# Large model files
*.gguf
*.safetensors
*.safetensors.index.json
@@ -18,12 +19,26 @@ __pycache__/
*.pth
*.ckpt
*.onnx
*.tar
*.tar.gz

# Hugging Face / cache-like folders
.cache/
# Root artifacts workspace
/artifacts/

# Keep publish metadata, ignore heavy files in publish folder
# Example artifacts: keep directory skeleton only
examples/*/artifacts/*
!examples/*/artifacts/.gitkeep

# Keep modelscope metadata, ignore heavy artifacts in publish dir
modelscope_upload/*.gguf
modelscope_upload/*.safetensors
modelscope_upload/*.bin
modelscope_upload/*.pt
modelscope_upload/*.dat

# Tools
.pixi/
ik_llama.cpp/

# OS
.DS_Store
114
AGENTS.md
@@ -1,27 +1,105 @@
# AGENTS Guidelines
# AGENTS

This file guides future automated agents working in this repository.
## Purpose

## Goals
This repository captures a reusable LLM quantization and publishing pipeline: HF safetensors -> BF16 GGUF -> imatrix -> IQ4/IQ5/IQ6 -> ModelScope upload directory.

Maintain a reusable quantization and publishing workflow; never put large model weights under Git version control.
## Repository Contract

## Directory Conventions
- Template docs: `docs/`
- Scripts: `scripts/`
- Template files: `templates/`
- Examples: `examples/<model_name>/`
- Upload workspace: `modelscope_upload/`
- Large artifacts: `artifacts/` (ignored)

- Docs go in `docs/`
- Scripts go in `scripts/`
- Calibration data goes in `calibration/`
- The publish directory is `modelscope_upload/`
## Hard Rules

## Must Follow
1. Never commit any weight files (`*.gguf`, `*.safetensors`, `*.bin`, `*.pt`, etc.).
2. Never commit tokens, secrets, or account credentials.
3. Each new model must add an `examples/<model_name>/README.md` recording its key parameters.
4. Any script or workflow change must be mirrored in `docs/`.

1. Do not commit any weight files (`.gguf`, `.safetensors`, `.bin`, `.pt`, etc.)
2. Do not commit secrets, tokens, or credentials
3. Update the `docs/` documentation whenever the workflow changes
4. Prefer reusing existing scripts over reinventing the wheel
## Standard Quantization Skill

## Working Habits
### 0) Prerequisites

- For long-running tasks (convert/quantize/upload), provide check commands and a time estimate up front
- Provide upload commands in two variants by default: direct (no proxy) and proxied
- Check disk space before any operation that moves large files
- Python venv: `./.venv`
- Docker + GPU (recommended)
- A usable HF model directory (safetensors)
- `hotwa/ik:latest` can be pulled

### 1) HF -> BF16 GGUF

Run inside `ik_llama.cpp`:

```bash
python convert_hf_to_gguf.py \
  <hf_model_dir> \
  --outtype bf16 \
  --outfile <output_bf16_gguf>
```

Place the BF16 GGUF under `artifacts/<model_name>/base_gguf/`.

### 2) Build Calibration Dataset

Run:

```bash
./.venv/bin/python scripts/prepare_calib_data.py --force-refresh
```

Target output: `calibration/calibration_data_v5_rc_code.txt`, with this exact composition:

- 1152 blocks: `calibration_data_v5_rc.txt`
- 2000 blocks: `QuixiAI/Code-74k-ShareGPT-Vicuna`
- 1000 blocks: `alvarobartt/openhermes-preferences-coding`

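The strict composition above can be sanity-checked in code. A minimal sketch, assuming blank-line-separated blocks as in the repository's calibration files; the toy lists below are stand-ins for the real datasets:

```python
import re

# Blocks are blank-line separated, matching the calibration file convention.
BLOCK_SPLIT_RE = re.compile(r"\n\s*\n")

def split_blocks(text: str) -> list[str]:
    """Split calibration text into non-empty blocks."""
    return [b for b in BLOCK_SPLIT_RE.split(text) if b.strip()]

def build_mix(base: list[str], code: list[str], prefs: list[str]) -> list[str]:
    """Concatenate the three sources, enforcing the strict 1152 + 2000 + 1000 split."""
    assert len(base) == 1152 and len(code) == 2000 and len(prefs) == 1000
    return base + code + prefs

# Toy stand-in data, just to show the block arithmetic.
base = [f"base {i}" for i in range(1152)]
code = [f"code {i}" for i in range(2000)]
prefs = [f"pref {i}" for i in range(1000)]
mix = build_mix(base, code, prefs)
print(len(mix))  # 4152
```
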
### 3) Generate imatrix

```bash
docker run --gpus all --rm \
  --entrypoint sh \
  -v <repo_root>:/workspace/models \
  -v <repo_root>/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix -m <bf16_gguf> -f /workspace/calib_data.txt -o <imatrix_out> --ctx-size 512 -ngl 99 --threads 16"
```

### 4) Quantize

Export `IQ4_KS`, `IQ5_K`, and `IQ6_K` separately:

```bash
docker run --gpus all --rm \
  --entrypoint sh \
  -v <repo_root>:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize --imatrix <imatrix_out> <bf16_gguf> <out_gguf> IQ4_KS"
```

### 5) Prepare ModelScope Folder

- Copy `templates/modelscope/*` to `modelscope_upload/`
- Fill in `README.md` and `configuration.json`
- Add the quantized artifacts (GGUF + imatrix)

### 6) Upload

Use the script:

```bash
./scripts/upload_to_modelscope.sh <repo_id> <token> modelscope_upload direct "Upload quantized GGUF"
```

`direct` unsets the proxy automatically; `proxy` keeps it.

## Definition Of Done

- The BF16 GGUF exists
- `imatrix.dat` exists
- IQ4/IQ5/IQ6 all exist
- `modelscope_upload/README.md` is over 200 characters and includes `tasks` and `license`
- `modelscope_upload/configuration.json` has all fields filled in
- `examples/<model_name>/` has been updated with a record

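The checklist above is mechanical enough to script. A minimal sketch of a checker, assuming the generic layout; the artifact names passed in are placeholders, not files in this repository:

```python
from pathlib import Path
import tempfile

def check_done(upload_dir: str, quant_files: list[str]) -> list[str]:
    """Return the unmet Definition-of-Done items; an empty list means done."""
    root = Path(upload_dir)
    problems = [f"missing artifact: {n}" for n in quant_files if not (root / n).is_file()]
    readme = root / "README.md"
    if not readme.is_file():
        problems.append("missing README.md")
    else:
        text = readme.read_text(encoding="utf-8")
        if len(text) <= 200:
            problems.append("README.md is not over 200 characters")
        problems += [f"README.md lacks `{k}`" for k in ("tasks", "license") if k not in text]
    if not (root / "configuration.json").is_file():
        problems.append("missing configuration.json")
    return problems

# Demo against an empty directory: everything is reported as missing.
with tempfile.TemporaryDirectory() as d:
    issues = check_done(d, ["model-IQ4_KS.gguf"])
    print(issues)
```
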
66
README.md
@@ -1,37 +1,47 @@
# Qwen3.5-27B Quantization Workspace
# LLM GGUF Quantization Template

This repository captures the reusable quantization workflow and publishing scripts for the Qwen3.5-27B model series, primarily preserving:
This repository is a reusable template covering the full pipeline:

- Quantization workflow docs
- Calibration data and its build scripts
- ModelScope publishing templates and upload scripts

No weight files are hosted in the repository (`.gguf` and other large files are ignored via `.gitignore`).
1. HuggingFace safetensors -> BF16 GGUF
2. Build mixed calibration data (general + code)
3. Generate an imatrix with `ik_llama.cpp`
4. Export IQ4_KS / IQ5_K / IQ6_K
5. Organize the ModelScope upload directory

## Directory Layout

- `docs/`
  - `QWEN35_QUANTIZATION_MANUAL.md`
  - `MODELSCOPE_UPLOAD_SOP.md`
- `scripts/`
  - `prepare_calib_data.py`
  - `upload_to_modelscope.sh`
- `calibration/`
  - `calibration_data_v5_rc.txt`
  - `calibration_data_v5_rc_code.txt`
  - `sources/`
- `modelscope_upload/`
  - Publishing directory for ModelScope (README/configuration/.gitattributes plus artifacts)
- `docs/`: template-level workflow docs and checklists
- `scripts/`: reusable scripts
- `templates/`: ModelScope metadata templates
- `examples/`: completed runs (reference for parameters and records)
- `calibration/`: calibration data and source caches
- `modelscope_upload/`: current upload workspace (only metadata is tracked)
- `artifacts/`: local large-artifact directory (ignored)

## Typical Workflow
See `docs/REPO_STRUCTURE.md` for the detailed structure.

1. Prepare/refresh the calibration data (`scripts/prepare_calib_data.py`)
2. Run imatrix and quantization in Docker (see `docs/QWEN35_QUANTIZATION_MANUAL.md`)
3. Organize the publish directory (`modelscope_upload/`)
4. Run the upload manually (see `docs/MODELSCOPE_UPLOAD_SOP.md` or `scripts/upload_to_modelscope.sh`)
## Quick Start

## Git Recommendations
1. Read `docs/WORKFLOW_TEMPLATE.md`
2. Execute and verify against `docs/NEW_MODEL_CHECKLIST.md`
3. Use `examples/qwen35_27b/` as a reference for parameters and release copy

- Commit only scripts, docs, config, and small data files
- Never commit tokens, weights, or environment directories
- Update `docs/` and `AGENTS.md` alongside every workflow adjustment
## Standard Calibration Data Composition

Target output file: `calibration/calibration_data_v5_rc_code.txt`

- Base data: 1152 blocks (`calibration_data_v5_rc.txt`)
- Code conversations: 2000 blocks (`QuixiAI/Code-74k-ShareGPT-Vicuna`)
- Code preferences: 1000 blocks (`alvarobartt/openhermes-preferences-coding`)

Run the script:

```bash
./.venv/bin/python scripts/prepare_calib_data.py --force-refresh
```

## Git Constraints

- Never commit: `*.gguf`, `*.safetensors`, `*.bin`, `*.pt`, or other large weights
- Never commit: tokens, secrets, or account credentials
- Whenever the workflow or scripts change, update `docs/` and the example docs

@@ -1,94 +0,0 @@
# ModelScope Upload SOP (current project)

## 1. Directories and Files

Working directory:

`/home/zly/project/modelscope_qwen35_27b_quantized`

Upload directory:

`/home/zly/project/modelscope_qwen35_27b_quantized/modelscope_upload`

The upload directory should contain:

- `README.md` (over 200 characters, with `tasks` and `license`)
- `configuration.json`
- `.gitattributes`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
- `Qwen3.5-27B.imatrix.dat`

Quick check:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lah modelscope_upload
wc -m modelscope_upload/README.md
```

## 2. Environment Setup

Use the local virtual environment:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python -V
./.venv/bin/modelscope --version
```

If not installed:

```bash
./.venv/bin/pip install -U modelscope "setuptools<81"
```

## 3. Log In to ModelScope

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope login --token "<YOUR_MODELSCOPE_TOKEN>"
```

## 4. Upload (recommended: direct, no proxy)

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
env -u HTTP_PROXY -u HTTPS_PROXY -u ALL_PROXY -u NO_PROXY \
  ./.venv/bin/modelscope upload \
  "jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
  "./modelscope_upload" \
  . \
  --repo-type model \
  --commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```

## 5. Upload (if a proxy is needed)

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope upload \
  "jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
  "./modelscope_upload" \
  . \
  --repo-type model \
  --commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```

## 6. Resume / Retry Notes

- If an upload is interrupted, simply rerun the command from step 4 or step 5.
- The CLI hashes files first and reuses already-uploaded chunks; no local files need to be deleted by hand.

## 7. Post-publish Checks

Repository URL:

`https://www.modelscope.cn/models/jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF`

Checkpoints:

- Are all files shown (3 GGUFs + 1 imatrix + metadata)?
- Does the README render the task and license correctly?
- Has the page left the preview state (if still in preview, add details and appeal)?

35
docs/NEW_MODEL_CHECKLIST.md
Normal file
@@ -0,0 +1,35 @@
# New Model Checklist

## Basic Info

- [ ] Model English name (used consistently for directory naming)
- [ ] HuggingFace source repository
- [ ] ModelScope target repository (repo_id)

## Data and Conversion

- [ ] safetensors downloaded completely
- [ ] BF16 GGUF conversion succeeded
- [ ] `scripts/prepare_calib_data.py` produced the mixed calibration data
- [ ] Calibration data block count is 4152

## Quantization

- [ ] imatrix generated
- [ ] IQ4_KS generated
- [ ] IQ5_K generated
- [ ] IQ6_K generated

## Publish Directory

- [ ] `modelscope_upload/README.md` is over 200 characters
- [ ] README includes `tasks` and `license`
- [ ] `modelscope_upload/configuration.json` has all fields
- [ ] `.gitattributes` has LFS configured
- [ ] Upload directory contains the actual model files (not just metadata)

## Documentation

- [ ] `examples/<model_name>/README.md` updated
- [ ] Key commands and parameters written into `examples/<model_name>/docs/`
- [ ] Template docs updated (if the workflow changed)

@@ -1,231 +0,0 @@
# Qwen3.5-27B Quantization Manual (ik_llama.cpp Docker edition)

## 1. Goal and Scope

This manual covers computing the imatrix and quantizing the Qwen3.5-27B BF16 GGUF with `ik_llama.cpp` in `/home/zly/project/modelscope_qwen35_27b_quantized`, producing:

- `Qwen3.5-27B.imatrix.dat`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`

Image: `hotwa/ik:latest`
Core tools: `/llama-imatrix`, `/llama-quantize`

---

## 2. Prerequisites

- Docker is available and the user can access the daemon
- An NVIDIA GPU is available (recommended)
- The BF16 input file exists in the current directory:
  - `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- The Python environment and script exist in the current directory:
  - `./.venv/bin/python`
  - `prepare_calib_data.py`

Check commands:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --rm --gpus all --entrypoint sh hotwa/ik:latest -c "ls -la /llama-imatrix /llama-quantize"
ls -lh Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
```

---

## 3. Prepare Calibration Data

### 3.1 Download the base calibration file (source of the 1152 blocks)

Recommended (widely used community version):

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
wget -O calibration_data_v5_rc.txt \
  "https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt"
```

Official fallback source (when the network allows):

```bash
wget -O calibration_data_v5_rc.txt \
  "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt"
```

### 3.2 Generate the mixed calibration set

Target composition (strict):

- Base data: 1152 blocks (`calibration_data_v5_rc.txt`)
- Code conversations: 2000 blocks (`QuixiAI/Code-74k-ShareGPT-Vicuna`)
- Code preferences: 1000 blocks (`alvarobartt/openhermes-preferences-coding`)

Run:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python prepare_calib_data.py --force-refresh
```

### 3.3 Verify the block counts

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

./.venv/bin/python - <<'PY'
import re
from pathlib import Path

def count_blocks(path):
    txt = Path(path).read_text(encoding="utf-8", errors="ignore")
    return len([b for b in re.split(r"\n\s*\n", txt) if b.strip()])

print("base =", count_blocks("calibration_data_v5_rc.txt"))
print("mix =", count_blocks("calibration_data_v5_rc_code.txt"))
PY
```

Expected:

- `base = 1152`
- `mix = 4152` (1152 + 2000 + 1000)

---

## 4. Generate the imatrix

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  -v /home/zly/project/modelscope_qwen35_27b_quantized/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix \
    -m /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    -f /workspace/calib_data.txt \
    -o /workspace/models/Qwen3.5-27B.imatrix.dat \
    --ctx-size 512 \
    -ngl 99 \
    --threads 16"
```

Completion check:

```bash
ls -lh Qwen3.5-27B.imatrix.dat
```

---

## 5. Quantize the Three Formats

### 5.1 IQ4_KS

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ4_KS.gguf \
    IQ4_KS"
```

### 5.2 IQ5_K

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ5_K.gguf \
    IQ5_K"
```

### 5.3 IQ6_K

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ6_K.gguf \
    IQ6_K"
```

---

## 6. Verify All Results at Once

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lh Qwen3.5-27B.imatrix.dat Qwen3.5-27B-IQ4_KS.gguf Qwen3.5-27B-IQ5_K.gguf Qwen3.5-27B-IQ6_K.gguf
```

Measured on this run (2026-03-02):

- `Qwen3.5-27B.imatrix.dat` = `13,582,647` bytes (about `12.95 MB`)
- `Qwen3.5-27B-IQ4_KS.gguf` = `14,705,833,248` bytes (about `13.70 GB`)
- `Qwen3.5-27B-IQ5_K.gguf` = `18,679,612,704` bytes (about `17.40 GB`)
- `Qwen3.5-27B-IQ6_K.gguf` = `22,292,632,864` bytes (about `20.76 GB`)

---

## 7. Common Issues

### 7.1 `docker.sock` permission error

Symptom: `permission denied while trying to connect to the Docker daemon socket`

Fix:

- Run as a user with Docker permissions
- Or check the `docker` group configuration

### 7.2 DNS failure for the download source

Symptom: `unable to resolve host address`

Fix:

- Prefer the gist source (see 3.1)
- Or configure a working proxy and retry

### 7.3 Output files owned by `root`

Files written by the container may end up owned by root. Fix as needed:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
sudo chown -R $(id -u):$(id -g) Qwen3.5-27B*.gguf Qwen3.5-27B.imatrix.dat
```

---

## 8. Minimal ModelScope Publish Checklist (optional)

- `README.md` (>= 200 characters, including task and license info)
- `configuration.json` (including `framework`, `task`, `model.type`)
- `.gitattributes` (`*.gguf` via LFS)
- Quantized files:
  - `Qwen3.5-27B-IQ4_KS.gguf`
  - `Qwen3.5-27B-IQ5_K.gguf`
  - `Qwen3.5-27B-IQ6_K.gguf`
31
docs/REPO_STRUCTURE.md
Normal file
@@ -0,0 +1,31 @@
# Repository Structure

## Top-level

- `docs/`: general workflow docs and checklists
- `scripts/`: data preparation, upload, and other scripts
- `templates/`: ModelScope metadata templates
- `examples/`: past model runs
- `calibration/`: calibration data and caches
- `modelscope_upload/`: current upload workspace
- `artifacts/`: local large artifacts (ignored by default)

## Recommended Layout

```text
artifacts/
  <model_name>/
    base_gguf/
    quantized_gguf/
examples/
  <model_name>/
    README.md
    docs/
    modelscope_upload/
    artifacts/
```

## Notes

- `modelscope_upload/` keeps only the files needed for the current release batch.
- Example directories record "parameters and process" so knowledge is not lost.
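The recommended layout above can be scaffolded for a new model. A small sketch; the `.gitkeep` touch keeps the otherwise-ignored example artifacts directory trackable, as the `.gitignore` rules expect:

```python
from pathlib import Path
import tempfile

def scaffold(root: Path, model_name: str) -> list[Path]:
    """Create the recommended per-model directory skeleton under root."""
    dirs = [
        root / "artifacts" / model_name / "base_gguf",
        root / "artifacts" / model_name / "quantized_gguf",
        root / "examples" / model_name / "docs",
        root / "examples" / model_name / "modelscope_upload",
        root / "examples" / model_name / "artifacts",
    ]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    # Keep the ignored artifacts directory present in Git.
    (root / "examples" / model_name / "artifacts" / ".gitkeep").touch()
    return dirs

with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp), "my_model")
    print(len(created))  # 5
```
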
71
docs/WORKFLOW_TEMPLATE.md
Normal file
@@ -0,0 +1,71 @@
# Workflow Template

This workflow is for onboarding a new model; run everything from the repository root by default.

## Step 1: HF -> BF16 GGUF

Use the conversion script from `ik_llama.cpp`:

```bash
python convert_hf_to_gguf.py \
  <hf_model_dir> \
  --outtype bf16 \
  --outfile artifacts/<model_name>/base_gguf/<model_name>-bf16.gguf
```

## Step 2: Prepare Calibration Data

```bash
./.venv/bin/python scripts/prepare_calib_data.py --force-refresh
```

Output:

- `calibration/calibration_data_v5_rc.txt`
- `calibration/calibration_data_v5_rc_code.txt`

Fixed composition: 1152 + 2000 + 1000 = 4152 blocks.

## Step 3: Generate the imatrix

```bash
docker run --gpus all --rm \
  --entrypoint sh \
  -v <repo_root>:/workspace/models \
  -v <repo_root>/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix -m <bf16_gguf> -f /workspace/calib_data.txt -o <imatrix_out> --ctx-size 512 -ngl 99 --threads 16"
```

## Step 4: Export Quantizations

Run separately for each type:

```bash
docker run --gpus all --rm \
  --entrypoint sh \
  -v <repo_root>:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize --imatrix <imatrix_out> <bf16_gguf> <out_gguf> IQ4_KS"
```

Place the quantized results in `artifacts/<model_name>/quantized_gguf/`.

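Running Step 4 three times by hand invites copy-paste mistakes. A minimal Python sketch that builds the same docker command per quantization type; the repo root, model name, and file paths below are hypothetical placeholders:

```python
import subprocess

REPO_ROOT = "/path/to/repo"  # hypothetical; substitute your checkout
BF16 = "/workspace/models/artifacts/mymodel/base_gguf/mymodel-bf16.gguf"
IMATRIX = "/workspace/models/artifacts/mymodel/mymodel.imatrix.dat"

def quantize_cmd(quant_type: str) -> list[str]:
    """Build the docker invocation for one quantization type."""
    out = f"/workspace/models/artifacts/mymodel/quantized_gguf/mymodel-{quant_type}.gguf"
    inner = f"/llama-quantize --imatrix {IMATRIX} {BF16} {out} {quant_type}"
    return [
        "docker", "run", "--gpus", "all", "--rm",
        "--entrypoint", "sh",
        "-v", f"{REPO_ROOT}:/workspace/models",
        "hotwa/ik:latest",
        "-c", inner,
    ]

for qt in ("IQ4_KS", "IQ5_K", "IQ6_K"):
    print(" ".join(quantize_cmd(qt)))
    # subprocess.run(quantize_cmd(qt), check=True)  # uncomment to run for real
```
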
## Step 5: Organize the Upload Directory

```bash
cp templates/modelscope/README.template.md modelscope_upload/README.md
cp templates/modelscope/configuration.template.json modelscope_upload/configuration.json
cp templates/modelscope/.gitattributes modelscope_upload/.gitattributes
```

Then copy the files to be published into `modelscope_upload/`.

## Step 6: Upload

```bash
./scripts/upload_to_modelscope.sh <repo_id> <token> modelscope_upload direct "Upload quantized GGUF"
```

- `direct`: upload with proxies unset
- `proxy`: upload keeping the proxy environment
13
examples/qwen35_27b/README.md
Normal file
@@ -0,0 +1,13 @@
# Qwen3.5-27B Example

This directory preserves a completed run for the Qwen3.5-27B Claude-Opus Distill model, for reference when reusing the workflow with future models.

## Contents

- `docs/`: the model-specific docs used at the time
- `modelscope_upload/`: a snapshot of the publish metadata used at the time
- `artifacts/`: reference artifacts (e.g. the imatrix)

## Notes

This example is mainly a "parameters and process" reference; for new models, prefer the template docs under the root `docs/`.
0
examples/qwen35_27b/artifacts/.gitkeep
Normal file
36
examples/qwen35_27b/docs/MODELSCOPE_UPLOAD_SOP.md
Normal file
@@ -0,0 +1,36 @@
# ModelScope Upload SOP (Qwen3.5-27B example)

## 1. Assemble the Upload Directory

Example upload directory: `examples/qwen35_27b/modelscope_upload/`

It should contain:

- `README.md`
- `configuration.json`
- `.gitattributes`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
- `Qwen3.5-27B.imatrix.dat`

## 2. Log In and Upload

Recommended command (proxies disabled):

```bash
./scripts/upload_to_modelscope.sh \
  "jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
  "<MODELSCOPE_TOKEN>" \
  "examples/qwen35_27b/modelscope_upload" \
  direct \
  "Upload Qwen3.5-27B quantized GGUF"
```

If a proxy is needed, change the fourth argument to `proxy`.

## 3. Post-publish Checks

- Are all files present on the page?
- Does the README show `tasks` and `license`?
- Has the repo left the preview state?
59
examples/qwen35_27b/docs/QWEN35_QUANTIZATION_MANUAL.md
Normal file
@@ -0,0 +1,59 @@
# Qwen3.5-27B Quantization Manual (example archive)

This file is the historical run record for `examples/qwen35_27b`, reorganized to match the current repository structure.

## 1. Inputs and Outputs

Input BF16 GGUF:

- `artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`

Outputs:

- `artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ4_KS.gguf`
- `artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ5_K.gguf`
- `artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ6_K.gguf`
- `examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat`

## 2. Calibration Data

Run:

```bash
./.venv/bin/python scripts/prepare_calib_data.py --force-refresh
```

Mixed data: `calibration/calibration_data_v5_rc_code.txt` (4152 blocks).

## 3. imatrix

```bash
docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  -v /home/zly/project/modelscope_qwen35_27b_quantized/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix \
    -m /workspace/models/artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    -f /workspace/calib_data.txt \
    -o /workspace/models/examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat \
    --ctx-size 512 -ngl 99 --threads 16"
```

## 4. Quantization

Example (IQ4_KS):

```bash
docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/examples/qwen35_27b/artifacts/Qwen3.5-27B.imatrix.dat \
    /workspace/models/artifacts/qwen35_27b/base_gguf/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/artifacts/qwen35_27b/quantized_gguf/Qwen3.5-27B-IQ4_KS.gguf \
    IQ4_KS"
```

For `IQ5_K` and `IQ6_K`, only the output name and quantization type change.
5
examples/qwen35_27b/modelscope_upload/.gitattributes
vendored
Normal file
@@ -0,0 +1,5 @@
*.gguf filter=lfs diff=lfs merge=lfs -text
*.dat filter=lfs diff=lfs merge=lfs -text
*.md text eol=lf
*.json text eol=lf
.gitattributes text eol=lf
76
examples/qwen35_27b/modelscope_upload/README.md
Normal file
@@ -0,0 +1,76 @@
---
tags:
- text-generation
- qwen
- qwen35
- gguf
- quantization
tasks:
- text-generation
license: Apache License 2.0
---

# Qwen3.5-27B Quantized GGUF (IQ4_KS / IQ5_K / IQ6_K)

## Model Description

This repository provides GGUF quantized builds of Qwen3.5-27B for the llama.cpp ecosystem, in three variants: IQ4_KS, IQ5_K, and IQ6_K. The weights were quantized from a BF16 GGUF input using an imatrix, balancing size, inference speed, and accuracy for text-generation workloads under different VRAM budgets.

## Weight Provenance

- Original BF16 GGUF source: `TeichAI/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF`
- This repository publishes imatrix + GGUF quantized builds of that source (IQ4_KS / IQ5_K / IQ6_K)

## Quantization Method

This repository uses the `ik_llama.cpp` Docker image (`hotwa/ik:latest`) for two-stage quantization:

1. First, `llama-imatrix` computes an importance matrix from the calibration corpus (`Qwen3.5-27B.imatrix.dat`)
2. Then `llama-quantize --imatrix ...` exports `IQ4_KS`, `IQ5_K`, and `IQ6_K`

Core quantization parameters:

- imatrix input model: `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- `--ctx-size 512`
- `-ngl 99`
- `--threads 16`

The imatrix models the relative importance of different weights, reducing information loss in critical layers at the same bit width and improving post-quantization inference stability.

## Calibration Data Sources and Rationale

The calibration file is `calibration_data_v5_rc_code.txt`, totaling `4152` blocks:

- `1152` blocks: base calibration data `calibration_data_v5_rc.txt`
- `2000` blocks: `QuixiAI/Code-74k-ShareGPT-Vicuna`
- `1000` blocks: `alvarobartt/openhermes-preferences-coding` (`chosen` branch)

Base calibration data download sources:

- Community version: `https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt`
- Official fallback: `https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt`

Why these three components:

- The base data covers general semantics and common text distributions, avoiding overfitting to the code domain alone
- The Code-74k conversation samples improve quantization fidelity for code generation, debugging, and explanation
- The OpenHermes coding preference samples provide a "preferred answer" signal, helping preserve structured, readable code output

This combination balances general text against code tasks and suits the real usage of the Qwen3.5-27B Distill model.

## File Contents

- `Qwen3.5-27B-IQ4_KS.gguf`: lowest VRAM footprint
- `Qwen3.5-27B-IQ5_K.gguf`: balance of speed and quality
- `Qwen3.5-27B-IQ6_K.gguf`: highest fidelity
- `Qwen3.5-27B.imatrix.dat`: the importance matrix used for quantization

## Usage Suggestions

- Prefer IQ4_KS when device resources are tight
- Prefer IQ5_K for general inference
- Use IQ6_K when quality matters most

## Notes

This repository publishes inference-ready GGUF weights and contains no training files. Use a GGUF-capable inference framework (e.g. llama.cpp-based implementations) for inference.
10
examples/qwen35_27b/modelscope_upload/configuration.json
Normal file
@@ -0,0 +1,10 @@
|
||||
{
|
||||
"framework": "ggml",
|
||||
"task": "text-generation",
|
||||
"model": {
|
||||
"type": "qwen35"
|
||||
},
|
||||
"pipeline": {
|
||||
"type": "text-generation"
|
||||
}
|
||||
}
|
||||
Binary file not shown.
@@ -1,76 +1,12 @@
---
tags:
- text-generation
- qwen
- qwen35
- gguf
- quantization
tasks:
- text-generation
license: Apache License 2.0
---
# ModelScope Upload Workspace

# Qwen3.5-27B Quantized GGUF (IQ4_KS / IQ5_K / IQ6_K)
This directory is the temporary workspace for the current model release.

## Model Description
Recommended approach:

This repository provides GGUF quantized builds of Qwen3.5-27B for the llama.cpp ecosystem, in three variants: IQ4_KS, IQ5_K, and IQ6_K. The weights were quantized from a BF16 GGUF input using an imatrix, balancing size, inference speed, and accuracy for text-generation workloads under different VRAM budgets.
1. Copy the template files from `templates/modelscope/`
2. Fill in the README and configuration for the current model
3. Add the quantized artifacts (GGUF, imatrix)
4. Run the upload command

## Weight Provenance

- Original BF16 GGUF source: `TeichAI/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF`
- This repository publishes imatrix + GGUF quantized builds of that source (IQ4_KS / IQ5_K / IQ6_K)

## Quantization Method

This repository uses the `ik_llama.cpp` Docker image (`hotwa/ik:latest`) for two-stage quantization:

1. First, `llama-imatrix` computes an importance matrix from the calibration corpus (`Qwen3.5-27B.imatrix.dat`)
2. Then `llama-quantize --imatrix ...` exports `IQ4_KS`, `IQ5_K`, and `IQ6_K`

Core quantization parameters:

- imatrix input model: `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- `--ctx-size 512`
- `-ngl 99`
- `--threads 16`

The imatrix models the relative importance of different weights, reducing information loss in critical layers at the same bit width and improving post-quantization inference stability.

## Calibration Data Sources and Rationale

The calibration file is `calibration_data_v5_rc_code.txt`, totaling `4152` blocks:

- `1152` blocks: base calibration data `calibration_data_v5_rc.txt`
- `2000` blocks: `QuixiAI/Code-74k-ShareGPT-Vicuna`
- `1000` blocks: `alvarobartt/openhermes-preferences-coding` (`chosen` branch)

Base calibration data download sources:

- Community version: `https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt`
- Official fallback: `https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt`

Why these three components:

- The base data covers general semantics and common text distributions, avoiding overfitting to the code domain alone
- The Code-74k conversation samples improve quantization fidelity for code generation, debugging, and explanation
- The OpenHermes coding preference samples provide a "preferred answer" signal, helping preserve structured, readable code output

This combination balances general text against code tasks and suits the real usage of the Qwen3.5-27B Distill model.

## File Contents

- `Qwen3.5-27B-IQ4_KS.gguf`: lowest VRAM footprint
- `Qwen3.5-27B-IQ5_K.gguf`: balance of speed and quality
- `Qwen3.5-27B-IQ6_K.gguf`: highest fidelity
- `Qwen3.5-27B.imatrix.dat`: the importance matrix used for quantization

## Usage Suggestions

- Prefer IQ4_KS when device resources are tight
- Prefer IQ5_K for general inference
- Use IQ6_K when quality matters most

## Notes

This repository publishes inference-ready GGUF weights and contains no training files. Use a GGUF-capable inference framework (e.g. llama.cpp-based implementations) for inference.
Note: large weight files inside this directory are ignored by default; only small metadata is tracked.

@@ -22,6 +22,8 @@ BASE_URL = (
    "examples/calibration/calibration_data.txt"
)
BLOCK_SPLIT_RE = re.compile(r"\n\s*\n")
SCRIPT_DIR = Path(__file__).resolve().parent
ROOT_DIR = SCRIPT_DIR.parent


def split_blocks(text: str) -> list[str]:
@@ -130,15 +132,21 @@ def ensure_cached_blocks(
def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--base-file", default="calibration_data_v5_rc.txt")
    parser.add_argument("--output", default="calibration_data_v5_rc_code.txt")
    parser.add_argument("--data-dir", default="data")
    parser.add_argument("--base-file", default="calibration/calibration_data_v5_rc.txt")
    parser.add_argument("--output", default="calibration/calibration_data_v5_rc_code.txt")
    parser.add_argument("--data-dir", default="calibration/sources")
    parser.add_argument("--force-refresh", action="store_true")
    args = parser.parse_args()

    base_file = Path(args.base_file)
    output_file = Path(args.output)
    data_dir = Path(args.data_dir)
    def resolve_path(path_text: str) -> Path:
        p = Path(path_text)
        if p.is_absolute():
            return p
        return ROOT_DIR / p

    base_file = resolve_path(args.base_file)
    output_file = resolve_path(args.output)
    data_dir = resolve_path(args.data_dir)
    code_cache = data_dir / "code74k_2000.txt"
    openhermes_cache = data_dir / "openhermes_coding_chosen_1000.txt"

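The `resolve_path` change in this hunk makes the script location-independent: relative arguments are now rooted at the repository root rather than the caller's working directory. A standalone sketch of that behavior, using a stand-in for `Path(__file__)`:

```python
from pathlib import Path

SCRIPT_DIR = Path("/repo/scripts")  # stand-in for Path(__file__).resolve().parent
ROOT_DIR = SCRIPT_DIR.parent

def resolve_path(path_text: str) -> Path:
    """Absolute paths pass through; relative ones are rooted at the repo root."""
    p = Path(path_text)
    if p.is_absolute():
        return p
    return ROOT_DIR / p

print(resolve_path("calibration/calibration_data_v5_rc.txt"))
print(resolve_path("/tmp/custom.txt"))
```
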
49
scripts/upload_to_modelscope.sh
Executable file → Normal file
@@ -1,25 +1,58 @@
#!/usr/bin/env bash
set -euo pipefail

# Usage:
#   ./upload_to_modelscope.sh <repo_id> <token>
# Example:
#   ./upload_to_modelscope.sh your_username/your_repo_name ms-xxxxxxxx
# Usage:
#   ./scripts/upload_to_modelscope.sh <repo_id> <token> [upload_dir] [mode] [commit_message]
#
# Examples:
#   ./scripts/upload_to_modelscope.sh your_user/your_repo ms-xxxx
#   ./scripts/upload_to_modelscope.sh your_user/your_repo ms-xxxx modelscope_upload proxy
#
# mode:
#   direct (default): unset proxy vars for direct connection
#   proxy: keep current proxy environment

REPO_ID="${1:-}"
TOKEN="${2:-}"
UPLOAD_DIR_ARG="${3:-modelscope_upload}"
MODE="${4:-direct}"
COMMIT_MESSAGE="${5:-Upload model artifacts}"

if [[ -z "${REPO_ID}" || -z "${TOKEN}" ]]; then
  echo "Usage: $0 <repo_id> <token>"
  echo "Usage: $0 <repo_id> <token> [upload_dir] [mode] [commit_message]"
  exit 1
fi

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ROOT_DIR="$(cd "${SCRIPT_DIR}/.." && pwd)"
UPLOAD_DIR="${UPLOAD_DIR_ARG}"
if [[ "${UPLOAD_DIR}" != /* ]]; then
  UPLOAD_DIR="${ROOT_DIR}/${UPLOAD_DIR}"
fi

"${ROOT_DIR}/.venv/bin/modelscope" login --token "${TOKEN}"
"${ROOT_DIR}/.venv/bin/modelscope" upload "${REPO_ID}" "${SCRIPT_DIR}" . \
if [[ ! -d "${UPLOAD_DIR}" ]]; then
  echo "Upload directory does not exist: ${UPLOAD_DIR}"
  exit 2
fi

MODELSCOPE_BIN="${ROOT_DIR}/.venv/bin/modelscope"
if [[ ! -x "${MODELSCOPE_BIN}" ]]; then
  echo "modelscope CLI not found at ${MODELSCOPE_BIN}"
  exit 3
fi

if [[ "${MODE}" == "direct" ]]; then
  RUN_CMD=(env -u HTTP_PROXY -u HTTPS_PROXY -u ALL_PROXY -u NO_PROXY "${MODELSCOPE_BIN}")
elif [[ "${MODE}" == "proxy" ]]; then
  RUN_CMD=("${MODELSCOPE_BIN}")
else
  echo "Unsupported mode: ${MODE} (use direct|proxy)"
  exit 4
fi

"${RUN_CMD[@]}" login --token "${TOKEN}"
"${RUN_CMD[@]}" upload "${REPO_ID}" "${UPLOAD_DIR}" . \
  --repo-type model \
  --commit-message "Upload Qwen3.5-27B quantized GGUF weights"
  --commit-message "${COMMIT_MESSAGE}"

echo "Upload finished: ${REPO_ID}"

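The direct/proxy branch in the script can be mirrored in any wrapper. A minimal Python sketch of the same environment handling (it only builds the environment; it does not invoke the CLI):

```python
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY")

def build_env(mode: str, base_env: dict) -> dict:
    """'direct' strips proxy variables, 'proxy' keeps them, anything else is rejected."""
    if mode not in ("direct", "proxy"):
        raise ValueError(f"Unsupported mode: {mode} (use direct|proxy)")
    env = dict(base_env)
    if mode == "direct":
        for var in PROXY_VARS:
            env.pop(var, None)
    return env

sample = {"HTTP_PROXY": "http://127.0.0.1:7890", "PATH": "/usr/bin"}
print(build_env("direct", sample))  # {'PATH': '/usr/bin'}
print(build_env("proxy", sample))
```
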
5
templates/modelscope/.gitattributes
vendored
Normal file
@@ -0,0 +1,5 @@
*.gguf filter=lfs diff=lfs merge=lfs -text
*.dat filter=lfs diff=lfs merge=lfs -text
*.md text eol=lf
*.json text eol=lf
.gitattributes text eol=lf
38
templates/modelscope/README.template.md
Normal file
@@ -0,0 +1,38 @@
---
tags:
- text-generation
- gguf
tasks:
- text-generation
license: Apache License 2.0
---

# <Model Name> Quantized GGUF

## Model Description

Briefly describe the model's purpose, quantization goals, and target scenarios.

## Weight Provenance

- Original model source: `<HF repo>`
- Artifacts in this repo: `<quant types>`

## Quantization Method

- Convert: HF safetensors -> BF16 GGUF
- Calibrate: imatrix
- Export: IQ4_KS / IQ5_K / IQ6_K

## Calibration Data Sources

- Base calibration data
- Code conversation data
- Code preference data

## File Contents

- `<model>-IQ4_KS.gguf`
- `<model>-IQ5_K.gguf`
- `<model>-IQ6_K.gguf`
- `<model>.imatrix.dat`

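The platform checks that the front matter declares `tasks` and `license`. A small sketch that extracts the top-level front-matter keys without a YAML dependency; it is a simplification that only handles the flat `key:` lines used in this template:

```python
def front_matter_keys(readme_text: str) -> list[str]:
    """Return top-level keys of a leading '---'-delimited front-matter block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Top-level keys start at column 0 and contain ':'; list items ('- ...') do not.
        if line and not line[0].isspace() and not line.startswith("-") and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys

sample = "---\ntags:\n- gguf\ntasks:\n- text-generation\nlicense: Apache License 2.0\n---\n# Model"
print(front_matter_keys(sample))  # ['tags', 'tasks', 'license']
```
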
10
templates/modelscope/configuration.template.json
Normal file
@@ -0,0 +1,10 @@
{
  "framework": "ggml",
  "task": "text-generation",
  "model": {
    "type": "qwen35"
  },
  "pipeline": {
    "type": "text-generation"
  }
}
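The checklist requires `framework`, `task`, and `model.type` in `configuration.json`. A minimal validation sketch against the template above:

```python
import json

REQUIRED_TOP = ("framework", "task", "model")

def validate_configuration(text: str) -> list[str]:
    """Return the missing required fields; an empty list means the config is complete."""
    cfg = json.loads(text)
    problems = [k for k in REQUIRED_TOP if k not in cfg]
    if "model" in cfg and "type" not in cfg["model"]:
        problems.append("model.type")
    return problems

sample = '{"framework": "ggml", "task": "text-generation", "model": {"type": "qwen35"}}'
print(validate_configuration(sample))  # []
```
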