first commit

# ModelScope Upload SOP (current project)

## 1. Directories and files

Working directory:

`/home/zly/project/modelscope_qwen35_27b_quantized`

Upload directory:

`/home/zly/project/modelscope_qwen35_27b_quantized/modelscope_upload`

The upload directory should contain:

- `README.md` (more than 200 characters, including `tasks` and `license`)
- `configuration.json`
- `.gitattributes`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
- `Qwen3.5-27B.imatrix.dat`

Quick check:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lah modelscope_upload
wc -m modelscope_upload/README.md
```
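
Beyond eyeballing `ls`, completeness of the upload directory can be checked mechanically. A minimal sketch (the expected file names are the seven required files; `missing_files` is a hypothetical helper, not part of any CLI):

```python
from pathlib import Path

# The seven files the upload directory is required to contain.
EXPECTED = {
    "README.md", "configuration.json", ".gitattributes",
    "Qwen3.5-27B-IQ4_KS.gguf", "Qwen3.5-27B-IQ5_K.gguf",
    "Qwen3.5-27B-IQ6_K.gguf", "Qwen3.5-27B.imatrix.dat",
}

def missing_files(upload_dir):
    """Return the expected files that are absent from upload_dir, sorted."""
    present = {p.name for p in Path(upload_dir).iterdir()}
    return sorted(EXPECTED - present)
```

An empty result means the directory is complete; anything returned still needs to be produced or copied in.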

## 2. Environment setup

Use the local virtual environment:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python -V
./.venv/bin/modelscope --version
```

If it is not installed yet:

```bash
./.venv/bin/pip install -U modelscope "setuptools<81"
```

## 3. Log in to ModelScope

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope login --token "<YOUR_MODELSCOPE_TOKEN>"
```

## 4. Upload (recommended: direct connection, no proxy)

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
env -u HTTP_PROXY -u HTTPS_PROXY -u ALL_PROXY -u NO_PROXY \
  ./.venv/bin/modelscope upload \
    "jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
    "./modelscope_upload" \
    . \
    --repo-type model \
    --commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```

## 5. Upload (through a proxy, if needed)

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope upload \
  "jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
  "./modelscope_upload" \
  . \
  --repo-type model \
  --commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```

## 6. Interruptions and resuming

- If the upload is interrupted, simply rerun the command from step 4 or step 5.
- The CLI first runs a hash check and reuses chunks that were already uploaded; there is no need to delete local files by hand.
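
Because reruns resume safely, retrying after a failure can be automated. A minimal sketch, assuming the step-4 command line; `run_with_retry` is a hypothetical wrapper, not part of the ModelScope CLI:

```python
import subprocess
import time

def run_with_retry(cmd, attempts=5, wait_s=30):
    """Run cmd, retrying on a nonzero exit; True as soon as one attempt succeeds."""
    for i in range(1, attempts + 1):
        if subprocess.run(cmd).returncode == 0:
            return True
        print(f"attempt {i} failed; retrying in {wait_s}s")
        time.sleep(wait_s)
    return False

# The step-4 upload, expressed as an argument list
# (clear the proxy variables separately, as in step 4).
upload_cmd = [
    "./.venv/bin/modelscope", "upload",
    "jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF",
    "./modelscope_upload", ".",
    "--repo-type", "model",
    "--commit-message", "Upload Qwen3.5-27B quantized GGUF weights",
]
```

Call `run_with_retry(upload_cmd)` from the working directory; since the CLI reuses uploaded chunks, repeated attempts only transfer what is still missing.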

## 7. Post-publish checks

Repository URL:

`https://www.modelscope.cn/models/jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF`

Checkpoints:

- All files are displayed (3 GGUF + 1 imatrix + metadata)
- The README renders the task and license correctly
- The page has left the pre-release state (if it is still pre-release, add clarifying material and appeal again)
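
For an extra integrity check against whatever sizes or hashes the repository page exposes (an assumption; the page may only show file sizes), local digests can be computed with a streaming SHA-256:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()
```

Usage: `sha256_of("modelscope_upload/Qwen3.5-27B-IQ4_KS.gguf")`.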

---

`docs/NEW_MODEL_CHECKLIST.md` (new file, 35 lines)

# New Model Checklist

## Basic information

- [ ] English model name (used consistently for directory naming)
- [ ] HuggingFace source repository
- [ ] ModelScope target repository (repo_id)

## Data and conversion

- [ ] safetensors download complete
- [ ] BF16 GGUF conversion succeeded
- [ ] `scripts/prepare_calib_data.py` produced the mixed calibration data
- [ ] Calibration data block count is 4152

## Quantization

- [ ] imatrix generated successfully
- [ ] IQ4_KS generated successfully
- [ ] IQ5_K generated successfully
- [ ] IQ6_K generated successfully

## Release directory

- [ ] `modelscope_upload/README.md` is over 200 characters
- [ ] README contains `tasks` and `license`
- [ ] `modelscope_upload/configuration.json` has all fields filled in
- [ ] `.gitattributes` is configured for LFS
- [ ] The upload directory contains the actual model files (not just metadata)

## Documentation

- [ ] `examples/<model_name>/README.md` updated
- [ ] Key commands and parameters written to `examples/<model_name>/docs/`
- [ ] Template docs updated in sync (if the workflow changed)
# Qwen3.5-27B Quantization Handbook (ik_llama.cpp, Docker edition)

## 1. Goal and scope

This handbook covers running imatrix computation and quantization on the Qwen3.5-27B BF16 GGUF with `ik_llama.cpp`, working in `/home/zly/project/modelscope_qwen35_27b_quantized` and producing:

- `Qwen3.5-27B.imatrix.dat`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`

- Image: `hotwa/ik:latest`
- Core tools: `/llama-imatrix`, `/llama-quantize`

---

## 2. Prerequisites

- Docker is available and you have permission to access the daemon
- An NVIDIA GPU is available (recommended)
- The BF16 input file exists in the current directory:
  - `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- The Python environment and scripts exist in the current directory:
  - `./.venv/bin/python`
  - `prepare_calib_data.py`

Check commands:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --rm --gpus all --entrypoint sh hotwa/ik:latest -c "ls -la /llama-imatrix /llama-quantize"
ls -lh Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
```

---

## 3. Prepare calibration data

### 3.1 Download the base calibration file (source of the 1152 blocks)

Recommended (the version commonly used in the community):

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
wget -O calibration_data_v5_rc.txt \
  "https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt"
```

Official fallback source (when the network allows):

```bash
wget -O calibration_data_v5_rc.txt \
  "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt"
```

### 3.2 Generate the mixed calibration set

Target composition of the script (strict):

- Base data: 1152 blocks (`calibration_data_v5_rc.txt`)
- Code conversations: 2000 blocks (`QuixiAI/Code-74k-ShareGPT-Vicuna`)
- Code preferences: 1000 blocks (`alvarobartt/openhermes-preferences-coding`)

Run:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python prepare_calib_data.py --force-refresh
```

### 3.3 Verify the block count

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

./.venv/bin/python - <<'PY'
import re
from pathlib import Path

def count_blocks(path):
    """Count non-empty blocks separated by blank lines."""
    txt = Path(path).read_text(encoding="utf-8", errors="ignore")
    return len([b for b in re.split(r"\n\s*\n", txt) if b.strip()])

print("base =", count_blocks("calibration_data_v5_rc.txt"))
print("mix =", count_blocks("calibration_data_v5_rc_code.txt"))
PY
```

Expected:

- `base = 1152`
- `mix = 4152` (1152 + 2000 + 1000)

---

## 4. Generate the imatrix

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  -v /home/zly/project/modelscope_qwen35_27b_quantized/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix \
    -m /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    -f /workspace/calib_data.txt \
    -o /workspace/models/Qwen3.5-27B.imatrix.dat \
    --ctx-size 512 \
    -ngl 99 \
    --threads 16"
```

Verify on completion:

```bash
ls -lh Qwen3.5-27B.imatrix.dat
```

---

## 5. Quantize to the three formats

### 5.1 IQ4_KS

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ4_KS.gguf \
    IQ4_KS"
```

### 5.2 IQ5_K

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ5_K.gguf \
    IQ5_K"
```

### 5.3 IQ6_K

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized

docker run --gpus all --rm \
  --entrypoint sh \
  -v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize \
    --imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
    /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
    /workspace/models/Qwen3.5-27B-IQ6_K.gguf \
    IQ6_K"
```

---

## 6. Verify all outputs at once

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lh Qwen3.5-27B.imatrix.dat Qwen3.5-27B-IQ4_KS.gguf Qwen3.5-27B-IQ5_K.gguf Qwen3.5-27B-IQ6_K.gguf
```

Measured on this run (2026-03-02):

- `Qwen3.5-27B.imatrix.dat` = `13,582,647` bytes (≈ `12.95 MB`)
- `Qwen3.5-27B-IQ4_KS.gguf` = `14,705,833,248` bytes (≈ `13.70 GB`)
- `Qwen3.5-27B-IQ5_K.gguf` = `18,679,612,704` bytes (≈ `17.40 GB`)
- `Qwen3.5-27B-IQ6_K.gguf` = `22,292,632,864` bytes (≈ `20.76 GB`)
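
The approximate figures are in binary units, i.e. GiB, as `ls -lh` reports them; converting the exact byte counts reproduces them:

```python
def to_gib(n_bytes):
    """Convert a byte count to GiB (the binary unit ls -lh prints as G)."""
    return n_bytes / 2**30

# Exact sizes measured above.
sizes = {
    "Qwen3.5-27B-IQ4_KS.gguf": 14_705_833_248,
    "Qwen3.5-27B-IQ5_K.gguf": 18_679_612_704,
    "Qwen3.5-27B-IQ6_K.gguf": 22_292_632_864,
}
for name, n in sizes.items():
    print(f"{name}: {to_gib(n):.2f} GiB")
# → 13.70, 17.40, and 20.76 GiB respectively
```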

---

## 7. Common problems

### 7.1 `docker.sock` permission error

Symptom: `permission denied while trying to connect to the Docker daemon socket`

Fix:

- Run as a user that has Docker permissions
- Or check the `docker` group configuration

### 7.2 DNS failure on a download source

Symptom: `unable to resolve host address`

Fix:

- Prefer the gist source (see 3.1)
- Or configure a working proxy and retry

### 7.3 Output files owned by `root`

Files written from inside the container may come out owned by root. Fix ownership as needed:

```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
sudo chown -R $(id -u):$(id -g) Qwen3.5-27B*.gguf Qwen3.5-27B.imatrix.dat
```

(Passing `--user $(id -u):$(id -g)` to `docker run` avoids root-owned output in the first place.)

---

## 8. Minimal ModelScope release checklist (optional)

- `README.md` (>= 200 characters, including task and license information)
- `configuration.json` (including `framework`, `task`, `model.type`)
- `.gitattributes` (`*.gguf` via LFS)
- Quantized files:
  - `Qwen3.5-27B-IQ4_KS.gguf`
  - `Qwen3.5-27B-IQ5_K.gguf`
  - `Qwen3.5-27B-IQ6_K.gguf`
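
For the `.gitattributes` entry, the standard Git LFS attribute line is sufficient. A minimal sketch (the `*.dat` line is an assumption, in case the imatrix file should also go through LFS):

```text
*.gguf filter=lfs diff=lfs merge=lfs -text
*.dat filter=lfs diff=lfs merge=lfs -text
```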

---

`docs/REPO_STRUCTURE.md` (new file, 31 lines)

# Repository Structure

## Top-level

- `docs/`: general workflows and checklists
- `scripts/`: scripts for data preparation, uploads, etc.
- `templates/`: ModelScope metadata templates
- `examples/`: case studies of past models
- `calibration/`: calibration data and caches
- `modelscope_upload/`: current upload staging area
- `artifacts/`: large local outputs (ignored by default)

## Recommended Layout

```text
artifacts/
  <model_name>/
    base_gguf/
    quantized_gguf/
examples/
  <model_name>/
    README.md
    docs/
modelscope_upload/
artifacts/
```

## Notes

- `modelscope_upload/` keeps only the files needed for the current release batch.
- The case directories record "parameters and process" so that knowledge is not lost.

---

`docs/WORKFLOW_TEMPLATE.md` (new file, 71 lines)

# Workflow Template

This workflow is for onboarding a new model; by default, every command runs from the repository root.

## Step 1: HF -> BF16 GGUF

Use the conversion script from `ik_llama.cpp`:

```bash
python convert_hf_to_gguf.py \
  <hf_model_dir> \
  --outtype bf16 \
  --outfile artifacts/<model_name>/base_gguf/<model_name>-bf16.gguf
```

## Step 2: Prepare calibration data

```bash
./.venv/bin/python scripts/prepare_calib_data.py --force-refresh
```

Outputs:

- `calibration/calibration_data_v5_rc.txt`
- `calibration/calibration_data_v5_rc_code.txt`

Fixed composition: 1152 + 2000 + 1000 = 4152 blocks.
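
The composition can be verified with the same blank-line-separated block count used elsewhere in this repo; a minimal sketch:

```python
import re
from pathlib import Path

def count_blocks(path):
    """Count non-empty blocks separated by blank lines."""
    text = Path(path).read_text(encoding="utf-8", errors="ignore")
    return len([b for b in re.split(r"\n\s*\n", text) if b.strip()])

# count_blocks("calibration/calibration_data_v5_rc_code.txt") should be 4152
```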

## Step 3: Generate the imatrix

```bash
docker run --gpus all --rm \
  --entrypoint sh \
  -v <repo_root>:/workspace/models \
  -v <repo_root>/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix -m <bf16_gguf> -f /workspace/calib_data.txt -o <imatrix_out> --ctx-size 512 -ngl 99 --threads 16"
```

## Step 4: Quantize and export

Run once per format:

```bash
docker run --gpus all --rm \
  --entrypoint sh \
  -v <repo_root>:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize --imatrix <imatrix_out> <bf16_gguf> <out_gguf> IQ4_KS"
```

Put the quantized results in `artifacts/<model_name>/quantized_gguf/`.

## Step 5: Stage the upload directory

```bash
cp templates/modelscope/README.template.md modelscope_upload/README.md
cp templates/modelscope/configuration.template.json modelscope_upload/configuration.json
cp templates/modelscope/.gitattributes modelscope_upload/.gitattributes
```

Then copy the files to be released into `modelscope_upload/`.

## Step 6: Upload

```bash
./scripts/upload_to_modelscope.sh <repo_id> <token> modelscope_upload direct "Upload quantized GGUF"
```

- `direct`: upload with proxy settings cleared
- `proxy`: upload with proxy settings kept