first commit

2026-03-02 23:22:33 +08:00
parent 1c5822d16b
commit c5ae56c463
22 changed files with 606 additions and 462 deletions


@@ -1,94 +0,0 @@
# ModelScope Upload SOP (Current Project)
## 1. Directories and Files
Working directory:
`/home/zly/project/modelscope_qwen35_27b_quantized`
Upload directory:
`/home/zly/project/modelscope_qwen35_27b_quantized/modelscope_upload`
The upload directory should contain:
- `README.md` (over 200 characters, including `tasks` and `license`)
- `configuration.json`
- `.gitattributes`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
- `Qwen3.5-27B.imatrix.dat`
Quick check:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lah modelscope_upload
wc -m modelscope_upload/README.md
```
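The quick check above can be automated. A minimal sketch, assuming the file list and the 200-character README threshold from this SOP (the helper name `check_upload_dir` is illustrative, not part of any tool):

```python
from pathlib import Path

# Files the SOP requires in modelscope_upload/
REQUIRED = [
    "README.md",
    "configuration.json",
    ".gitattributes",
    "Qwen3.5-27B-IQ4_KS.gguf",
    "Qwen3.5-27B-IQ5_K.gguf",
    "Qwen3.5-27B-IQ6_K.gguf",
    "Qwen3.5-27B.imatrix.dat",
]

def check_upload_dir(root: str) -> list[str]:
    """Return a list of problems; an empty list means the directory passes."""
    base = Path(root)
    problems = [f"missing: {name}" for name in REQUIRED if not (base / name).exists()]
    readme = base / "README.md"
    if readme.exists() and len(readme.read_text(encoding="utf-8")) <= 200:
        problems.append("README.md is 200 characters or fewer")
    return problems
```

Run it against the upload directory before Step 4; any non-empty result means the upload would be incomplete.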
## 2. Environment Setup
Use the local virtual environment:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python -V
./.venv/bin/modelscope --version
```
If not installed:
```bash
./.venv/bin/pip install -U modelscope "setuptools<81"
```
## 3. Log in to ModelScope
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope login --token "<YOUR_MODELSCOPE_TOKEN>"
```
## 4. Upload (Recommended: Direct Connection, No Proxy)
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
env -u HTTP_PROXY -u HTTPS_PROXY -u ALL_PROXY -u NO_PROXY \
./.venv/bin/modelscope upload \
"jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
"./modelscope_upload" \
. \
--repo-type model \
--commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```
## 5. Upload (Via Proxy, If Needed)
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/modelscope upload \
"jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF" \
"./modelscope_upload" \
. \
--repo-type model \
--commit-message "Upload Qwen3.5-27B quantized GGUF weights"
```
## 6. Resume / Re-upload Notes
- If an upload is interrupted, simply re-run the command from Step 4 or Step 5.
- The CLI first performs hash verification and reuses already-uploaded chunks; there is no need to delete local files manually.
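The resume behavior amounts to content-hash deduplication of upload chunks. A purely illustrative sketch of the idea (this is not ModelScope's actual protocol; the 8 MiB chunk size and SHA-256 choice are assumptions):

```python
import hashlib

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB; the real CLI chunk size may differ

def chunks_to_upload(path: str, already_uploaded: set[str]) -> list[tuple[int, str]]:
    """Return (offset, sha256) pairs for each chunk the server does not yet have."""
    todo = []
    with open(path, "rb") as f:
        offset = 0
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in already_uploaded:
                todo.append((offset, digest))
            offset += len(chunk)
    return todo
```

On retry, only chunks whose hashes the server lacks are sent again, which is why re-running the upload command is safe.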
## 7. Post-publish Checks
Repository URL:
`https://www.modelscope.cn/models/jiaoyuan/Qwen3.5-27B-Claude-Opus-4.6-Distill-GGUF`
Checkpoints:
- All files are displayed (3 GGUF files + 1 imatrix file + metadata)
- The README correctly shows the task and license
- The page has left the pre-release state (if it is still pre-release, add clarifying information and appeal again)


@@ -0,0 +1,35 @@
# New Model Checklist
## Basic Information
- [ ] English model name (used consistently for directory naming)
- [ ] HuggingFace source repository
- [ ] ModelScope target repository (`repo_id`)
## Data and Conversion
- [ ] safetensors download complete
- [ ] BF16 GGUF conversion succeeded
- [ ] `scripts/prepare_calib_data.py` produced the mixed calibration data
- [ ] Calibration data block count is 4152
## Quantization
- [ ] imatrix generated successfully
- [ ] IQ4_KS generated successfully
- [ ] IQ5_K generated successfully
- [ ] IQ6_K generated successfully
## Release Directory
- [ ] `modelscope_upload/README.md` exceeds 200 characters
- [ ] README includes `tasks` and `license`
- [ ] `modelscope_upload/configuration.json` has all fields filled in
- [ ] `.gitattributes` configured for LFS
- [ ] Upload directory contains the actual model files (not just metadata)
## Documentation Archiving
- [ ] `examples/<model_name>/README.md` updated
- [ ] Key commands and parameters recorded in `examples/<model_name>/docs/`
- [ ] Template docs updated in sync (if the workflow changed)
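A checklist in this `- [ ]` / `- [x]` format can be audited mechanically. A small sketch (the function name is illustrative) that lists the items still open:

```python
import re

def unchecked_items(markdown: str) -> list[str]:
    """Collect the text of every '- [ ]' checklist item that is still open."""
    return [m.group(1).strip()
            for m in re.finditer(r"^- \[ \] (.+)$", markdown, re.MULTILINE)]
```

Running it over this file before release gives a quick go/no-go signal: an empty list means every box is ticked.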


@@ -1,231 +0,0 @@
# Qwen3.5-27B Quantization Manual (ik_llama.cpp Docker Edition)
## 1. Goal and Scope
This manual covers computing the imatrix and quantizing the Qwen3.5-27B BF16 GGUF with `ik_llama.cpp` in the directory `/home/zly/project/modelscope_qwen35_27b_quantized`, producing:
- `Qwen3.5-27B.imatrix.dat`
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`
Image: `hotwa/ik:latest`
Core tools: `/llama-imatrix` and `/llama-quantize`
---
## 2. Prerequisites
- Docker is available and the user can access the daemon
- An NVIDIA GPU is available (recommended)
- The BF16 input file exists in the current directory:
  - `Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf`
- The Python environment and script exist in the current directory:
  - `./.venv/bin/python`
  - `prepare_calib_data.py`
Check commands:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --rm --gpus all --entrypoint sh hotwa/ik:latest -c "ls -la /llama-imatrix /llama-quantize"
ls -lh Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf
```
---
## 3. Prepare Calibration Data
### 3.1 Download the Base Calibration File (source of the 1152 blocks)
Recommended (commonly used community version):
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
wget -O calibration_data_v5_rc.txt \
"https://gist.githubusercontent.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/raw/571fda718462de863e5a0171078c175420c7649a/calibration_data_v5_rc.txt"
```
Official fallback source (when the network is reachable):
```bash
wget -O calibration_data_v5_rc.txt \
"https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/calibration/calibration_data.txt"
```
### 3.2 Generate the Mixed Calibration Set
Target composition of the script (strict):
- Base data, 1152 blocks: `calibration_data_v5_rc.txt`
- Code conversations, 2000 blocks: `QuixiAI/Code-74k-ShareGPT-Vicuna`
- Code preferences, 1000 blocks: `alvarobartt/openhermes-preferences-coding`
Run:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python prepare_calib_data.py --force-refresh
```
### 3.3 Verify the Block Count
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
./.venv/bin/python - <<'PY'
import re
from pathlib import Path
def count_blocks(path):
    txt = Path(path).read_text(encoding="utf-8", errors="ignore")
    return len([b for b in re.split(r"\n\s*\n", txt) if b.strip()])
print("base =", count_blocks("calibration_data_v5_rc.txt"))
print("mix =", count_blocks("calibration_data_v5_rc_code.txt"))
PY
```
Expected:
- `base = 1152`
- `mix = 4152` (1152 + 2000 + 1000)
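The mixing step itself lives in `prepare_calib_data.py`; a hypothetical sketch of the composition logic, assuming blocks are paragraphs separated by blank lines (the function name and quota tuple are illustrative):

```python
def mix_blocks(base: list[str], code_chat: list[str], code_pref: list[str],
               limits: tuple[int, int, int] = (1152, 2000, 1000)) -> str:
    """Concatenate the three sources, each truncated to its fixed quota,
    joined with blank lines so each block is one paragraph."""
    parts = base[:limits[0]] + code_chat[:limits[1]] + code_pref[:limits[2]]
    return "\n\n".join(parts)
```

With full inputs this yields at most 1152 + 2000 + 1000 = 4152 blocks, matching the expected `mix` count above.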
---
## 4. Generate the imatrix
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
-v /home/zly/project/modelscope_qwen35_27b_quantized/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
hotwa/ik:latest \
-c "/llama-imatrix \
-m /workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
-f /workspace/calib_data.txt \
-o /workspace/models/Qwen3.5-27B.imatrix.dat \
--ctx-size 512 \
-ngl 99 \
--threads 16"
```
Verify completion:
```bash
ls -lh Qwen3.5-27B.imatrix.dat
```
---
## 5. Quantize to Three Formats
### 5.1 IQ4_KS
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ4_KS.gguf \
IQ4_KS"
```
### 5.2 IQ5_K
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ5_K.gguf \
IQ5_K"
```
### 5.3 IQ6_K
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
docker run --gpus all --rm \
--entrypoint sh \
-v /home/zly/project/modelscope_qwen35_27b_quantized:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize \
--imatrix /workspace/models/Qwen3.5-27B.imatrix.dat \
/workspace/models/Qwen3.5-27b-Opus-4.6-Distill-BF16-00001-of-00002.gguf \
/workspace/models/Qwen3.5-27B-IQ6_K.gguf \
IQ6_K"
```
---
## 6. Verify All Results at Once
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
ls -lh Qwen3.5-27B.imatrix.dat Qwen3.5-27B-IQ4_KS.gguf Qwen3.5-27B-IQ5_K.gguf Qwen3.5-27B-IQ6_K.gguf
```
Measured in this run (2026-03-02):
- `Qwen3.5-27B.imatrix.dat` = `13,582,647` bytes (`12.95 MB`)
- `Qwen3.5-27B-IQ4_KS.gguf` = `14,705,833,248` bytes (`13.70 GB`)
- `Qwen3.5-27B-IQ5_K.gguf` = `18,679,612,704` bytes (`17.40 GB`)
- `Qwen3.5-27B-IQ6_K.gguf` = `22,292,632,864` bytes (`20.76 GB`)
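The MB/GB figures above are binary units (1 MB = 2^20 bytes, 1 GB = 2^30 bytes); a quick conversion check confirms they match the byte counts:

```python
def to_mib(n: int) -> float:
    """Bytes to binary megabytes (MiB), rounded to 2 decimals."""
    return round(n / 2**20, 2)

def to_gib(n: int) -> float:
    """Bytes to binary gigabytes (GiB), rounded to 2 decimals."""
    return round(n / 2**30, 2)
```

For example, `to_gib(14_705_833_248)` reproduces the 13.70 GB listed for the IQ4_KS file.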
---
## 7. Common Issues
### 7.1 `docker.sock` Permission Error
Symptom: `permission denied while trying to connect to the Docker daemon socket`
Fix:
- Run as a user with Docker permissions
- Or check the `docker` group configuration
### 7.2 DNS Failure on the Download Source
Symptom: `unable to resolve host address`
Fix:
- Prefer the gist source (see 3.1)
- Or configure a working proxy and retry
### 7.3 Output Files Owned by `root`
Files written from the container may end up owned by root. Fix ownership if needed:
```bash
cd /home/zly/project/modelscope_qwen35_27b_quantized
sudo chown -R $(id -u):$(id -g) Qwen3.5-27B*.gguf Qwen3.5-27B.imatrix.dat
```
---
## 8. Minimal ModelScope Release Checklist (Optional)
- `README.md` (>= 200 characters, including task and license information)
- `configuration.json` (includes `framework`, `task`, and `model.type`)
- `.gitattributes` (`*.gguf` tracked via LFS)
- Quantized files:
- `Qwen3.5-27B-IQ4_KS.gguf`
- `Qwen3.5-27B-IQ5_K.gguf`
- `Qwen3.5-27B-IQ6_K.gguf`

docs/REPO_STRUCTURE.md Normal file

@@ -0,0 +1,31 @@
# Repository Structure
## Top-level
- `docs/`: general workflow docs and checklists
- `scripts/`: data preparation, upload, and other scripts
- `templates/`: ModelScope metadata templates
- `examples/`: past model cases
- `calibration/`: calibration data and caches
- `modelscope_upload/`: current upload workspace
- `artifacts/`: large local artifacts (ignored by default)
## Recommended Layout
```text
artifacts/
  <model_name>/
    base_gguf/
    quantized_gguf/
examples/
  <model_name>/
    README.md
    docs/
    modelscope_upload/
    artifacts/
```
## Notes
- `modelscope_upload/` keeps only the files needed for the current release batch.
- Case directories record the "parameters and process" so knowledge is not lost.

docs/WORKFLOW_TEMPLATE.md Normal file

@@ -0,0 +1,71 @@
# Workflow Template
This workflow is for onboarding a new model; commands run from the repository root by default.
## Step 1: HF -> BF16 GGUF
Use the conversion script from `ik_llama.cpp`:
```bash
python convert_hf_to_gguf.py \
<hf_model_dir> \
--outtype bf16 \
--outfile artifacts/<model_name>/base_gguf/<model_name>-bf16.gguf
```
## Step 2: Prepare Calibration Data
```bash
./.venv/bin/python scripts/prepare_calib_data.py --force-refresh
```
Output:
- `calibration/calibration_data_v5_rc.txt`
- `calibration/calibration_data_v5_rc_code.txt`
Fixed composition: 1152 + 2000 + 1000 = 4152 blocks.
## Step 3: Generate the imatrix
```bash
docker run --gpus all --rm \
--entrypoint sh \
-v <repo_root>:/workspace/models \
-v <repo_root>/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
hotwa/ik:latest \
-c "/llama-imatrix -m <bf16_gguf> -f /workspace/calib_data.txt -o <imatrix_out> --ctx-size 512 -ngl 99 --threads 16"
```
## Step 4: Quantize and Export
Run for each format:
```bash
docker run --gpus all --rm \
--entrypoint sh \
-v <repo_root>:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize --imatrix <imatrix_out> <bf16_gguf> <out_gguf> IQ4_KS"
```
Place the quantized outputs in `artifacts/<model_name>/quantized_gguf/`.
## Step 5: Assemble the Upload Directory
```bash
cp templates/modelscope/README.template.md modelscope_upload/README.md
cp templates/modelscope/configuration.template.json modelscope_upload/configuration.json
cp templates/modelscope/.gitattributes modelscope_upload/.gitattributes
```
Then copy the release files into `modelscope_upload/`.
## Step 6: Upload
```bash
./scripts/upload_to_modelscope.sh <repo_id> <token> modelscope_upload direct "Upload quantized GGUF"
```
- `direct`: upload with proxy variables stripped
- `proxy`: upload keeping the current proxy settings
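The `direct` mode presumably mirrors the `env -u` trick used in the upload SOP. A hypothetical Python sketch of the mode switch (the real logic lives in `scripts/upload_to_modelscope.sh`; function names here are illustrative):

```python
import os
import subprocess

PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY")

def upload_env(mode: str) -> dict[str, str]:
    """Build the child-process environment: 'direct' strips proxy variables
    (both cases), while 'proxy' keeps the current environment untouched."""
    env = dict(os.environ)
    if mode == "direct":
        for var in PROXY_VARS:
            env.pop(var, None)
            env.pop(var.lower(), None)
    return env

def upload(repo_id: str, src_dir: str, mode: str, message: str) -> None:
    """Invoke the modelscope CLI with the chosen proxy mode."""
    subprocess.run(
        ["./.venv/bin/modelscope", "upload", repo_id, src_dir, ".",
         "--repo-type", "model", "--commit-message", message],
        env=upload_env(mode), check=True,
    )
```

Keeping the proxy handling in one place avoids copy-pasting two nearly identical command blocks for the two modes.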