# Workflow Template
This workflow covers onboarding a new model. All commands are assumed to run from the repository root.
## Step 1: HF -> BF16 GGUF
Convert with the `ik_llama.cpp` conversion script:
```bash
python convert_hf_to_gguf.py \
    <hf_model_dir> \
    --outtype bf16 \
    --outfile artifacts/<model_name>/base_gguf/<model_name>-bf16.gguf
```
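For illustration, the output path above can be derived from the model name with a small helper, so the `<model_name>` placeholder is filled consistently across steps. This is a sketch; `base_gguf_path` is a hypothetical function, not part of the repo:

```shell
# Hypothetical helper: build the Step 1 output path from a model name.
base_gguf_path() {
    echo "artifacts/$1/base_gguf/$1-bf16.gguf"
}

base_gguf_path demo-model
# → artifacts/demo-model/base_gguf/demo-model-bf16.gguf
```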
## Step 2: Prepare calibration data
```bash
./.venv/bin/python scripts/prepare_calib_data.py --force-refresh
```
Outputs:
- `calibration/calibration_data_v5_rc.txt`
- `calibration/calibration_data_v5_rc_code.txt`

Fixed composition: 1152 + 2000 + 1000 = 4152 blocks.
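Before moving on it is worth confirming the calibration files actually exist. A minimal sketch; the `check_calib` helper is hypothetical, and only reports a line count rather than inspecting the block format:

```shell
# Hypothetical check: fail fast if a calibration file is missing or empty,
# and report its line count for a quick eyeball comparison.
check_calib() {
    f="$1"
    if [ -s "$f" ]; then
        echo "$f: $(wc -l < "$f" | tr -d ' ') lines"
    else
        echo "missing or empty: $f" >&2
        return 1
    fi
}

# Usage (paths are the Step 2 outputs):
#   check_calib calibration/calibration_data_v5_rc.txt
#   check_calib calibration/calibration_data_v5_rc_code.txt
```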
## Step 3: Generate the imatrix
```bash
docker run --gpus all --rm \
--entrypoint sh \
-v <repo_root>:/workspace/models \
-v <repo_root>/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
hotwa/ik:latest \
-c "/llama-imatrix -m <bf16_gguf> -f /workspace/calib_data.txt -o <imatrix_out> --ctx-size 512 -ngl 99 --threads 16"
```
## Step 4: Quantize and export
Run once per target quant type:
```bash
docker run --gpus all --rm \
--entrypoint sh \
-v <repo_root>:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize --imatrix <imatrix_out> <bf16_gguf> <out_gguf> IQ4_KS"
```
Place the quantized output in `artifacts/<model_name>/quantized_gguf/`.
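"Run once per type" can be scripted as a loop. The sketch below only prints the docker command (dry run); `quantize_cmd` is a hypothetical helper, the placeholders are the same as above, and the quant types other than IQ4_KS are examples, not a fixed list:

```shell
# Dry-run sketch: print the quantize command for each target type.
# Remove the echo wrapper to actually execute.
quantize_cmd() {
    echo "docker run --gpus all --rm --entrypoint sh" \
         "-v <repo_root>:/workspace/models hotwa/ik:latest" \
         "-c \"/llama-quantize --imatrix <imatrix_out> <bf16_gguf> <out_gguf> $1\""
}

for q in IQ4_KS Q4_K_M Q8_0; do
    quantize_cmd "$q"
done
```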
## Step 5: Organize the upload directory
```bash
cp templates/modelscope/README.template.md modelscope_upload/README.md
cp templates/modelscope/configuration.template.json modelscope_upload/configuration.json
cp templates/modelscope/.gitattributes modelscope_upload/.gitattributes
```
Then copy the release files into `modelscope_upload/`.
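The three `cp` commands plus the release files can be staged in one call. A sketch: `stage_upload` and its `src` argument are assumptions, while the template paths match the commands above:

```shell
# Hypothetical staging helper: copy the Step 5 templates plus the release
# GGUF files from <src> into the upload directory.
stage_upload() {
    src="$1"
    dst="${2:-modelscope_upload}"
    mkdir -p "$dst"
    cp templates/modelscope/README.template.md "$dst/README.md"
    cp templates/modelscope/configuration.template.json "$dst/configuration.json"
    cp templates/modelscope/.gitattributes "$dst/.gitattributes"
    cp "$src"/*.gguf "$dst/"
}
```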
## Step 6: Upload
```bash
./scripts/upload_to_modelscope.sh <repo_id> <token> modelscope_upload direct "Upload quantized GGUF"
```
- `direct`: upload with the proxy disabled
- `proxy`: upload through the existing proxy
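The upload script's internals are not shown here. One plausible reading of the `direct` mode, assuming it clears the standard proxy environment variables before invoking the ModelScope CLI (an assumption, not confirmed by the script):

```shell
# Assumption only: sketch of a direct/proxy toggle via proxy env vars;
# this is NOT the actual implementation of upload_to_modelscope.sh.
MODE="${1:-direct}"
if [ "$MODE" = "direct" ]; then
    unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
fi
echo "upload mode: $MODE"
```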