Workflow Template

This workflow covers onboarding a new model. All commands are assumed to run from the repository root.

Step 1: HF -> BF16 GGUF

Use the conversion script from ik_llama.cpp:

python convert_hf_to_gguf.py \
  <hf_model_dir> \
  --outtype bf16 \
  --outfile artifacts/<model_name>/base_gguf/<model_name>-bf16.gguf
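The output path above follows a fixed layout derived from the model name. A minimal sketch of a wrapper that derives that path and creates the target directory before conversion (the layout is from this template; the helper functions `bf16_outfile` and `convert_to_bf16` are hypothetical):

```shell
#!/bin/sh
# Derive the BF16 GGUF output path for a given model name,
# matching the artifacts/<model_name>/base_gguf/ layout above.
bf16_outfile() {
  model_name="$1"
  echo "artifacts/${model_name}/base_gguf/${model_name}-bf16.gguf"
}

# Ensure the target directory exists, then run the Step 1 conversion.
convert_to_bf16() {
  hf_dir="$1"; model_name="$2"
  out="$(bf16_outfile "$model_name")"
  mkdir -p "$(dirname "$out")"
  # convert_hf_to_gguf.py is the ik_llama.cpp script from Step 1.
  python convert_hf_to_gguf.py "$hf_dir" --outtype bf16 --outfile "$out"
}
```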

Step 2: Prepare calibration data

./.venv/bin/python scripts/prepare_calib_data.py --force-refresh

Outputs:

  • calibration/calibration_data_v5_rc.txt
  • calibration/calibration_data_v5_rc_code.txt

Fixed composition: 1152 + 2000 + 1000 = 4152 blocks.
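The composition is fixed, so it can be sanity-checked after a refresh. A trivial sketch (the per-source counts are from the line above; the helper name is hypothetical):

```shell
# Sanity check: the calibration set is built from three fixed sources
# of 1152, 2000, and 1000 blocks respectively.
calib_total() {
  echo $((1152 + 2000 + 1000))
}
```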

Step 3: Generate the imatrix

docker run --gpus all --rm \
  --entrypoint sh \
  -v <repo_root>:/workspace/models \
  -v <repo_root>/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix -m <bf16_gguf> -f /workspace/calib_data.txt -o <imatrix_out> --ctx-size 512 -ngl 99 --threads 16"

Step 4: Quantize and export

Run separately for each target quant type:

docker run --gpus all --rm \
  --entrypoint sh \
  -v <repo_root>:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize --imatrix <imatrix_out> <bf16_gguf> <out_gguf> IQ4_KS"

Place the quantized outputs in: artifacts/<model_name>/quantized_gguf/
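When several quant types are exported, the Step 4 command can be generated in a loop so every output lands in quantized_gguf/ with a consistent name. A sketch that echoes one docker command per requested type (the `quantize_all` helper is hypothetical; quant types are passed in rather than hardcoded):

```shell
# Echo one Step 4 docker command per quant type, writing outputs to
# artifacts/<model_name>/quantized_gguf/<model_name>-<TYPE>.gguf.
quantize_all() {
  model_name="$1"; imatrix="$2"; bf16="$3"; shift 3
  outdir="artifacts/${model_name}/quantized_gguf"
  mkdir -p "$outdir"
  for qtype in "$@"; do
    out="${outdir}/${model_name}-${qtype}.gguf"
    echo "docker run --gpus all --rm --entrypoint sh -v $(pwd):/workspace/models hotwa/ik:latest -c \"/llama-quantize --imatrix ${imatrix} ${bf16} ${out} ${qtype}\""
  done
}
```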

Step 5: Assemble the upload directory

cp templates/modelscope/README.template.md modelscope_upload/README.md
cp templates/modelscope/configuration.template.json modelscope_upload/configuration.json
cp templates/modelscope/.gitattributes modelscope_upload/.gitattributes

Then copy the release files into modelscope_upload/.
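The three copies above silently produce a partial upload directory if a template is missing. A sketch that fails fast instead (the `stage_upload_dir` helper is hypothetical; the template filenames are from Step 5):

```shell
# Assemble the upload directory from the repo templates, failing fast
# if any template file is missing.
stage_upload_dir() {
  tpl_dir="$1"; upload_dir="$2"
  for f in README.template.md configuration.template.json .gitattributes; do
    [ -f "${tpl_dir}/${f}" ] || { echo "missing template: ${f}" >&2; return 1; }
  done
  mkdir -p "$upload_dir"
  cp "${tpl_dir}/README.template.md" "${upload_dir}/README.md"
  cp "${tpl_dir}/configuration.template.json" "${upload_dir}/configuration.json"
  cp "${tpl_dir}/.gitattributes" "${upload_dir}/.gitattributes"
}
```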

Step 6: Upload

./scripts/upload_to_modelscope.sh <repo_id> <token> modelscope_upload direct "Upload quantized GGUF"
  • direct: upload with the proxy disabled
  • proxy: upload with the proxy kept enabled
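Since the mode argument only accepts the two values above, a thin wrapper can reject anything else before the upload script runs (upload_to_modelscope.sh is the repo script from Step 6; the `upload_gguf` wrapper is hypothetical):

```shell
# Validate the proxy-mode argument, then hand off to the repo's
# upload script with the arguments in Step 6 order.
upload_gguf() {
  repo_id="$1"; token="$2"; dir="$3"; mode="$4"; msg="$5"
  case "$mode" in
    direct|proxy) ;;
    *) echo "mode must be 'direct' or 'proxy', got: ${mode}" >&2; return 1 ;;
  esac
  ./scripts/upload_to_modelscope.sh "$repo_id" "$token" "$dir" "$mode" "$msg"
}
```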