first commit

2026-03-02 23:22:33 +08:00
parent 1c5822d16b
commit c5ae56c463
22 changed files with 606 additions and 462 deletions

docs/WORKFLOW_TEMPLATE.md Normal file

@@ -0,0 +1,71 @@
# Workflow Template
This workflow covers onboarding a new model. Unless noted otherwise, run all commands from the repository root.
## Step 1: HF -> BF16 GGUF
Use the conversion script from `ik_llama.cpp`:
```bash
python convert_hf_to_gguf.py \
<hf_model_dir> \
--outtype bf16 \
--outfile artifacts/<model_name>/base_gguf/<model_name>-bf16.gguf
```
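A quick sanity check after conversion is to confirm the output file starts with the GGUF magic bytes (the format's four-byte ASCII header, `GGUF`). A minimal sketch, using a dummy file so the snippet runs anywhere; point `head` at the real `artifacts/<model_name>/base_gguf/` file in practice:

```shell
# Sketch: GGUF files begin with the 4-byte ASCII magic "GGUF".
# A dummy file stands in here for the real bf16 GGUF output.
demo=$(mktemp)
printf 'GGUF' > "$demo"
magic=$(head -c 4 "$demo")
if [ "$magic" = "GGUF" ]; then
  echo "valid GGUF magic"
else
  echo "not a GGUF file"
fi
```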
## Step 2: Prepare calibration data
```bash
./.venv/bin/python scripts/prepare_calib_data.py --force-refresh
```
Outputs:
- `calibration/calibration_data_v5_rc.txt`
- `calibration/calibration_data_v5_rc_code.txt`
Fixed composition: 1152 + 2000 + 1000 = 4152 blocks.
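Before moving on to Step 3, it can help to confirm both output files exist and are non-empty. A self-contained sketch (a temp directory with stub files stands in for the real repo; swap in the actual `calibration/` path):

```shell
# Sketch: verify both Step 2 outputs are present and non-empty.
# Stub files in a temp dir stand in for the real calibration data.
root=$(mktemp -d)
mkdir -p "$root/calibration"
printf 'stub\n' > "$root/calibration/calibration_data_v5_rc.txt"
printf 'stub\n' > "$root/calibration/calibration_data_v5_rc_code.txt"
for f in calibration_data_v5_rc.txt calibration_data_v5_rc_code.txt; do
  if [ -s "$root/calibration/$f" ]; then
    echo "ok: $f"
  else
    echo "missing: $f"
  fi
done
```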
## Step 3: Generate the imatrix
```bash
docker run --gpus all --rm \
--entrypoint sh \
-v <repo_root>:/workspace/models \
-v <repo_root>/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
hotwa/ik:latest \
-c "/llama-imatrix -m <bf16_gguf> -f /workspace/calib_data.txt -o <imatrix_out> --ctx-size 512 -ngl 99 --threads 16"
```
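Filling the three placeholders by hand is error-prone; one option is a small wrapper that assembles the Step 3 command from arguments and prints it for review before running. A sketch only (not part of the repo; the default paths are illustrative):

```shell
# Sketch: build the Step 3 docker command from arguments and print it (dry run).
# The defaults below are placeholders, not real paths.
repo_root=${1:-/path/to/repo}
bf16_gguf=${2:-/workspace/models/model-bf16.gguf}
imatrix_out=${3:-/workspace/models/model.imatrix}
cmd="docker run --gpus all --rm --entrypoint sh \
  -v $repo_root:/workspace/models \
  -v $repo_root/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c '/llama-imatrix -m $bf16_gguf -f /workspace/calib_data.txt -o $imatrix_out --ctx-size 512 -ngl 99 --threads 16'"
echo "$cmd"
```

Printing first, then pasting the verified command, avoids mounting the wrong repo root into the container.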
## Step 4: Quantize and export
Run once per target quantization type:
```bash
docker run --gpus all --rm \
--entrypoint sh \
-v <repo_root>:/workspace/models \
hotwa/ik:latest \
-c "/llama-quantize --imatrix <imatrix_out> <bf16_gguf> <out_gguf> IQ4_KS"
```
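Running once per type can be looped. A dry-run sketch that prints the in-container quantize command for each type — note that only `IQ4_KS` comes from this workflow; the other two are common llama.cpp quant names listed as assumptions:

```shell
# Sketch: dry-run loop over quantization types. Only IQ4_KS is confirmed
# by this workflow; Q4_K_M and Q8_0 are illustrative additions.
types="IQ4_KS Q4_K_M Q8_0"
for qtype in $types; do
  echo "/llama-quantize --imatrix model.imatrix model-bf16.gguf model-$qtype.gguf $qtype"
done
```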
Place the quantized results in `artifacts/<model_name>/quantized_gguf/`.
## Step 5: Assemble the upload directory
```bash
cp templates/modelscope/README.template.md modelscope_upload/README.md
cp templates/modelscope/configuration.template.json modelscope_upload/configuration.json
cp templates/modelscope/.gitattributes modelscope_upload/.gitattributes
```
Then copy the files to be published into `modelscope_upload/`.
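Before Step 6, a pre-flight check that `modelscope_upload/` holds the three template files plus at least one `.gguf` can catch mistakes early. A self-contained sketch against a stub directory:

```shell
# Sketch: pre-upload check. A stub temp dir stands in for modelscope_upload/.
up=$(mktemp -d)
touch "$up/README.md" "$up/configuration.json" "$up/.gitattributes" "$up/model.gguf"
missing=0
for f in README.md configuration.json .gitattributes; do
  [ -f "$up/$f" ] || { echo "missing: $f"; missing=1; }
done
set -- "$up"/*.gguf
[ -f "$1" ] || { echo "missing: no .gguf files"; missing=1; }
[ "$missing" -eq 0 ] && echo "upload dir ready"
```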
## Step 6: Upload
```bash
./scripts/upload_to_modelscope.sh <repo_id> <token> modelscope_upload direct "Upload quantized GGUF"
```
- `direct`: upload with the proxy disabled
- `proxy`: upload through the proxy
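For context, the `direct`/`proxy` switch could be handled like this inside a shell script. This is a sketch of the idea only, not the actual logic of `upload_to_modelscope.sh`:

```shell
# Sketch (assumption): branch on the upload-mode argument, mirroring the
# intent of the script's 4th positional argument, not its actual code.
mode=${1:-direct}
case "$mode" in
  direct) unset http_proxy https_proxy; echo "uploading with proxy disabled" ;;
  proxy)  echo "uploading through proxy: ${https_proxy:-unset}" ;;
  *)      echo "unknown mode: $mode" ;;
esac
```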