# Workflow Template

This workflow is for onboarding a new model; by default, it is run from the repository root.

## Step 1: HF -> BF16 GGUF

Use the conversion script from `ik_llama.cpp`:

```bash
python convert_hf_to_gguf.py \
    <hf_model_dir> \
    --outtype bf16 \
    --outfile artifacts/<model_name>/base_gguf/<model_name>-bf16.gguf
```
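
The `--outfile` path is nested, so it is safest to create the target directory up front (a minimal sketch; `demo` is a hypothetical stand-in for `<model_name>`):

```bash
# "demo" is a hypothetical stand-in for <model_name>.
MODEL_NAME=demo
mkdir -p "artifacts/${MODEL_NAME}/base_gguf"
```
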

## Step 2: Prepare calibration data

```bash
./.venv/bin/python scripts/prepare_calib_data.py --force-refresh
```

Outputs:

- `calibration/calibration_data_v5_rc.txt`
- `calibration/calibration_data_v5_rc_code.txt`

Fixed composition: 1152 + 2000 + 1000 = 4152 blocks.

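The fixed composition above can be sanity-checked with shell arithmetic:

```bash
# The three subsets should sum to 4152 blocks.
expr 1152 + 2000 + 1000   # prints 4152
```
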
## Step 3: Generate the imatrix

```bash
docker run --gpus all --rm \
    --entrypoint sh \
    -v <repo_root>:/workspace/models \
    -v <repo_root>/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
    hotwa/ik:latest \
    -c "/llama-imatrix -m <bf16_gguf> -f /workspace/calib_data.txt -o <imatrix_out> --ctx-size 512 -ngl 99 --threads 16"
```
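
Docker bind mounts (`-v`) require absolute host paths, so resolve `<repo_root>` before substituting it into the command (a small sketch, relying on the workflow being run from the repository root):

```bash
# $(pwd) is absolute, which is what docker -v requires for host paths.
REPO_ROOT="$(pwd)"
echo "$REPO_ROOT"
```
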

## Step 4: Quantize and export

Run each of the following separately:

```bash
docker run --gpus all --rm \
    --entrypoint sh \
    -v <repo_root>:/workspace/models \
    hotwa/ik:latest \
    -c "/llama-quantize --imatrix <imatrix_out> <bf16_gguf> <out_gguf> IQ4_KS"
```
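
One quick sanity check on the exported file: every valid GGUF file starts with the four ASCII magic bytes `GGUF`. The sketch below uses a stand-in file, since `<out_gguf>` depends on the run:

```bash
# Stand-in for <out_gguf>; a real export is checked the same way.
printf 'GGUF-stand-in-bytes' > demo.gguf
head -c 4 demo.gguf   # prints: GGUF
rm demo.gguf
```
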

Place the quantized output in `artifacts/<model_name>/quantized_gguf/`.

## Step 5: Organize the upload directory

```bash
cp templates/modelscope/README.template.md modelscope_upload/README.md
cp templates/modelscope/configuration.template.json modelscope_upload/configuration.json
cp templates/modelscope/.gitattributes modelscope_upload/.gitattributes
```

Then copy the target release files into `modelscope_upload/`.

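A hedged sketch of that copy step (`demo` is a hypothetical model name, and the `touch` line only creates a stand-in file so the sketch runs end to end):

```bash
# "demo" is a hypothetical stand-in for <model_name>.
MODEL_NAME=demo
mkdir -p modelscope_upload "artifacts/${MODEL_NAME}/quantized_gguf"
touch "artifacts/${MODEL_NAME}/quantized_gguf/${MODEL_NAME}-IQ4_KS.gguf"  # stand-in file
cp "artifacts/${MODEL_NAME}/quantized_gguf/"*.gguf modelscope_upload/
```
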
## Step 6: Upload

```bash
./scripts/upload_to_modelscope.sh <repo_id> <token> modelscope_upload direct "Upload quantized GGUF"
```

- `direct`: upload with the proxy disabled
- `proxy`: upload with the proxy kept
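
The two modes differ only in how proxy environment variables are treated during the upload; the snippet below is a hypothetical illustration of that distinction (the real logic lives in `scripts/upload_to_modelscope.sh`):

```bash
# Hypothetical illustration only: "direct" drops proxy env vars
# before uploading, "proxy" leaves them in place.
MODE=direct
if [ "$MODE" = "direct" ]; then
    unset http_proxy https_proxy
fi
```
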