# AGENTS
## Purpose
This repository captures a reusable LLM quantization release pipeline: HF safetensors -> BF16 GGUF -> imatrix -> IQ4/IQ5/IQ6 -> ModelScope upload directory.
## Repository Contract
- Template docs: `docs/`
- Scripts: `scripts/`
- Template files: `templates/`
- Worked examples: `examples/<model_name>/`
- Upload workspace: `modelscope_upload/`
- Large artifacts: `artifacts/` (git-ignored)
## Hard Rules
1. Never commit weight files of any kind (`*.gguf`, `*.safetensors`, `*.bin`, `*.pt`, etc.).
2. Never commit tokens, keys, or account credentials.
3. Every new model must add an `examples/<model_name>/README.md` recording its key parameters.
4. Any change to a script or the process must update `docs/` in the same change.
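Rule 3 can be scaffolded mechanically. A minimal sketch, assuming nothing about the required README fields beyond what this document mentions — the field list below is illustrative, not a mandated format:

```shell
#!/bin/sh
set -eu
# Sketch for rule 3: scaffold examples/<model_name>/README.md with
# placeholder fields. The field list is illustrative; record whatever
# key parameters the run actually used.
scaffold_example() {
  model_name="$1"
  dir="examples/${model_name}"
  mkdir -p "$dir"
  cat > "${dir}/README.md" <<EOF
# ${model_name}

- Source HF repo:
- BF16 GGUF: artifacts/${model_name}/base_gguf/
- imatrix calibration file: calibration/calibration_data_v5_rc_code.txt
- Quant types: IQ4_KS, IQ5_K, IQ6_K
- ModelScope repo_id:
EOF
}

# Demo in a throwaway directory so a dry run never touches the repo.
cd "$(mktemp -d)"
scaffold_example "demo-model"
```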
## Standard Quantization Skill
### 0) Prerequisites
- Python venv: `./.venv`
- Docker + GPU (recommended)
- A usable HF model directory (safetensors)
- The `hotwa/ik:latest` image is pullable
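The prerequisites above can be sanity-checked before starting. A minimal preflight sketch (the `check` helper is hypothetical; GPU visibility is omitted because it needs an actual `docker run --gpus` probe):

```shell
#!/bin/sh
# Minimal preflight sketch for the prerequisites above. Each line prints
# OK or MISSING instead of aborting, so all problems show up at once.
check() {
  label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "OK      $label"
  else
    echo "MISSING $label"
  fi
}
check "python venv (./.venv)"   test -x ./.venv/bin/python
check "docker CLI"              command -v docker
check "hotwa/ik:latest image"   docker image inspect hotwa/ik:latest
```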
### 1) HF -> BF16 GGUF
Run inside the `ik_llama.cpp` checkout:
```bash
python convert_hf_to_gguf.py \
  <hf_model_dir> \
  --outtype bf16 \
  --outfile <output_bf16_gguf>
```
Place the resulting BF16 GGUF under `artifacts/<model_name>/base_gguf/`.
### 2) Build Calibration Dataset
Run:
```bash
./.venv/bin/python scripts/prepare_calib_data.py --force-refresh
```
Target output: `calibration/calibration_data_v5_rc_code.txt`, composed of exactly:
- 1152 blocks: `calibration_data_v5_rc.txt`
- 2000 blocks: `QuixiAI/Code-74k-ShareGPT-Vicuna`
- 1000 blocks: `alvarobartt/openhermes-preferences-coding`
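The composition above (1152 + 2000 + 1000 = 4152 blocks) can be spot-checked after the script runs. A sketch, assuming blocks are separated by blank lines — verify that assumption against what `prepare_calib_data.py` actually emits before relying on it:

```shell
#!/bin/sh
set -eu
# Sketch: count blocks in the composed calibration file and compare with
# the expected 1152 + 2000 + 1000 = 4152. ASSUMPTION: blocks are
# separated by blank lines (awk "paragraph mode"); adjust the record
# separator if prepare_calib_data.py uses a different delimiter.
count_blocks() {
  awk 'BEGIN { RS="" } END { print NR }' "$1"
}

# Demo on a synthetic three-block file.
f="$(mktemp)"
printf 'block one\n\nblock two\n\nblock three\n' > "$f"
count_blocks "$f"    # prints 3
```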
### 3) Generate imatrix
```bash
docker run --gpus all --rm \
  --entrypoint sh \
  -v <repo_root>:/workspace/models \
  -v <repo_root>/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix -m <bf16_gguf> -f /workspace/calib_data.txt -o <imatrix_out> --ctx-size 512 -ngl 99 --threads 16"
```
### 4) Quantize
Export `IQ4_KS`, `IQ5_K`, and `IQ6_K` in separate runs:
```bash
docker run --gpus all --rm \
  --entrypoint sh \
  -v <repo_root>:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize --imatrix <imatrix_out> <bf16_gguf> <out_gguf> IQ4_KS"
```
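The three exports differ only in the type argument, so they can be driven by one loop. A dry-run sketch that prints the commands rather than executing them; the `<...>` placeholders from the step above are kept verbatim:

```shell
#!/bin/sh
set -eu
# Dry-run sketch: emit one llama-quantize invocation per target type.
# The <...> placeholders are kept verbatim; drop the `echo` once they
# are filled in (each run needs its own output filename).
emit_quant_cmds() {
  for qtype in IQ4_KS IQ5_K IQ6_K; do
    echo docker run --gpus all --rm \
      --entrypoint sh \
      -v "<repo_root>:/workspace/models" \
      hotwa/ik:latest \
      -c "/llama-quantize --imatrix <imatrix_out> <bf16_gguf> <out_gguf_${qtype}> ${qtype}"
  done
}
emit_quant_cmds
```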
### 5) Prepare ModelScope Folder
- Copy `templates/modelscope/*` into `modelscope_upload/`
- Fill in `README.md` and `configuration.json`
- Add the quantized artifacts (GGUF + imatrix)
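The copy-and-fill steps above can be sketched as a helper that also reports which required files still need editing (the function name and demo directories are illustrative):

```shell
#!/bin/sh
set -eu
# Sketch of step 5: copy the templates into the upload workspace, then
# report which of the two required files still need to be filled in.
prepare_upload_dir() {
  src="$1"; dst="$2"
  mkdir -p "$dst"
  cp -R "$src"/. "$dst"/
  for f in README.md configuration.json; do
    if [ -f "$dst/$f" ]; then
      echo "edit: $dst/$f"
    else
      echo "missing: $dst/$f"
    fi
  done
}

# Demo with throwaway directories instead of the real templates/ tree.
tmp="$(mktemp -d)"
mkdir -p "$tmp/templates"
: > "$tmp/templates/README.md"
prepare_upload_dir "$tmp/templates" "$tmp/modelscope_upload"
```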
### 6) Upload
Use the script:
```bash
./scripts/upload_to_modelscope.sh <repo_id> <token> modelscope_upload direct "Upload quantized GGUF"
```
`direct` disables the proxy automatically; `proxy` keeps it enabled.
## Definition Of Done
- The BF16 GGUF exists
- `imatrix.dat` exists
- IQ4/IQ5/IQ6 quantizations all exist
- `modelscope_upload/README.md` is longer than 200 characters and includes `tasks` and `license`
- `modelscope_upload/configuration.json` has all fields filled in
- `examples/<model_name>/` has the run recorded
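The checklist items that live under `modelscope_upload/` can be verified mechanically. A sketch limited to that directory — the GGUF/imatrix existence checks are left out because their paths depend on `<model_name>`, and "> 200" is interpreted here as a character count:

```shell
#!/bin/sh
# Sketch of a Definition-of-Done check for modelscope_upload/ only.
dod_check() {
  dir="$1"
  fail=0
  [ -f "$dir/configuration.json" ] || { echo "FAIL configuration.json missing"; fail=1; }
  if [ -f "$dir/README.md" ]; then
    # "> 200" interpreted as a character count (wc -m).
    [ "$(wc -m < "$dir/README.md" | tr -d ' ')" -gt 200 ] || { echo "FAIL README.md too short"; fail=1; }
    grep -q "tasks"   "$dir/README.md" || { echo "FAIL README.md lacks tasks"; fail=1; }
    grep -q "license" "$dir/README.md" || { echo "FAIL README.md lacks license"; fail=1; }
  else
    echo "FAIL README.md missing"; fail=1
  fi
  if [ "$fail" -eq 0 ]; then echo "DONE"; else echo "NOT DONE"; fi
}

# Demo against a throwaway directory that satisfies the checks.
tmp="$(mktemp -d)"
{ printf 'tasks: text-generation\nlicense: apache-2.0\n'; head -c 300 /dev/zero | tr '\0' 'x'; } > "$tmp/README.md"
: > "$tmp/configuration.json"
dod_check "$tmp"    # prints DONE
```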