# AGENTS

## Purpose

This repository captures a reusable LLM quantization release pipeline: HF safetensors -> BF16 GGUF -> imatrix -> IQ4/IQ5/IQ6 -> ModelScope upload directory.

## Repository Contract

- Template docs: `docs/`
- Scripts: `scripts/`
- Template files: `templates/`
- Worked examples: `examples/<model_name>/`
- Upload workspace: `modelscope_upload/`
- Large artifacts: `artifacts/` (git-ignored)

## Hard Rules

1. Never commit weight files (`*.gguf`, `*.safetensors`, `*.bin`, `*.pt`, and the like).
2. Never commit tokens, secret keys, or account credentials.
3. Each new model must add an `examples/<model_name>/README.md` recording its key parameters.
4. Any change to a script or to the workflow must be mirrored in `docs/`.

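Hard Rule 1 can also be enforced mechanically. A minimal pre-commit guard sketch (a hypothetical helper, not part of this repo) that rejects staged paths matching the blocked weight extensions:

```shell
#!/bin/sh
# Reject staged files whose names look like model weights (Hard Rule 1).
# In a real hook, feed it with: git diff --cached --name-only | check_staged
BLOCKED='\.(gguf|safetensors|bin|pt)$'

check_staged() {
  # Reads one path per line on stdin; lists offenders and fails if any.
  offenders=$(grep -E "$BLOCKED" || true)
  if [ -n "$offenders" ]; then
    printf 'refusing to commit weight files:\n%s\n' "$offenders"
    return 1
  fi
  return 0
}
```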
## Standard Quantization Skill
### 0) Prerequisites

- Python venv: `./.venv`
- Docker + GPU (recommended)
- A usable HF model directory (safetensors)
- The `hotwa/ik:latest` image is pullable

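The prerequisites above can be probed up front. A preflight sketch (hypothetical helper; the venv interpreter path is passed in as an argument):

```shell
#!/bin/sh
# Preflight sketch for the prerequisites above (hypothetical helper).
preflight() {
  venv_python="$1"   # e.g. ./.venv/bin/python
  if [ ! -x "$venv_python" ]; then
    echo "missing venv python: $venv_python"
    return 1
  fi
  # Docker is recommended rather than required, so only warn here.
  command -v docker >/dev/null 2>&1 || echo "warning: docker not found"
  echo "preflight ok"
}
```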
### 1) HF -> BF16 GGUF

Run inside `ik_llama.cpp`:

```bash
python convert_hf_to_gguf.py \
    <hf_model_dir> \
    --outtype bf16 \
    --outfile <output_bf16_gguf>
```

Place the resulting BF16 GGUF under `artifacts/<model_name>/base_gguf/`.

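Staging the converted file can be scripted. A sketch with the doc's placeholders turned into arguments (the helper name is hypothetical):

```shell
#!/bin/sh
# Move a freshly converted BF16 GGUF into artifacts/<model_name>/base_gguf/.
# Usage: stage_bf16 <output_bf16_gguf> <model_name>
stage_bf16() {
  dest="artifacts/$2/base_gguf"
  mkdir -p "$dest"
  mv "$1" "$dest/"
  echo "$dest/$(basename "$1")"
}
```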
### 2) Build Calibration Dataset

Run:

```bash
./.venv/bin/python scripts/prepare_calib_data.py --force-refresh
```

Target output: `calibration/calibration_data_v5_rc_code.txt`, composed of exactly:

- 1152 blocks: `calibration_data_v5_rc.txt`
- 2000 blocks: `QuixiAI/Code-74k-ShareGPT-Vicuna`
- 1000 blocks: `alvarobartt/openhermes-preferences-coding`

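The composition above implies a fixed total that is worth asserting after a refresh (counting blocks in the file itself depends on the script's delimiter format, so only the arithmetic is shown):

```shell
# The merged calibration file should contain exactly this many blocks:
total=$((1152 + 2000 + 1000))
echo "$total blocks expected"   # 4152 blocks expected
```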
### 3) Generate imatrix

```bash
docker run --gpus all --rm \
  --entrypoint sh \
  -v <repo_root>:/workspace/models \
  -v <repo_root>/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt \
  hotwa/ik:latest \
  -c "/llama-imatrix -m <bf16_gguf> -f /workspace/calib_data.txt -o <imatrix_out> --ctx-size 512 -ngl 99 --threads 16"
```

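To keep the long docker line maintainable, the placeholders can be lifted into variables. A dry-run sketch (the concrete paths are hypothetical examples; drop the `echo` to actually execute):

```shell
#!/bin/sh
# Dry-run assembly of the imatrix command; paths are hypothetical examples.
REPO_ROOT="$PWD"
BF16_GGUF=/workspace/models/artifacts/my-model/base_gguf/my-model-bf16.gguf
IMATRIX_OUT=/workspace/models/artifacts/my-model/imatrix.dat
CMD="/llama-imatrix -m $BF16_GGUF -f /workspace/calib_data.txt -o $IMATRIX_OUT --ctx-size 512 -ngl 99 --threads 16"

echo docker run --gpus all --rm \
  --entrypoint sh \
  -v "$REPO_ROOT:/workspace/models" \
  -v "$REPO_ROOT/calibration/calibration_data_v5_rc_code.txt:/workspace/calib_data.txt" \
  hotwa/ik:latest \
  -c "$CMD"
```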
### 4) Quantize

Export `IQ4_KS`, `IQ5_K`, and `IQ6_K` separately:

```bash
docker run --gpus all --rm \
  --entrypoint sh \
  -v <repo_root>:/workspace/models \
  hotwa/ik:latest \
  -c "/llama-quantize --imatrix <imatrix_out> <bf16_gguf> <out_gguf> IQ4_KS"
```

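Since the three exports differ only in the type argument, they can be looped. A dry-run sketch (the output naming is a hypothetical convention and the paths are examples; remove the `echo` to run):

```shell
#!/bin/sh
# Export IQ4_KS / IQ5_K / IQ6_K in one loop; paths are hypothetical examples.
BF16_GGUF=artifacts/my-model/base_gguf/my-model-bf16.gguf
IMATRIX=artifacts/my-model/imatrix.dat

for qtype in IQ4_KS IQ5_K IQ6_K; do
  out="artifacts/my-model/my-model-${qtype}.gguf"
  echo docker run --gpus all --rm \
    --entrypoint sh \
    -v "$PWD:/workspace/models" \
    hotwa/ik:latest \
    -c "/llama-quantize --imatrix $IMATRIX $BF16_GGUF $out $qtype"
done
```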
### 5) Prepare ModelScope Folder

- Copy `templates/modelscope/*` into `modelscope_upload/`
- Fill in `README.md` and `configuration.json`
- Add the quantized artifacts (GGUF files + imatrix)

### 6) Upload

Use the script:

```bash
./scripts/upload_to_modelscope.sh <repo_id> <token> modelscope_upload direct "Upload quantized GGUF"
```

`direct` disables any proxy automatically; `proxy` keeps the proxy settings.

## Definition Of Done

- The BF16 GGUF exists
- `imatrix.dat` exists
- IQ4/IQ5/IQ6 quants all exist
- `modelscope_upload/README.md` exceeds 200 characters and declares `tasks` and `license`
- `modelscope_upload/configuration.json` has every field filled in
- `examples/<model_name>/` has been updated with the run's record