## Preparing model weights

The Metal backend requires weight files in a specific format. You have two options:

### Convert existing weights

```bash
python gpt_oss/metal/scripts/create-local-model.py -s <model_dir> -d <output_file>
```

### Download pre-converted weights

```bash
huggingface-cli download openai/gpt-oss-120b --include "metal/*" --local-dir gpt-oss-120b/metal/
huggingface-cli download openai/gpt-oss-20b --include "metal/*" --local-dir gpt-oss-20b/metal/
```

"Metal version" here refers to the Metal backend implementation of the GPT-OSS model.

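
After converting or downloading, it can help to confirm where the checkpoint actually landed before passing it to `--checkpoint`. The sketch below is only an illustration: `find_checkpoints` is a hypothetical helper, and it assumes the checkpoint file is named `model.bin`, as in the serve commands later in this document.

```python
from pathlib import Path


def find_checkpoints(root, filename="model.bin"):
    """Return every file named `filename` under `root`, sorted by path.

    Assumption: the Metal checkpoint is called model.bin; adjust `filename`
    if your conversion wrote to a different output file.
    """
    root = Path(root).expanduser()
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob(filename) if p.is_file())


# "gpt-oss-20b/metal/" is the --local-dir used in the download command above.
for path in find_checkpoints("gpt-oss-20b/metal/"):
    size_gib = path.stat().st_size / 2**30
    print(f"{path}  ({size_gib:.1f} GiB)")
```

Searching recursively avoids guessing how the download tool nests the `metal/` folder inside `--local-dir`.
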
## Environment setup

macOS (Apple Silicon)

1. Prepare the environment

```bash
# Install the Xcode command-line tools, then finish Xcode's first-launch setup
xcode-select --install
xcrun -find metal || echo "metal not found"
# Point xcode-select at the full Xcode install
sudo xcode-select -s /Applications/Xcode.app/Contents/Developer
# Open Xcode to install the macOS SDK; if the command-line install fails, install it from the GUI
sudo xcodebuild -license accept
xcodebuild -runFirstLaunch
# Install the Metal shader toolchain
sudo xcode-select -s /Applications/Xcode.app/Contents/Developer
xcodebuild -downloadComponent MetalToolchain
# Verify the installation
xcrun --sdk macosx --find metal
xcrun --sdk macosx --show-sdk-path
# Create the virtual environment
micromamba create -n gptoss python=3.12 -y
micromamba activate gptoss
micromamba install pybind11 -c conda-forge -y
```

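
The verification commands above can also be wrapped in a small script so the checks are repeatable. This is a minimal sketch, not part of the repository: `check_tools` and `metal_compiler_path` are hypothetical names, and `cmake` is included in the tool list on the assumption that the manual build in the next step needs it on `PATH`.

```python
import shutil
import subprocess


def check_tools(names=("xcrun", "xcodebuild", "cmake")):
    """Map each tool name to True if an executable of that name is on PATH."""
    return {name: shutil.which(name) is not None for name in names}


def metal_compiler_path():
    """Return the path printed by `xcrun --sdk macosx --find metal`, or None."""
    if shutil.which("xcrun") is None:
        return None  # not on macOS, or the command-line tools are missing
    result = subprocess.run(
        ["xcrun", "--sdk", "macosx", "--find", "metal"],
        capture_output=True,
        text=True,
        check=False,
    )
    path = result.stdout.strip()
    return path if result.returncode == 0 and path else None


for tool, found in check_tools().items():
    print(f"{tool:12s} {'ok' if found else 'MISSING'}")
print("metal compiler:", metal_compiler_path() or "not found")
```
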
2. Run the CMake build

```bash
git clone https://github.com/hotwa/openharmony-mlx.git
cd openharmony-mlx

# Option A: let pip run the CMake build automatically
GPTOSS_BUILD_METAL=1 pip install -e ".[metal]"
# Option B: configure and build with CMake by hand
cd gpt_oss/metal
mkdir -p build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DGPTOSS_BUILD_PYTHON=ON
# If CMake cannot find pybind11, reconfigure with its location given explicitly
export pybind11_DIR=$(python -c "import pybind11; print(pybind11.get_cmake_dir())")
cmake -S .. -B . \
  -DCMAKE_BUILD_TYPE=Release \
  -DGPTOSS_BUILD_PYTHON=ON \
  -DPYBIND11_FINDPYTHON=ON \
  -Dpybind11_DIR="$(python -c 'import pybind11;print(pybind11.get_cmake_dir())')"
cmake --build . --config Release --parallel
# Alternative to the line above: call make directly (nproc is not available on stock macOS)
make -j"$(sysctl -n hw.ncpu)"
ctest --output-on-failure
```

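
For scripted builds, the manual sequence above can be assembled as argument lists, which sidesteps shell quoting and the `nproc` portability issue. This is a sketch under assumptions: `cmake_commands` is a hypothetical helper, it only mirrors the flags shown above, and the commands are meant to be run from `gpt_oss/metal/build`.

```python
import os
import shlex


def cmake_commands(source_dir="..", build_dir=".", jobs=None, pybind11_dir=None):
    """Return the configure, build and test commands as argument lists."""
    jobs = jobs or os.cpu_count() or 1  # portable replacement for nproc
    configure = [
        "cmake", "-S", source_dir, "-B", build_dir,
        "-DCMAKE_BUILD_TYPE=Release",
        "-DGPTOSS_BUILD_PYTHON=ON",
    ]
    if pybind11_dir:
        # Only needed when CMake does not find pybind11 on its own.
        configure += ["-DPYBIND11_FINDPYTHON=ON", f"-Dpybind11_DIR={pybind11_dir}"]
    build = ["cmake", "--build", build_dir, "--config", "Release",
             "--parallel", str(jobs)]
    test = ["ctest", "--output-on-failure"]
    return [configure, build, test]


# Print the commands; pass each list to subprocess.run(cmd, check=True) to execute.
for cmd in cmake_commands():
    print(shlex.join(cmd))
```
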
3. Metal shader compilation

CMake compiles the Metal source files automatically (see `CMakeLists.txt`, lines 16-28).

These `.metal` files are compiled into `.air` intermediate files and then linked into `default.metallib`.

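
To make the two-stage shader pipeline concrete, the sketch below builds the kind of `xcrun` invocations involved: one `metal -c` call per source file, then a single `metallib` call that links the `.air` files. It is an illustration only. The flags the repository's `CMakeLists.txt` really passes may differ, `metallib_commands` is a hypothetical helper, and the source file names are made up.

```python
from pathlib import Path


def metallib_commands(metal_sources, out_dir="build", library="default.metallib"):
    """Return commands that compile .metal files to .air and link a .metallib."""
    out = Path(out_dir)
    commands = []
    air_files = []
    for src in map(Path, metal_sources):
        air = out / (src.stem + ".air")
        # Stage 1: compile one .metal source to an .air intermediate file.
        commands.append(
            ["xcrun", "-sdk", "macosx", "metal", "-c", str(src), "-o", str(air)]
        )
        air_files.append(str(air))
    # Stage 2: link all .air files into a single Metal library.
    commands.append(
        ["xcrun", "-sdk", "macosx", "metallib", *air_files, "-o", str(out / library)]
    )
    return commands


# Hypothetical source names, for illustration only.
for cmd in metallib_commands(["kernel_a.metal", "kernel_b.metal"]):
    print(" ".join(cmd))
```
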
4. Building the Python extension module

CMake creates a Python extension module named `_metal`.

## Installing the Python package

```bash
# Manual install
# Install the extension module
cp _metal.so /path/to/your/python/site-packages/gpt_oss/metal/
# Install the Metal library
cp default.metallib /path/to/your/python/site-packages/gpt_oss/metal/
```

```bash
# Run from the gpt_oss repository root (not metal/build)
cd /path/to/gpt_oss

# Make sure pybind11 and Xcode are ready in this environment
export GPTOSS_BUILD_METAL=1
python -m pip install -e ".[metal]"   # editable install (code changes take effect immediately)
# Or a regular install
# python -m pip install ".[metal]"
```

5. Verify that the metal module is installed correctly

```bash
python -c "import gpt_oss.metal._metal; print('Metal module loaded successfully')"
```

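
When the import check fails, it is useful to know whether the extension is missing or whether only `default.metallib` is absent. The following sketch locates the module without importing it and checks for the library in the same folder, matching the manual `cp` steps above. `locate_extension` and `metallib_next_to` are hypothetical helper names.

```python
import importlib.util
from pathlib import Path


def locate_extension(module_name="gpt_oss.metal._metal"):
    """Return the module's file path without importing it, or None if absent."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None  # a parent package is missing or broken
    if spec is None or not spec.origin:
        return None
    return Path(spec.origin)


def metallib_next_to(extension_path, name="default.metallib"):
    """True if the Metal library sits in the same directory as the extension."""
    return (Path(extension_path).parent / name).is_file()


ext = locate_extension()
if ext is None:
    print("gpt_oss.metal._metal not found in this environment")
else:
    print("extension:", ext)
    print("default.metallib present:", metallib_next_to(ext))
```
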
## Starting the server

Download the tokenizer cache, then start the server:

```bash
mkdir -p ~/.cache/openai_harmony/
cd ~/.cache/openai_harmony/
wget https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken
export OPENAI_HARMONY_CACHE_DIR=~/.cache/openai_harmony/
chmod 755 ~/.cache/openai_harmony/
# Convert a fine-tuned checkpoint to the Metal format
python /Volumes/long990max/project/openharmony-mlx/gpt_oss/metal/scripts/create-local-modelnew.py -s /Volumes/long990max/gpustack_data/huihui-ai/Huihui-gpt-oss-20b-mxfp4-abliterated -d /Volumes/long990max/project/openharmony-mlx/model.bin

# Start the server with the gpt-oss-20b Metal checkpoint
micromamba activate gptoss && python -m gpt_oss.responses_api.serve --inference-backend metal --host 0.0.0.0 --port 8080 --checkpoint /Volumes/long990max/gpustack_data/openai/gpt-oss-20b/metal/model.bin
# Or start the server with the low-refusal weights
micromamba activate gptoss && python -m gpt_oss.responses_api.serve \
  --inference-backend metal \
  --checkpoint /Volumes/long990max/project/openharmony-mlx/pth/gpt-oss-20b-uncensored-mxfp4/metal/model.bin \
  --host 0.0.0.0 --port 8080
```

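
Once the server is running, a quick request confirms it answers. The sketch below uses only the standard library and rests on two assumptions: that the server exposes a `/v1/responses` route (the Codex `base_url` later in this document ends in `/v1` with `wire_api = "responses"`), and that it accepts `model` and `input` fields in the style of the OpenAI Responses API. `build_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request


def build_request(prompt, base_url="http://localhost:8080", model="gpt-oss-120b"):
    """Build a POST request for the assumed /v1/responses endpoint."""
    payload = {"model": model, "input": prompt}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


req = build_request("Say hello in one sentence.")
print(req.get_method(), req.full_url)
# With the server running, call: print(send(req))
```

The model name defaults to `gpt-oss-120b` because, as noted in the Cherry Studio section, the backend expects that name even when the 20b weights are loaded.
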
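## Choosing a gpt-oss-120b model

huizimao/gpt-oss-120b-uncensored-bf16 (LoRA, BF16)

On the Amazon FalseReject test set (300 prompts), the false-refusal rate is about 6% (about 70% for the original model). Choose this when you want the lowest false-refusal rate and your hardware can hold BF16 weights.

huizimao/gpt-oss-120b-uncensored-mxfp4 (LoRA + PTQ, MXFP4)

Under the same evaluation setup, the false-refusal rate is about 24%. That is higher than the BF16 version, but the smaller size is easier to deploy and fits the existing Metal/MXFP4 pipeline.
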
## Minimal checklist before converting other fine-tuned safetensors checkpoints (same 20B architecture)

`config.json` must contain (or allow the script to derive) at least:

`num_hidden_layers`, `hidden_size`, `intermediate_size`

`num_attention_heads`, `num_key_value_heads`, `head_dim` (if `head_dim != 64`, the Q/K scaling cannot be baked in)

`sliding_window`, `rope_theta`, `initial_context_length`

`rope_scaling_factor` or `rope_scaling.factor`; `rope_ntk_alpha`/`rope_ntk_beta` (defaults of 1.0/32.0 are provided)

MoE: `num_experts` or `num_local_experts`; `num_active_experts`/`experts_per_token` (default 4)

Weight names must follow one of these two schemes (the script handles both):

Native: `block.N.attn.qkv.*` or `q_proj/k_proj/v_proj.*`; `mlp.mlp{1,2}_weight.{blocks,scales}` + `mlp{1,2}_bias`

Jinx: `model.layers.N.self_attn.{q,k,v}_proj_*`; `mlp.experts.{gate_up,down}_proj_{blocks,scales,bias}`

Special tokens: these must match the Harmony GPT-OSS mapping, which the conversion script hard-codes. A common mistake is spelling `<|endofuntrusted|>` as `end_untrusted`, which shifts the UUID table out of alignment; the script uses the official spelling.

MXFP4 scales: take a quick look at them (`peek_scales_v2.py` is enough). If `max + 14 >= 256`, the script's clamp takes effect and prevents bad values from being written.

Quick self-test: in addition to the curl SSE check, run these every time:

`tests/token_uuid_slot.py` checks that the slot and UUID match for `<|channel|>`, `<|message|>`, `<|return|>` and `<|call|>`;

`tests/smoke_metal.py` generates a few tokens to confirm that nothing crashes or hangs.

If all of the above hold, the model should work reliably over Codex's `responses` wire protocol.

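
Parts of the checklist above can be automated before running the converter. The sketch below is an illustration, not the conversion script's own logic: `missing_keys`, `can_bake_qk_scale` and `scales_need_clamp` are hypothetical helpers that encode only what the checklist states (the required `config.json` keys and their alternatives, the `head_dim == 64` condition, and the `max + 14 >= 256` clamp condition).

```python
# Each tuple lists alternative keys; at least one per tuple must be present.
REQUIRED_KEYS = [
    ("num_hidden_layers",), ("hidden_size",), ("intermediate_size",),
    ("num_attention_heads",), ("num_key_value_heads",), ("head_dim",),
    ("sliding_window",), ("rope_theta",), ("initial_context_length",),
    ("rope_scaling_factor", "rope_scaling.factor"),
    ("num_experts", "num_local_experts"),
]


def lookup(config, dotted_key):
    """Resolve a key such as 'rope_scaling.factor' in nested dicts, else None."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_keys(config):
    """Return the key groups for which no alternative is present."""
    return [group for group in REQUIRED_KEYS
            if all(lookup(config, key) is None for key in group)]


def can_bake_qk_scale(config):
    """Checklist condition: Q/K scaling can only be baked in when head_dim is 64."""
    return config.get("head_dim") == 64


def scales_need_clamp(scales, offset=14, limit=256):
    """Checklist condition: True when max(scales) + 14 >= 256."""
    return max(scales) + offset >= limit
```

Load the dictionary with `json.load(open("config.json"))` and pass it to `missing_keys`; an empty result means every required group has at least one key present. The script may still be able to derive some missing values, so treat a non-empty result as a prompt to check rather than a hard failure.
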
## Cherry Studio configuration

When adding a provider, select `OpenAI-Response`.

Use the following settings:

Model ID: gpt-oss-120b
Model name: gpt-oss-120b
Group name: gpt-oss

API address: http://localhost:8080
API key: none

Although requests name gpt-oss-120b, the model actually served is gpt-oss-20b. The backend hard-codes the 120b name, so requests must use gpt-oss-120b.

## Using Codex

```bash
vim ~/.codex/config.toml
```

```toml
disable_response_storage = true
show_reasoning_content = true
model = "gpt-5-codex"

[model_providers.local]
name = "local"
base_url = "http://100.64.0.4:8080/v1"
wire_api = "responses"
include_apply_patch_tool = false

[profiles.oss]
model = "gpt-oss-120b"
model_provider = "local"
include_apply_patch_tool = false

[mcp_servers.web-mcp]
url = "https://web-mcp.koyeb.app/sse/04824d01-60c3-4f20-9340-65b60d3e8344"
# Add a bearer_token if authentication is required
# bearer_token = "your-token-here"
startup_timeout_sec = 60
tool_timeout_sec = 120
```