2025-09-01 12:01:01 +08:00

Agent

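A self-hosted AI agent repository that runs Qwen3-Coder(-Flash) on llama-box.

This repository adds langgraph_qwen, a lightweight adapter layer for plugging Qwen3 directly into LangGraph, with tool calling, incremental streaming, and an OpenAI-compatible interface. It also includes an example of injecting MCP (Model Context Protocol) tools.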

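Features at a glance

LangGraph adapter:
- get_qwen_chat_model: creates a LangChain ChatModel (tries the OpenAI-compatible path first and falls back to DashScope on failure).
- bind_qwen_tools: binds LangChain tools to the model, with tool_choice support.
- Bundled examples: examples/qwen_langgraph_react.py, examples/qwen_langgraph_stream.py, examples/qwen_langgraph_custom_model.py.
Custom ChatModel (strongly recommended):
- langgraph_qwen.ChatQwenOpenAICompat:
  - Talks directly to an OpenAI-compatible /v1/chat/completions endpoint (DashScope, a self-hosted gateway, or llama-box).
  - Non-streaming: fully parses tool_calls and stores them in AIMessage.tool_calls so that LangGraph's ToolNode can consume them.
  - Streaming: implements an SSE incremental assembler that buffers the fragments of delta.tool_calls[].function.name/arguments and emits the complete tool_calls at [DONE]; text tokens are emitted immediately.
  - Tool injection: .bind_tools([...]).bind(tool_choice="auto"); extra_body passes server-specific parameters through.
  - Better JSON Schema compatibility: uses convert_to_openai_tool plus custom normalization internally, so that strict tool-schema validation on the gateway does not produce 4xx/5xx errors.
Tool name validation (enabled by default):
- Only [a-zA-Z0-9_-] is allowed, with length ≤ 64; violations raise an error with a suggested fix.
- Public functions: langgraph_qwen.validators.validate_tool_names / sanitize_tool_name.
MCP tool injection:
- Fetches tools from MCP servers through the official langchain-mcp-adapters library and injects them into LangGraph.
- Example: examples/mcp_adapters/inject_to_langgraph.py, including a quick streamable_http demo server.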
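
The tool name rule in the feature list can be illustrated with a short sketch. The helper names is_valid_tool_name and suggest_tool_name are hypothetical and are not the functions in langgraph_qwen.validators; only the character set and the 64-character limit come from the list above, and the repair strategy (replace with '_' and truncate) is an assumption.

```python
import re

# Rule quoted from the feature list: only [a-zA-Z0-9_-], at most 64 characters.
_TOOL_NAME_RE = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def is_valid_tool_name(name: str) -> bool:
    """Return True if the name satisfies the rule above (hypothetical helper)."""
    return _TOOL_NAME_RE.fullmatch(name) is not None


def suggest_tool_name(name: str) -> str:
    """Suggest a compliant name: replace disallowed characters with '_' and truncate.

    Hypothetical helper; the real sanitize_tool_name may behave differently.
    """
    cleaned = re.sub(r"[^a-zA-Z0-9_-]", "_", name)[:64]
    return cleaned or "tool"


for candidate in ["get_weather", "get weather!", "x" * 70]:
    if is_valid_tool_name(candidate):
        print(f"ok: {candidate}")
    else:
        print(f"invalid: {candidate!r} -> try {suggest_tool_name(candidate)!r}")
```

Checking names before the request is sent turns an opaque gateway 4xx into a local error that names the offending tool.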

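To do

Compatibility testing with Alibaba Cloud Bailian (百炼) keys
FastMCP compatibility (MCPHub adaptation), and adapting the cases already tested under the server directory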

Quick start

1) Install

uv pip install -U langgraph langchain httpx
# Optional:
uv pip install -U langchain-openai         # to use the ChatOpenAI ecosystem
uv pip install -U '.[tongyi]'              # to use ChatTongyi (DashScope SDK)
uv pip install -U '.[mcp-adapters]'        # to inject MCP tools
uv pip install -U '.[viz]'                 # to export the graph as a PNG
uv pip install -U '.[all]'                 # everything at once (all of the optional extras above)

The project is organized as an installable package. Run this from the repository root:
uv pip install -e '.[openai,custom]'
# or: uv pip install -e '.[openai]'
# or: uv pip install -e '.[custom]'

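2) Environment variables (required reading)

The adapter loads .env automatically (built in); you can also export the variables directly. Precedence among same-named variables is as described in the examples.

Core

QWEN_BASE_URL: the OpenAI-compatible base URL, e.g. https://host/v1; the full endpoint .../v1/chat/completions is also accepted.
QWEN_MODEL: the model name, e.g. qwen3-coder-flash-1M (use whatever name the backend actually serves).
QWEN_API_KEY: the key the client uses to access the gateway (required if the gateway enforces authentication).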
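
Since QWEN_BASE_URL may hold either a base URL or the full endpoint, a client has to normalize it before sending a request. The following is a minimal sketch of one way to do that; chat_completions_url is a hypothetical helper, and the exact rule the adapter applies is an assumption.

```python
def chat_completions_url(base_url: str) -> str:
    """Turn a QWEN_BASE_URL-style value into a full chat completions URL.

    Assumed rule: a value that already ends with /chat/completions is used
    as is; otherwise /chat/completions is appended to the base.
    """
    url = base_url.strip().rstrip("/")
    if url.endswith("/chat/completions"):
        return url
    return url + "/chat/completions"


print(chat_completions_url("https://host/v1"))
print(chat_completions_url("https://host/v1/chat/completions"))
```

Both calls print the same endpoint, so either form of the variable leads to the same request URL.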

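Compatible API keys (pick one)

QWEN_API_KEY (takes precedence)
GPUSTACK_API_KEY (for a self-hosted gpustack gateway)
OPENAI_API_KEY (OpenAI-style compatibility)
DASHSCOPE_API_KEY (Alibaba Cloud Bailian key)

Auth / network / debugging

QWEN_AUTH_HEADER: defaults to Authorization; set it to X-API-Key to match a custom header.
QWEN_AUTH_SCHEME: defaults to Bearer; if the gateway expects a bare key, set it to the empty string ''.
QWEN_TIMEOUT: request timeout in seconds, default 60; for long contexts use a larger value such as 180.
QWEN_HTTP_TRUST_ENV: whether to inherit the system proxy settings, default 1; set 0 to ignore the proxy environment variables.
QWEN_DEBUG: 1 prints the method, URL, timeout, and proxy options.
QWEN_DEBUG_BODY: 1 prints the request JSON (secrets are masked automatically).
QWEN_DEBUG_RESP: 1 prints the response or error body.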
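
How the key and auth variables combine into a request header can be sketched as follows. This is an illustration only: resolve_api_key and build_auth_headers are hypothetical names, and the lookup order after QWEN_API_KEY is assumed to follow the order of the key list above.

```python
from typing import Mapping, Optional

# Assumed lookup order: QWEN_API_KEY first, then the order of the list above.
KEY_VARS = ("QWEN_API_KEY", "GPUSTACK_API_KEY", "OPENAI_API_KEY", "DASHSCOPE_API_KEY")


def resolve_api_key(env: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty key found, or None."""
    for name in KEY_VARS:
        value = env.get(name)
        if value:
            return value
    return None


def build_auth_headers(env: Mapping[str, str]) -> dict:
    """Build the auth header from QWEN_AUTH_HEADER / QWEN_AUTH_SCHEME."""
    key = resolve_api_key(env)
    if not key:
        return {}  # no key configured: send no auth header
    header = env.get("QWEN_AUTH_HEADER", "Authorization")
    scheme = env.get("QWEN_AUTH_SCHEME", "Bearer")
    # An empty scheme means the gateway expects the bare key.
    value = f"{scheme} {key}" if scheme else key
    return {header: value}


print(build_auth_headers({"QWEN_API_KEY": "sk-demo"}))
print(build_auth_headers({"OPENAI_API_KEY": "sk-demo",
                          "QWEN_AUTH_HEADER": "X-API-Key",
                          "QWEN_AUTH_SCHEME": ""}))
```

Passing the environment in as a mapping keeps the sketch testable; in real use the argument would be os.environ.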

MCP (for the example tools)

WEATHER_MCP_URL: the MCP tool server URL, e.g. http://127.0.0.1:8010/mcp.
WEATHER_TRANSPORT: streamable_http (HTTP streaming) or stdio.

3) Minimal example (ReAct + tool calling)

export QWEN_BASE_URL='https://ai.jmsu.top/v1'
export QWEN_MODEL='qwen3-coder-flash-1M'
# If your gateway enforces Bearer auth: export QWEN_API_KEY='your_token'
# Recommended for debugging: export QWEN_HTTP_TRUST_ENV=0 QWEN_DEBUG=1 QWEN_DEBUG_BODY=1 QWEN_DEBUG_RESP=1

python examples/qwen_langgraph_react.py

4) Streaming example (SSE increments)

python examples/qwen_langgraph_stream.py

Tokens are printed in real time; the complete tool_calls are emitted at [DONE].
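
The buffering behavior described here can be sketched in plain Python. This is not the code inside ChatQwenOpenAICompat; it is a minimal illustration that assumes the OpenAI-style streaming format (lines of the form data: {...} whose choices[0].delta may carry content and tool_calls fragments keyed by index). The function name assemble_stream and the event tuples are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple


def assemble_stream(lines: Iterable[str]) -> Iterator[Tuple[str, object]]:
    """Yield ("text", token) as text arrives, then ("tool_calls", [...]) once
    the stream reaches [DONE] (or simply ends)."""
    pending: Dict[int, dict] = {}  # tool-call fragments keyed by their index
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        choices = json.loads(data).get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        if delta.get("content"):
            yield ("text", delta["content"])
        for frag in delta.get("tool_calls") or []:
            call = pending.setdefault(
                frag.get("index", 0), {"id": None, "name": "", "arguments": ""}
            )
            call["id"] = frag.get("id") or call["id"]
            fn = frag.get("function") or {}
            call["name"] += fn.get("name") or ""
            call["arguments"] += fn.get("arguments") or ""
    if pending:
        calls = []
        for index in sorted(pending):
            call = pending[index]
            # Arguments are parsed only now, when the JSON text is complete.
            args = json.loads(call["arguments"] or "{}")
            calls.append({"id": call["id"], "name": call["name"], "args": args})
        yield ("tool_calls", calls)


def _chunk(delta: dict) -> str:
    return "data: " + json.dumps({"choices": [{"delta": delta}]})


demo = [
    _chunk({"content": "Checking "}),
    _chunk({"tool_calls": [{"index": 0, "id": "call_1",
                            "function": {"name": "get_weather", "arguments": "{\"loc"}}]}),
    _chunk({"tool_calls": [{"index": 0, "function": {"arguments": "ation\": \"Beijing\"}"}}]}),
    "data: [DONE]",
]
for event in assemble_stream(demo):
    print(event)
```

Parsing the arguments only after the last fragment matters because an individual fragment such as {"loc is not valid JSON on its own.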

5) Custom model example

python examples/qwen_langgraph_custom_model.py

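Injecting tools into LangGraph with MCP (FastMCP, streamable HTTP)

With the official langchain-mcp-adapters library, tools exposed by any MCP server can be injected into a LangGraph ReAct agent. We provide a ready-to-run example and a more production-oriented wrapper suggestion.

Install dependencies

uv pip install -e '.[mcp-adapters]'     # this repository's extra; equivalent to installing langchain-mcp-adapters
# You also need at least fastapi/uvicorn (if you use our built-in demo server)
uv pip install fastapi uvicorn

If you only need the client side (connecting to an existing MCP server), installing langchain-mcp-adapters is enough; install fastapi/uvicorn only if you want to run a local demo tool server.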

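Start the test MCP server

python examples/mcp_adapters/server/weather_server.py

The server registers one async tool:

get_weather(location: str) -> str: returns sample text such as "sunny in <place>"

Want to use your own tool? Register it with @mcp.tool() in this file.

Run the test case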

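# Strongly recommended so local loopback traffic is not proxied: NO_PROXY=localhost,127.0.0.1
export NO_PROXY=localhost,127.0.0.1,127.0.0.1:8000

# Point at the MCP server you just started (note the trailing slash)
export WEATHER_MCP_URL='http://127.0.0.1:8000/mcp/'
export WEATHER_TRANSPORT='streamable_http'

# Required for the Qwen (OpenAI-compatible) backend
export QWEN_BASE_URL='https://your-gateway-or-llamabox-host/v1'  # or the full /v1/chat/completions
export QWEN_MODEL='qwen3-coder-flash-1M'
# If your backend requires auth, also configure:
# export QWEN_API_KEY='...'                 # or OPENAI_API_KEY / DASHSCOPE_API_KEY
# export QWEN_AUTH_HEADER='Authorization'   # the default is fine
# export QWEN_AUTH_SCHEME='Bearer'          # set to '' for a bare key

# Debugging (optional)
export QWEN_DEBUG=1
export QWEN_DEBUG_BODY=1
export QWEN_DEBUG_RESP=1
python examples/mcp_adapters/inject_to_langgraph.py

Expected output:

The first line prints Discovered tools: get_weather

On the first turn the model "lists the tools and picks one to call"; LangGraph's ToolNode then runs the MCP tool, and the model gives a summary in Chinese

Why does the URL need a trailing slash? Our demo server registers POST /mcp/, and many HTTP routers treat /mcp and /mcp/ as distinct routes. Always use the trailing slash, or add 301/307 handling on the server side.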

Using the wrapper

Create langgraph_qwen/mcp.py and langgraph_qwen/__main__.py

# Option 1: environment variables
export WEATHER_MCP_URL='http://127.0.0.1:8000/mcp/'
qwen-mcp-agent --prompt 'Check the weather in Beijing and summarize it'

# Option 2: pass a multi-server JSON
qwen-mcp-agent --servers-json '{
  "weather":{"url":"http://127.0.0.1:8000/mcp/","transport":"streamable_http"},
  "calc":{"url":"http://127.0.0.1:8011/mcp/","transport":"streamable_http"}
}'
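
A wrapper that accepts --servers-json has to parse and check that value before connecting. The sketch below shows one possible shape for that step. parse_servers_json is a hypothetical helper, the default transport is an assumption, and the resulting mapping is only intended to resemble the name-to-connection dictionary that langchain-mcp-adapters clients take; check the installed version before relying on that.

```python
import json
from typing import Dict

# The two transports named in this README.
ALLOWED_TRANSPORTS = {"streamable_http", "stdio"}


def parse_servers_json(raw: str) -> Dict[str, dict]:
    """Parse and check a --servers-json style value (hypothetical helper)."""
    servers = json.loads(raw)
    if not isinstance(servers, dict) or not servers:
        raise ValueError("expected a non-empty JSON object keyed by server name")
    for name, cfg in servers.items():
        if not isinstance(cfg, dict):
            raise ValueError(f"{name}: expected an object")
        transport = cfg.get("transport", "streamable_http")  # assumed default
        if transport not in ALLOWED_TRANSPORTS:
            raise ValueError(f"{name}: unsupported transport {transport!r}")
        if transport == "streamable_http" and not cfg.get("url"):
            raise ValueError(f"{name}: streamable_http needs a url")
        cfg["transport"] = transport
    return servers


servers = parse_servers_json("""{
  "weather": {"url": "http://127.0.0.1:8000/mcp/", "transport": "streamable_http"},
  "calc": {"url": "http://127.0.0.1:8011/mcp/"}
}""")
print(sorted(servers), servers["calc"]["transport"])
```

Validating up front means a typo in the JSON fails with the server name in the message instead of surfacing later as a connection error.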

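Custom ChatModel: `ChatQwenOpenAICompat`

Minimal usage

from langgraph_qwen import ChatQwenOpenAICompat
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import HumanMessage
from langchain.tools import tool

@tool
def ping(text: str = "") -> str:
    """Return "pong"; used to check that tool calling works."""
    # @tool needs a docstring (it becomes the tool description).
    return "pong"

# Bind the tool and let the model decide when to call it.
model = ChatQwenOpenAICompat(temperature=0).bind_tools([ping]).bind(tool_choice="auto")
agent = create_react_agent(model, [ping])
res = agent.invoke({"messages": [HumanMessage(content="Call the ping tool and return the result.")]})
print(res["messages"][-1].content)

Why use it

Avoids differences between third-party wrappers and gives consistent tool and streaming behavior.
Reliable tool schema normalization (convert_to_openai_tool plus fallback repair).
Configurable auth header and auth scheme, timeout, and whether to inherit the system proxy.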

MCP: injecting remote tools into LangGraph

Start a minimal `streamable_http` MCP server

An example lives in examples/mcp_adapters/; you can also write a simple weather tool with FastAPI/fastmcp.

Inject and call

uv pip install -U '.[mcp-adapters]'

export WEATHER_MCP_URL='http://127.0.0.1:8010/mcp'
export WEATHER_TRANSPORT='streamable_http'

export QWEN_BASE_URL='https://ai.jmsu.top/v1'
export QWEN_MODEL='qwen3-coder-flash-1M'
# export QWEN_API_KEY='your_token'   # if the gateway requires auth

export QWEN_DEBUG=1
export QWEN_DEBUG_BODY=1
export QWEN_DEBUG_RESP=1

python examples/mcp_adapters/inject_to_langgraph.py

If you see a 5xx, the gateway's template or tool-field parsing is most likely incompatible. The adapter already slims down and repairs the schema; if problems persist, try a two-stage call: first set tool_choice="none" (think first), then inject only the target tool.
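
The two-stage call can be sketched as two request bodies for /v1/chat/completions. The function names are hypothetical, and how the target tool is picked between the stages (for example, by reading the model's stage-one reply) is left to the caller; that part is an assumption and is not shown.

```python
from typing import Dict, List


def stage_one_payload(model: str, messages: List[dict], tools: List[dict]) -> Dict:
    """Stage 1: send all tools but forbid calls, so the model only reasons."""
    return {"model": model, "messages": messages, "tools": tools, "tool_choice": "none"}


def stage_two_payload(model: str, messages: List[dict], tools: List[dict], target: str) -> Dict:
    """Stage 2: send only the chosen tool and let the model call it."""
    selected = [t for t in tools if t["function"]["name"] == target]
    if not selected:
        raise ValueError(f"unknown tool: {target}")
    return {"model": model, "messages": messages, "tools": selected, "tool_choice": "auto"}


tools = [
    {"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object"}}},
    {"type": "function", "function": {"name": "calc", "parameters": {"type": "object"}}},
]
messages = [{"role": "user", "content": "Weather in Beijing"}]

first = stage_one_payload("qwen3-coder-flash-1M", messages, tools)
second = stage_two_payload("qwen3-coder-flash-1M", messages, tools, target="get_weather")
print(first["tool_choice"], [t["function"]["name"] for t in second["tools"]])
```

Sending a single tool in the second request keeps the rendered prompt small, which reduces the surface on which a gateway template can fail.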

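Deployment notes for gateways / llama-box

Direct connection to llama-box: point QWEN_BASE_URL at an externally reachable OpenAI-compatible endpoint (for example, Caddy reverse-proxying to llama-box).
Caddy authentication (recommended):
- Before reverse-proxying /v1*, match Authorization: Bearer {env.API_TOKEN}; return 401 when it does not match.
- The reverse-proxy config should set flush_interval -1 and transport http { versions 1.1; keepalive 0; }, and strip Accept-Encoding, so that SSE is not buffered or compressed.
- On the client, set QWEN_API_KEY=$API_TOKEN. For a custom header, set QWEN_AUTH_HEADER/QWEN_AUTH_SCHEME.
Avoid multi-layer template rewriting: some intermediaries render templates over tools or rewrite its fields, which causes 5xx errors; a direct connection or a "pass-through" configuration is the most reliable.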

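FAQ

1) `StructuredTool does not support sync invocation.`

This error is raised when an async-only tool is invoked synchronously. Use agent.ainvoke(...) or another LangGraph async entry point, and make sure the tool function is async or is executed asynchronously by the runtime in the tool node.

2) 401 / `InvalidApiKey`

The gateway enforces auth but the client sent no key: set QWEN_API_KEY.
A custom header is in use: configure QWEN_AUTH_HEADER and, if needed, set QWEN_AUTH_SCHEME to an empty string.

3) 502 / 500 (when tools are sent)

The gateway rejects the tool schema: use this adapter (which normalizes the JSON Schema), or first set tool_choice='none' and make a two-stage call.
Upstream timeout: increase QWEN_TIMEOUT and check the load on the upstream model.
Template rendering errors in an intermediary: connect directly to llama-box or turn off template rewriting.

4) Requests unexpectedly go through the system proxy

Set QWEN_HTTP_TRUST_ENV=0, or prefix the command with HTTPS_PROXY= HTTP_PROXY=.

5) Streaming output and fragmented tool arguments

ChatQwenOpenAICompat handles the buffering and merging, so no extra handling is needed; the final chunk contains the complete tool_calls.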

Project structure

.
├─ langgraph_qwen/
│  ├─ __init__.py
│  ├─ factory.py
│  ├─ chat_model.py
│  ├─ utils.py
│  └─ validators.py
├─ examples/
│  ├─ qwen_langgraph_react.py
│  ├─ qwen_langgraph_stream.py
│  ├─ qwen_langgraph_custom_model.py
│  ├─ mcp_adapters/
│  └─ stream_modes/
├─ server/                      # (optional) demo weather MCP server
│  └─ weather_server.py
├─ pyproject.toml
├─ README.md
└─ qwen3_coder_with_qwen_agent.py

`curl` quick probes

A: with tools + tool_choice:auto

# Omit the Authorization header if the gateway does not require auth
curl -i 'https://<host>/v1/chat/completions' \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $QWEN_API_KEY" \
  -d '{
    "model":"qwen3-coder-flash-1M",
    "messages":[{"role":"user","content":"Weather in New York"}],
    "tools":[{"type":"function","function":{
      "name":"get_weather","description":"Get weather",
      "parameters":{"type":"object","properties":{"location":{"type":"string"}},"required":["location"]}
    }}],
    "tool_choice":"auto"
  }'

B: with tools + tool_choice:none (think only, no tool call)

curl -i 'https://<host>/v1/chat/completions' \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $QWEN_API_KEY" \
  -d '{
    "model":"qwen3-coder-flash-1M",
    "messages":[{"role":"user","content":"Weather in New York"}],
    "tools":[{"type":"function","function":{
      "name":"get_weather","parameters":{"type":"object","properties":{"location":{"type":"string"}}}
    }}],
    "tool_choice":"none"
  }'

C: minimal (no tools)

curl -i 'https://<host>/v1/chat/completions' \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $QWEN_API_KEY" \
  -d '{
    "model":"qwen3-coder-flash-1M",
    "messages":[{"role":"user","content":"Hello"}]
  }'

To test a direct connection, temporarily disable the proxy: HTTPS_PROXY= HTTP_PROXY=.
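
The same minimal probe can be written with the Python standard library, which is handy when curl is not available. This is a sketch under assumptions: build_probe is a hypothetical helper, and PROBE_HOST is a made-up environment variable used here only so the script does nothing unless a host is supplied.

```python
import json
import os
import urllib.request


def build_probe(host: str, model: str, prompt: str, api_key: str = "") -> urllib.request.Request:
    """Build the request for probe C (no tools) without sending it."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json"}
    if api_key:  # omit the header when the gateway needs no auth
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"https://{host}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    host = os.environ.get("PROBE_HOST")  # hypothetical variable, e.g. your gateway host
    if host:
        req = build_probe(host, "qwen3-coder-flash-1M", "Hello",
                          os.environ.get("QWEN_API_KEY", ""))
        with urllib.request.urlopen(req, timeout=60) as resp:
            print(resp.status, resp.read().decode("utf-8")[:500])
```

Note that urllib also picks up HTTPS_PROXY / HTTP_PROXY from the environment, so the advice about clearing those variables applies to this script as well.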

llama-box launch reference

For Qwen3 tool calling and reasoning, enabling --jinja and --enable-reasoning is recommended.

llama-box \
  --host 0.0.0.0 \
  --port 8080 \
  --model /path/to/Qwen3-Coder-…-GGUF.gguf \
  --chat-template chatml \
  --jinja \
  --enable-reasoning \
  --flash-attn \
  --cache-type-k q4_0 \
  --cache-type-v q4_0 \
  --ctx-size 262144 \
  --gpu-layers 49 \
  --threads 12 \
  --threads-batch 16 \
  --threads-http 16 \
  --batch-size 1024 \
  --ubatch-size 1024 \
  --defrag-thold -1 \
  --no-context-shift

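Contributing and license

Issues and PRs are welcome. See the repository's LICENSE file for the license (if none is provided, the repository's license terms apply by default).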
