6.1 · LLM 客户端架构（LLM Client Architecture）

时序知识图谱与动态事实记忆 · 本章是 Graphiti DeepWiki 中文译文的独立章节页，保留原始链接、源码锚点、模块标签和章节层级。

项目Graphiti 章节6.1 状态全文译文模块模型调用与提供方适配、接口与服务契约、界面与交互、入库与解析

项目要点页2.5 参考项目项目章节目录Graphiti DeepWiki 原始章节LLM Client Architecture 上一章6 下一章6.2

源码线索

examples/azure-openai/azure_openai_neo4j.py
examples/gliner2/.env.example
examples/gliner2/README.md
examples/gliner2/gliner2_neo4j.py
graphiti_core/cross_encoder/gemini_reranker_client.py
graphiti_core/embedder/azure_openai.py
graphiti_core/embedder/gemini.py
graphiti_core/llm_client/anthropic_client.py
graphiti_core/llm_client/azure_openai_client.py
graphiti_core/llm_client/client.py

模块标签

模型调用与提供方适配
接口与服务契约
界面与交互
入库与解析
系统架构

中文译文

LLM 客户端架构（中文译文）

原始 DeepWiki 页面：https://deepwiki.com/getzep/graphiti/6.1-llm-client-architecture

翻译时间：2026-05-27T08:45:05.510Z

翻译模型：deepseek-chat

原文字符数：16253

项目：Graphiti (graphiti)

---

大语言模型（LLM）客户端架构

类层次结构

Graphiti 中的所有大语言模型（LLM）客户端共享一个以 LLMClient 为根的继承树。

大语言模型（LLM）客户端的类层次结构

classDiagram
    class LLMClient {
        +LLMConfig config
        +str model
        +str small_model
        +float temperature
        +int max_tokens
        +bool cache_enabled
        +Tracer tracer
        +TokenUsageTracker token_tracker
        +set_tracer(tracer)
        +generate_response(messages, response_model, max_tokens, model_size, group_id, prompt_name)
        #_generate_response(messages, response_model, max_tokens, model_size)*
        #_clean_input(input)
        #_get_cache_key(messages)
        #_generate_response_with_retry(...)
    }

    class BaseOpenAIClient {
        +int MAX_RETRIES
        #_convert_messages_to_openai_format(messages)
        #_get_model_for_size(model_size)
        #_handle_structured_response(response)
        #_handle_json_response(response)
        #_create_completion()*
        #_create_structured_completion()*
    }

    class OpenAIClient {
        +AsyncOpenAI client
        #_create_structured_completion(...)
        #_create_completion(...)
    }

    class AzureOpenAILLMClient {
        +AsyncAzureOpenAI|AsyncOpenAI client
        +int MAX_RETRIES
        #_create_structured_completion(...)
        #_create_completion(...)
        #_handle_structured_response(response)
        #_supports_reasoning_features(model)
    }

    class OpenAIGenericClient {
        +AsyncOpenAI client
        +int MAX_RETRIES
        #_generate_response(...)
        +generate_response(...)
    }

    class AnthropicClient {
        +AsyncAnthropic client
        #_create_tool(response_model)
        #_extract_json_from_text(text)
        #_resolve_max_tokens(requested, model)
        #_generate_response(...)
    }

    class GeminiClient {
        +genai.Client client
        +int MAX_RETRIES
        #_check_safety_blocks(response)
        #_check_prompt_blocks(response)
        #_resolve_max_tokens(requested, model)
        #_generate_response(...)
    }

    class GroqClient {
        +AsyncGroq client
        #_generate_response(...)
    }

    class GLiNER2Client {
        +generate_response(...)
    }

    LLMClient <|-- BaseOpenAIClient
    LLMClient <|-- OpenAIGenericClient
    LLMClient <|-- AnthropicClient
    LLMClient <|-- GeminiClient
    LLMClient <|-- GroqClient
    LLMClient <|-- GLiNER2Client
    BaseOpenAIClient <|-- OpenAIClient
    BaseOpenAIClient <|-- AzureOpenAILLMClient

来源：graphiti_core/llm_client/client.py:71-147, graphiti_core/llm_client/openai_base_client.py:40-95, graphiti_core/llm_client/openai_client.py:27-125, graphiti_core/llm_client/azure_openai_client.py:31-167, graphiti_core/llm_client/openai_generic_client.py:37-214, graphiti_core/llm_client/anthropic_client.py:103-150, graphiti_core/llm_client/gemini_client.py:72-127, graphiti_core/llm_client/groq_client.py:48-85, graphiti_core/llm_client/gliner2_client.py:34-118

---

LLMConfig

LLMConfig 是传递给每个客户端构造函数的配置对象。它是一个普通的 Python 类（不是 Pydantic 模型）。

字段	类型	默认值	描述
`api_key`	`str \	None`	`None`	提供商 API 密钥
`model`	`str \	None`	`None`	主模型标识符
`small_model`	`str \	None`	`None`	用于简单提示的较小/较便宜模型
`base_url`	`str \	None`	`None`	覆盖 API 基础 URL（例如，用于本地端点）
`temperature`	`float`	`1.0`	采样温度 `graphiti_core/llm_client/config.py:20`
`max_tokens`	`int`	`16384`	最大输出 Token 数 `graphiti_core/llm_client/config.py:19`

ModelSize 是一个枚举，包含两个值：small 和 medium graphiti_core/llm_client/config.py:23-25。所有对 generate_response 的调用都接受一个 model_size 参数；客户端会将 ModelSize.small 路由到 small_model，将 ModelSize.medium 路由到 model。

来源：graphiti_core/llm_client/config.py:19-69

---

LLMClient 抽象基类

graphiti_core/llm_client/client.py:71-147 中的 LLMClient 是所有提供商实现的抽象基类。

构造函数

LLMClient(config: LLMConfig | None, cache: bool = False)

如果 config 为 None，则会使用默认的 LLMConfig() graphiti_core/llm_client/client.py:73-74。当 cache=True 时，会创建一个指向 ./llm_cache 的 LLMCache 实例 graphiti_core/llm_client/client.py:35, graphiti_core/llm_client/client.py:87-88。

`generate_response` — 公共接口

这是调用者的唯一公共入口点。其签名如下：

async generate_response(
    messages: list[Message],
    response_model: type[BaseModel] | None = None,
    max_tokens: int | None = None,
    model_size: ModelSize = ModelSize.medium,
    group_id: str | None = None,
    prompt_name: str | None = None,
) -> dict[str, Any]

基类实现按顺序执行以下步骤 graphiti_core/llm_client/client.py:155-247：

如果提供了 response_model，则将其 JSON 模式追加到最后一条消息中 graphiti_core/llm_client/client.py:167-173。
将多语言提取指令（来自 get_extraction_language_instruction(group_id)）追加到第一条消息中 graphiti_core/llm_client/client.py:176。
对每条消息调用 _clean_input，以去除无效的 Unicode 和控制字符 graphiti_core/llm_client/client.py:178-179。
打开一个追踪跨度（llm.generate）并设置属性，包括 llm.provider、model.size、max_tokens、cache.enabled，以及可选的 prompt.name graphiti_core/llm_client/client.py:182-191。
检查缓存；如果命中，则立即返回 graphiti_core/llm_client/client.py:194-197。
调用 _generate_response_with_retry，该方法使用 Tenacity 重试逻辑包装了抽象的 _generate_response graphiti_core/llm_client/client.py:202-212。
如果启用了缓存，则将结果存储到缓存中 graphiti_core/llm_client/client.py:214-216。

抽象方法：`_generate_response`

@abstractmethod
async def _generate_response(
    self,
    messages: list[Message],
    response_model: type[BaseModel] | None = None,
    max_tokens: int = DEFAULT_MAX_TOKENS,
    model_size: ModelSize = ModelSize.medium,
) -> dict[str, typing.Any]:
    pass

来源：graphiti_core/llm_client/client.py:139-147

---

具体实现

具体客户端类比较表

类	上游 SDK	默认主模型	结构化输出方法
`OpenAIClient`	`openai`	`gpt-4.1-mini`	`responses.parse`（推理）/ `chat.completions`（标准）
`AzureOpenAILLMClient`	`openai`（Azure）	_（由调用者设置）_	`responses.parse`（o1/o3/gpt-5）/ `beta.chat.completions.parse`（标准）
`OpenAIGenericClient`	`openai`	`gpt-4.1-mini`	`json_schema` 响应格式
`AnthropicClient`	`anthropic`	`claude-haiku-4-5-latest`	工具使用（`_create_tool`）
`GeminiClient`	`google-genai`	`gemini-3-flash-preview`	`response_mime_type=application/json`
`GroqClient`	`groq`	`llama-3.1-70b-versatile`	`json_object` 响应格式
`GLiNER2Client`	`gliner`	`gliner_medium-v2.1`	本地模型推理

来源：graphiti_core/llm_client/openai_client.py:27-125, graphiti_core/llm_client/azure_openai_client.py:31-167, graphiti_core/llm_client/openai_generic_client.py:37-214, graphiti_core/llm_client/anthropic_client.py:103-150, graphiti_core/llm_client/gemini_client.py:72-127, graphiti_core/llm_client/groq_client.py:48-85, graphiti_core/llm_client/gliner2_client.py:34-118

OpenAI 系列（`BaseOpenAIClient`、`OpenAIClient`、`AzureOpenAILLMClient`）

BaseOpenAIClient 持有 OpenAI 兼容 API 的共享逻辑 graphiti_core/llm_client/openai_base_client.py:40-58。它定义了两个抽象钩子：_create_structured_completion 和 _create_completion。

OpenAIClient 通过前缀（gpt-5、o1、o3）检测推理模型 graphiti_core/llm_client/openai_client.py:77-79。对于这些模型，它会调用 client.responses.parse graphiti_core/llm_client/openai_client.py:99；对于标准模型，它会调用 client.chat.completions.create，并设置 response_format={'type': 'json_object'} graphiti_core/llm_client/openai_client.py:119-125。

AzureOpenAILLMClient 根据 _supports_reasoning_features(model) 将请求路由到 responses.parse 或 beta.chat.completions.parse graphiti_core/llm_client/azure_openai_client.py:74-104。

`OpenAIGenericClient`

专为本地模型（Ollama、LM Studio）设计。它使用 json_schema 响应格式 graphiti_core/llm_client/openai_generic_client.py:115-121。默认 max_tokens 为 16,384，以确保兼容性 graphiti_core/llm_client/openai_generic_client.py:75-76。

`AnthropicClient`

使用工具使用 API 进行结构化输出。_create_tool 从 response_model 生成工具定义 graphiti_core/llm_client/anthropic_client.py:177-220。它通过 ANTHROPIC_MODEL_MAX_TOKENS 处理模型特定的 Token 限制 graphiti_core/llm_client/anthropic_client.py:75-97。

`GeminiClient`

与 google-genai 集成。它通过 _check_safety_blocks 处理安全过滤器 graphiti_core/llm_client/gemini_client.py:128-152，并通过 _check_prompt_blocks 处理提示拦截 graphiti_core/llm_client/gemini_client.py:154-162。它支持 Gemini 2.5+ 模型的 thinking_config graphiti_core/llm_client/gemini_client.py:109-110。

---

横切行为

通过 generate_response 的调用流程

sequenceDiagram
    participant "调用者" as caller
    participant "LLMClient.generate_response" as gr
    participant "LLMCache" as cache
    participant "追踪器" as tracer
    participant "_generate_response_with_retry" as retry
    participant "提供商 API" as api

    caller->>gr: "generate_response(messages, response_model, ...)"
    gr->>gr: "将 JSON 模式追加到最后一条消息（如果提供了 response_model）"
    gr->>gr: "将 get_extraction_language_instruction() 追加到 messages[0]"
    gr->>gr: "对每条消息调用 _clean_input()"
    gr->>tracer: "start_span('llm.generate')"
    gr->>cache: "get(cache_key)"
    alt "缓存命中"
        cache-->>gr: "缓存的字典"
        gr-->>caller: "缓存的字典"
    else "缓存未命中"
        gr->>retry: "_generate_response_with_retry(messages, ...)"
        retry->>api: "_generate_response()"
        api-->>retry: "响应字典"
        retry-->>gr: "响应字典"
        gr->>cache: "set(cache_key, response)"
        gr-->>caller: "响应字典"
    end
    gr->>tracer: "结束跨度"

来源：graphiti_core/llm_client/client.py:155-247

重试逻辑

客户端使用 Tenacity 进行自动重试。is_server_or_retry_error 决定某个异常（如 RateLimitError 或 5xx 状态码）是否需要进行重试 graphiti_core/llm_client/client.py:62-69。

客户端	策略	尝试次数
`LLMClient`	指数退避（5-120 秒）	4 `graphiti_core/llm_client/client.py:117-118`
`BaseOpenAIClient`	类常量	2 `graphiti_core/llm_client/openai_base_client.py:49`
`AnthropicClient`	SDK 内部	1 `graphiti_core/llm_client/anthropic_client.py:146`
`GeminiClient`	类常量	2 `graphiti_core/llm_client/gemini_client.py:93`

来源：graphiti_core/llm_client/client.py:116-126, graphiti_core/llm_client/openai_base_client.py:49, graphiti_core/llm_client/anthropic_client.py:146, graphiti_core/llm_client/gemini_client.py:93

Token 追踪

TokenUsageTracker graphiti_core/llm_client/token_tracker.py 记录每个提示的使用情况。具体客户端在收到 API 响应后会记录使用情况，以追踪输入和输出 Token graphiti_core/llm_client/openai_base_client.py:127-130, graphiti_core/llm_client/anthropic_client.py:417-422。

响应缓存

LLMCache graphiti_core/llm_client/cache.py 将响应存储在 ./llm_cache 中 graphiti_core/llm_client/client.py:35。缓存键是模型和消息的 MD5 哈希值 graphiti_core/llm_client/client.py:149-153。

---

提供商到代码的映射

每个提供商的文件和类位置

graph TB
    subgraph "graphiti_core/llm_client/"
        A["client.py\nLLMClient (抽象基类)"]
        B["config.py\nLLMConfig, ModelSize"]
        C["openai_base_client.py\nBaseOpenAIClient"]
        D["openai_client.py\nOpenAIClient"]
        E["azure_openai_client.py\nAzureOpenAILLMClient"]
        F["openai_generic_client.py\nOpenAIGenericClient"]
        G["anthropic_client.py\nAnthropicClient"]
        H["gemini_client.py\nGeminiClient"]
        I["groq_client.py\nGroqClient"]
        J["gliner2_client.py\nGLiNER2Client"]
        K["token_tracker.py\nTokenUsageTracker"]
    end

    A --> B
    A --> K
    C --> A
    D --> C
    E --> C
    F --> A
    G --> A
    H --> A
    I --> A
    J --> A

来源：graphiti_core/llm_client/client.py:1-147, graphiti_core/llm_client/openai_base_client.py:1-38, graphiti_core/llm_client/anthropic_client.py:1-44, graphiti_core/llm_client/gemini_client.py:1-43, graphiti_core/llm_client/groq_client.py:1-34, graphiti_core/llm_client/gliner2_client.py:1-32