agentic_huge_data_base / wiki
页面 Graphiti · 10.1 Community Detection 与 Clustering·DeepWiki 中文全文译文

10.1 · Community Detection 与 Clustering(Community Detection and Clustering)

时序知识图谱与动态事实记忆 · 本章是 Graphiti DeepWiki 中文译文的独立章节页,保留原始链接、源码锚点、模块标签和章节层级。

项目Graphiti 章节10.1 状态全文译文 模块图谱与关系、模型调用与提供方适配、测试、发布与运维、界面与交互
源码线索
  • graphiti_core/edges.py
  • graphiti_core/graphiti.py
  • graphiti_core/nodes.py
  • graphiti_core/utils/maintenance/community_operations.py
  • graphiti_core/utils/maintenance/graph_data_operations.py
模块标签
  • 图谱与关系
  • 模型调用与提供方适配
  • 测试、发布与运维
  • 界面与交互
  • 工作流与编排

中文译文

Community Detection 与 Clustering(中文译文)

原始 DeepWiki 页面:https://deepwiki.com/getzep/graphiti/10.1-community-detection-and-clustering
翻译时间:2026-05-27T08:44:58.759Z
翻译模型:deepseek-chat
原文字符数:13711
项目:Graphiti (graphiti)

---

社区检测与聚类

相关源文件

以下文件为本维基页面的生成提供了上下文:

  • graphiti_core/edges.py
  • graphiti_core/graphiti.py
  • graphiti_core/nodes.py
  • graphiti_core/utils/maintenance/community_operations.py
  • graphiti_core/utils/maintenance/graph_data_operations.py

目的与范围

本文档描述了 Graphiti 的社区检测与聚类系统,该系统将知识图谱中的相关实体分组,并创建层次化摘要。该系统使用标签传播算法来识别密集连接的实体簇,然后通过基于大语言模型(LLM)的层次化摘要生成 CommunityNode 实体,并为其提供描述性名称和合成摘要。

关于从片段中提取实体和关系的信息,请参阅实体与关系提取。关于社区搜索与检索的详细信息,请参阅搜索与检索系统

---

数据模型

社区在图中作为一等节点表示,通过 HAS_MEMBER 边聚合相关实体。

社区层到代码实体空间的映射
graph TB
    subgraph "自然语言空间"
        C_DESC["'AI 研究团队'"]
        E_DESC["'Alice(研究员)'"]
    end

    subgraph "代码实体空间 (graphiti_core/nodes.py)"
        CN["CommunityNode"]
        EN["EntityNode"]
    end

    subgraph "代码实体空间 (graphiti_core/edges.py)"
        CE["CommunityEdge (标签: HAS_MEMBER)"]
        EE["EntityEdge (标签: RELATES_TO)"]
    end

    CN -- "映射到" --> C_DESC
    EN -- "映射到" --> E_DESC

    CN -->|"source_node_uuid"| CE
    CE -->|"target_node_uuid"| EN

    EN -->|"source_node_uuid"| EE
    EE -->|"target_node_uuid"| EN

来源: graphiti_core/nodes.py:666-713, graphiti_core/edges.py:567-678

CommunityNode
字段类型描述
uuidstr唯一标识符 graphiti_core/nodes.py:94-94
namestr由大语言模型(LLM)生成的描述性社区名称 graphiti_core/nodes.py:668-668
group_idstr图分区标识符 graphiti_core/nodes.py:96-96
summarystr成员实体摘要的层次化摘要 graphiti_core/nodes.py:670-670
name_embeddinglist[float]社区名称的向量嵌入 graphiti_core/nodes.py:672-672
created_atdatetime创建时间戳 graphiti_core/nodes.py:98-98

来源: graphiti_core/nodes.py:93-101, graphiti_core/nodes.py:666-713

CommunityEdge

CommunityEdge 表示社区与实体之间的 HAS_MEMBER 关系。

字段类型描述
uuidstr唯一标识符 graphiti_core/edges.py:50-50
source_node_uuidstrCommunityNode UUID graphiti_core/edges.py:52-52
target_node_uuidstrEntityNode UUID(成员) graphiti_core/edges.py:53-53
group_idstr图分区标识符 graphiti_core/edges.py:51-51
created_atdatetime创建时间戳 graphiti_core/edges.py:54-54

来源: graphiti_core/edges.py:49-54, graphiti_core/edges.py:567-678

---

标签传播算法

Graphiti 使用标签传播进行社区检测,这是一种迭代算法,节点根据边权重采用其邻居的社区标签。

算法概述
graph LR
    Init["初始化:每个节点自成一个社区"]
    Iterate["对于每个节点"]
    Count["统计邻居社区,按 edge_count 加权"]
    Update["采用多数邻居社区"]
    Check{"是否有变化?"}
    Return["返回社区簇"]

    Init --> Iterate
    Iterate --> Count
    Count --> Update
    Update --> Check
    Check -->|是| Iterate
    Check -->|否| Return

来源: graphiti_core/utils/maintenance/community_operations.py:93-138

实现细节

label_propagation() 函数在图投影上运行:

  • 输入: dict[str, list[Neighbor]] - 将节点 UUID 映射到邻居列表,并附带边计数。Neighbor 模型跟踪 node_uuidedge_count graphiti_core/utils/maintenance/community_operations.py:25-27
  • 输出: list[list[str]] - 簇列表,每个簇是一个节点 UUID 列表 graphiti_core/utils/maintenance/community_operations.py:137-138

算法步骤:

  1. 初始化: 每个节点 UUID 根据其在投影键中的位置被分配一个唯一的社区 ID graphiti_core/utils/maintenance/community_operations.py:100-100
  2. 传播循环: 迭代直到收敛(no_change 保持为 Truegraphiti_core/utils/maintenance/community_operations.py:102-129
  3. - 对于每个节点,统计按社区标签分组的邻居数量,并按 neighbor.edge_count 加权 graphiti_core/utils/maintenance/community_operations.py:110-111。 - 候选社区按排名(计数)排序 graphiti_core/utils/maintenance/community_operations.py:112-116。 - 采用多数社区(通过排序打破平局)graphiti_core/utils/maintenance/community_operations.py:116-117。 - 要求 candidate_rank > 1(多于一个连接)才能更改社区,否则保持当前社区或最大社区 graphiti_core/utils/maintenance/community_operations.py:118-121

来源: graphiti_core/utils/maintenance/community_operations.py:93-138

---

社区检测工作流

完整的社区检测过程包括构建图投影、运行标签传播以及通过层次化摘要创建社区节点。

sequenceDiagram
    participant API as build_communities() (graphiti.py)
    participant Detect as get_community_clusters() (community_operations.py)
    participant DB as GraphDriver
    participant LP as label_propagation()
    participant Builder as build_community()
    participant LLM as LLMClient

    API->>Detect: group_ids

    loop 对于每个 group_id
        Detect->>DB: 获取所有 EntityNode
        Detect->>DB: 按 group_id 获取所有 EntityNode [graphiti_core/utils/maintenance/community_operations.py:55-56]()

        loop 对于每个 EntityNode
            Detect->>DB: 查询 RELATES_TO 边并附带计数 [graphiti_core/utils/maintenance/community_operations.py:57-73]()
            Note over Detect: 构建图投影
        end

        Detect->>LP: 投影字典 [graphiti_core/utils/maintenance/community_operations.py:80-80]()
        LP-->>Detect: 簇 UUID [graphiti_core/utils/maintenance/community_operations.py:80-80]()

        Detect->>DB: 按 UUID 获取 EntityNode [graphiti_core/utils/maintenance/community_operations.py:82-87]()
    end

    Detect-->>API: 实体簇列表 [graphiti_core/utils/maintenance/community_operations.py:89-89]()

    loop 对于每个簇(并发)
        API->>Builder: 簇实体 [graphiti_core/utils/maintenance/community_operations.py:224-234]()
        Builder->>Builder: 层次化摘要 [graphiti_core/utils/maintenance/community_operations.py:177-197]()
        Builder->>LLM: summarize_pair() [graphiti_core/utils/maintenance/community_operations.py:185-193]()
        LLM-->>Builder: Summary 对象
        Builder->>LLM: generate_summary_description() [graphiti_core/utils/maintenance/community_operations.py:200-200]()
        LLM-->>Builder: SummaryDescription 对象
        Builder-->>API: CommunityNode + CommunityEdges [graphiti_core/utils/maintenance/community_operations.py:202-213]()
    end

    API->>DB: 保存社区节点和边 [graphiti_core/graphiti.py:240-240]()

来源: graphiti_core/utils/maintenance/community_operations.py:30-90, graphiti_core/utils/maintenance/community_operations.py:218-243, graphiti_core/graphiti.py:240-240

图投影构建

投影通过查询图中每个实体的邻居来构建。对于标准提供者(如 Neo4j),它使用直接的 RELATES_TO 匹配 graphiti_core/utils/maintenance/community_operations.py:57-59

对于 Kuzu 提供者(其中边也是节点):

MATCH (n:Entity {group_id: $group_id, uuid: $uuid})-[:RELATES_TO]-(e:RelatesToNode_)-[:RELATES_TO]-(m: Entity {group_id: $group_id})
WITH count(e) AS count, m.uuid AS uuid
RETURN uuid, count

graphiti_core/utils/maintenance/community_operations.py:61-63

来源: graphiti_core/utils/maintenance/community_operations.py:30-90

---

层次化摘要

一旦识别出簇,Graphiti 会使用层次化成对大语言模型(LLM)摘要为每个社区生成一个统一的摘要,以处理大量成员实体。

成对合并策略
graph TB
    E1["实体 1 摘要"]
    E2["实体 2 摘要"]
    E3["实体 3 摘要"]
    E4["实体 4 摘要"]
    E5["实体 5 摘要"]

    M1["LLM: summarize_pair"]
    M2["LLM: summarize_pair"]
    M3["LLM: summarize_pair (与 E5)"]

    Final["最终社区摘要"]

    E1 --> M1
    E2 --> M1
    E3 --> M2
    E4 --> M2
    E5 --> M3

    M1 --> M3
    M2 --> M3

    M3 --> Final

来源: graphiti_core/utils/maintenance/community_operations.py:174-215

摘要过程

build_community() 函数实现了层次化摘要:

  1. 收集摘要: 从簇中所有 EntityNode 对象提取 summary 字段 graphiti_core/utils/maintenance/community_operations.py:177-177
  2. 成对合并:length > 1 时:
  3. - 配对相邻摘要。处理奇数数量时,弹出最后一个摘要并将其添加回下一层 graphiti_core/utils/maintenance/community_operations.py:181-182。 - 使用 semaphore_gather() 并发调用每对的 summarize_pair() graphiti_core/utils/maintenance/community_operations.py:185-193

  4. 名称生成: 调用 generate_summary_description() 从最终合并的摘要中创建一个描述性社区名称 graphiti_core/utils/maintenance/community_operations.py:200-200
  5. 创建对象: 使用生成的名称和摘要实例化 CommunityNode,然后为所有成员创建 CommunityEdge 对象 graphiti_core/utils/maintenance/community_operations.py:202-213

来源: graphiti_core/utils/maintenance/community_operations.py:174-215

大语言模型(LLM)提示

两个专门的提示驱动了 graphiti_core/prompts/summarize_nodes.py 中的摘要过程:

提示函数目的响应模型
summarize_pair将两个摘要的信息合成为一个简洁的摘要(少于 250 个字符) graphiti_core/utils/maintenance/community_operations.py:141-154Summary graphiti_core/utils/maintenance/community_operations.py:149-149
summary_description创建一个简短的、一句话的描述,说明摘要了哪种信息 graphiti_core/utils/maintenance/community_operations.py:158-171SummaryDescription graphiti_core/utils/maintenance/community_operations.py:166-166

来源: graphiti_core/utils/maintenance/community_operations.py:141-171

---

增量社区更新

当新实体添加到图中时,update_community() 函数会增量更新现有社区,而不是重建整个结构。

社区分配逻辑

determine_entity_community() 函数决定实体属于哪个社区:

  1. 检查现有成员关系: 查询实体是否已有 HAS_MEMBER 关系 graphiti_core/utils/maintenance/community_operations.py:266-275
  2. 查找邻居社区: 如果没有现有成员关系,则通过 RELATES_TO 边查询连接实体的社区 graphiti_core/utils/maintenance/community_operations.py:285-311
  3. 模式选择: 选择在邻居中出现频率最高的社区 graphiti_core/utils/maintenance/community_operations.py:314-318

来源: graphiti_core/utils/maintenance/community_operations.py:261-324

---

并发控制

社区构建限制为 MAX_COMMUNITY_BUILD_CONCURRENCY = 10 个并发大语言模型(LLM)调用,以防止压垮提供者 graphiti_core/utils/maintenance/community_operations.py:20-20

# build_communities 工具中信号量的内部使用
semaphore = asyncio.Semaphore(MAX_COMMUNITY_BUILD_CONCURRENCY)

async def limited_build_community(cluster):
    async with semaphore:
        return await build_community(llm_client, cluster)

graphiti_core/utils/maintenance/community_operations.py:224-234

---

性能特征

操作复杂度说明
标签传播O(E × I)E = 边数,I = 直到收敛的迭代次数 graphiti_core/utils/maintenance/community_operations.py:93-138
层次化摘要O(N × log N)N = 簇大小;需要 log N 轮并发大语言模型(LLM)调用 graphiti_core/utils/maintenance/community_operations.py:174-215
社区更新O(1)单次大语言模型(LLM)调用以合并摘要 graphiti_core/utils/maintenance/community_operations.py:327-354

来源: graphiti_core/utils/maintenance/community_operations.py:93-215, graphiti_core/utils/maintenance/community_operations.py:327-354