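Page: Onyx · 7.6 Connector Lifecycle Operations · DeepWiki full-text translation

7.6 · Connector Lifecycle Operations

Enterprise Connectors and Unified Search · This is a standalone chapter page of the translated Onyx DeepWiki. It preserves the original links, source anchors, module tags, and chapter hierarchy.

Project: Onyx · Chapter: 7.6 · Status: full-text translation · Modules: Testing, Release and Operations; Workflow and Orchestration; Authentication, Permissions and Security; Retrieval, Recall and Indexing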
Source references
  • backend/alembic/versions/1c36b3dc2f4e_add_full_exception_trace_to_permission_.py
  • backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py
  • backend/ee/onyx/background/celery/tasks/external_group_syncing/tasks.py
  • backend/onyx/background/celery/tasks/connector_deletion/tasks.py
  • backend/onyx/background/celery/tasks/pruning/tasks.py
  • backend/onyx/connectors/google_utils/google_auth.py
  • backend/onyx/connectors/google_utils/shared_constants.py
  • backend/onyx/db/connector_credential_pair.py
  • backend/onyx/db/index_attempt.py
  • backend/onyx/db/indexing_coordination.py
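Module tags
  • Testing, Release and Operations
  • Workflow and Orchestration
  • Authentication, Permissions and Security
  • Retrieval, Recall and Indexing
  • Graphs and Relationships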

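Translated text

Connector Lifecycle Operations (translated text)

Original DeepWiki page: https://deepwiki.com/onyx-dot-app/onyx/7.6-connector-lifecycle-operations
Translated at: 2026-05-27T08:44:54.238Z
Translation model: deepseek-chat
Source character count: 14149
Project: Onyx (onyx)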

---

Connector Lifecycle Operations

Relevant source files

The following files were used as context for generating this wiki page:

  • backend/alembic/versions/1c36b3dc2f4e_add_full_exception_trace_to_permission_.py
  • backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py
  • backend/ee/onyx/background/celery/tasks/external_group_syncing/tasks.py
  • backend/onyx/background/celery/tasks/connector_deletion/tasks.py
  • backend/onyx/background/celery/tasks/pruning/tasks.py
  • backend/onyx/connectors/google_utils/google_auth.py
  • backend/onyx/connectors/google_utils/shared_constants.py
  • backend/onyx/db/connector_credential_pair.py
  • backend/onyx/db/index_attempt.py
  • backend/onyx/db/indexing_coordination.py
  • backend/onyx/db/permission_sync_attempt.py
  • backend/onyx/server/documents/cc_pair.py
  • backend/onyx/server/documents/connector.py
  • backend/onyx/server/documents/models.py
  • backend/tests/daily/connectors/gmail/conftest.py
  • backend/tests/daily/connectors/gmail/test_gmail_connector.py
  • backend/tests/external_dependency_unit/db/test_targeted_reindex_filter.py
  • backend/tests/external_dependency_unit/permission_sync/test_doc_permission_sync_attempt.py
  • backend/tests/external_dependency_unit/permission_sync/test_external_group_permission_sync_attempt.py
  • backend/tests/integration/common_utils/managers/cc_pair.py
  • backend/tests/integration/common_utils/managers/document_search.py
  • backend/tests/integration/connector_job_tests/google/test_google_drive_permission_sync.py
  • backend/tests/integration/connector_job_tests/slack/conftest.py
  • backend/tests/integration/connector_job_tests/slack/test_permission_sync.py
  • backend/tests/integration/connector_job_tests/slack/test_prune.py
  • backend/tests/integration/tests/indexing/test_checkpointing.py
  • backend/tests/integration/tests/indexing/test_repeated_error_state.py
  • backend/tests/integration/tests/pruning/test_pruning.py
  • backend/tests/regression/search_quality/README.md
  • backend/tests/regression/search_quality/run_search_eval.py
  • backend/tests/regression/search_quality/test_queries.json.template
  • backend/tests/unit/onyx/connectors/gmail/thread.json
  • web/src/app/admin/connector/[ccPairId]/ConfigDisplay.tsx
  • web/src/app/admin/connector/[ccPairId]/DocPermissionSyncAttemptsTable.tsx
  • web/src/app/admin/connector/[ccPairId]/ExternalGroupSyncAttemptsTable.tsx
  • web/src/app/admin/connector/[ccPairId]/page.tsx
  • web/src/app/admin/connector/[ccPairId]/types.ts
  • web/src/app/admin/indexing/status/page.tsx
  • web/src/components/modals/EditPropertyModal.tsx

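Purpose and scope

This document describes the background operations that manage the lifecycle of a connector-credential pair (CCPair) in Onyx. These operations keep data consistent between external sources and the internal index, remove data cleanly, and synchronize access controls. The key operations are:

  • Pruning: removes documents from the index that no longer exist in the external source (backend/onyx/background/celery/tasks/pruning/tasks.py:164-165)
  • Deletion: when a connector is removed, fully cleans up all of its data, including Vespa chunks, database metadata, and relationships (backend/onyx/background/celery/tasks/connector_deletion/tasks.py:334-336)
  • Permission sync: synchronizes document-level permissions from the external source (Enterprise Edition) (backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py:139-141)
  • External group sync: synchronizes external user group memberships to enable dynamic access control (backend/ee/onyx/background/celery/tasks/external_group_syncing/tasks.py:116-117)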

---

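Coordination architecture

Onyx uses a Redis-based distributed coordination pattern to manage these long-running tasks. The RedisConnector class acts as the central hub for the operation-specific coordinators (backend/onyx/redis/redis_connector.py:15-30).

RedisConnector composition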
classDiagram
    class RedisConnector {
        +str tenant_id
        +int cc_pair_id
        +RedisConnectorStop stop
        +RedisConnectorPrune prune
        +RedisConnectorDelete delete
        +RedisConnectorPermissionSync permissions
        +RedisConnectorExternalGroupSync external_group_sync
    }

    class RedisConnectorStop {
        +set_fence(bool)
        +bool timed_out
    }

    class RedisConnectorPrune {
        +set_fence(payload)
        +generate_tasks()
        +get_remaining()
    }

    class RedisConnectorDelete {
        +set_fence(payload)
        +generate_tasks()
        +get_remaining()
    }

    class RedisConnectorPermissionSync {
        +set_fence(payload)
        +update_db()
    }

    RedisConnector *-- RedisConnectorStop
    RedisConnector *-- RedisConnectorPrune
    RedisConnector *-- RedisConnectorDelete
    RedisConnector *-- RedisConnectorPermissionSync

Sources: backend/onyx/redis/redis_connector.py:15-86

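Coordination primitives
  • Fences: Redis keys that indicate an operation is in progress. A fence stores a JSON payload (for example, RedisConnectorPrunePayload) that contains the celery_task_id of the generator task (backend/onyx/redis/redis_connector_prune.py:22-27)
  • Tasksets: Redis sets that hold the task IDs of subtasks. The operation is complete when the taskset is empty (backend/onyx/background/celery/tasks/connector_deletion/tasks.py:58-59)
  • Active signals: short-TTL keys that indicate the operation is actively making progress, covering the gap between task generation and execution (backend/onyx/redis/redis_connector_prune.py:61-64)
  • Block signals: keys such as OnyxRedisSignals.BLOCK_VALIDATE_CONNECTOR_PRUNING_FENCES that prevent the watchdog from clearing fences while the generator is still enqueuing work (backend/onyx/background/celery/tasks/pruning/tasks.py:106-108)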
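
The fence and taskset primitives can be illustrated with a minimal sketch. This is not the Onyx implementation: FakeRedis is an in-memory stand-in so the example runs without a server, and the PruneCoordinator class and its key names are hypothetical. Only the method names set_fence and get_remaining and the celery_task_id payload field are taken from the page above.

```python
import json


class FakeRedis:
    """In-memory stand-in for the handful of Redis commands used below."""

    def __init__(self):
        self.strings = {}
        self.sets = {}

    def set(self, key, value):
        self.strings[key] = value

    def get(self, key):
        return self.strings.get(key)

    def delete(self, key):
        self.strings.pop(key, None)
        self.sets.pop(key, None)

    def sadd(self, key, member):
        self.sets.setdefault(key, set()).add(member)

    def srem(self, key, member):
        self.sets.get(key, set()).discard(member)

    def scard(self, key):
        return len(self.sets.get(key, set()))


class PruneCoordinator:
    """Hypothetical coordinator; the key naming scheme is an assumption."""

    def __init__(self, redis, cc_pair_id):
        self.redis = redis
        self.fence_key = f"prune_fence_{cc_pair_id}"
        self.taskset_key = f"prune_taskset_{cc_pair_id}"

    @property
    def fenced(self):
        # The operation counts as "in progress" while the fence key exists.
        return self.redis.get(self.fence_key) is not None

    def set_fence(self, payload):
        # Passing None clears the fence; otherwise store a JSON payload.
        if payload is None:
            self.redis.delete(self.fence_key)
        else:
            self.redis.set(self.fence_key, json.dumps(payload))

    def add_subtask(self, task_id):
        self.redis.sadd(self.taskset_key, task_id)

    def complete_subtask(self, task_id):
        self.redis.srem(self.taskset_key, task_id)

    def get_remaining(self):
        # An empty taskset means every subtask has finished.
        return self.redis.scard(self.taskset_key)
```

A caller would set the fence with a payload such as {"celery_task_id": "gen-1"}, register each subtask ID, and clear the fence once get_remaining() returns 0.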

---

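Pruning operations

Pruning synchronizes the Onyx index by deleting local documents that have been removed at the source.

Scheduling and triggering

Pruning is triggered by check_for_pruning_task (backend/onyx/background/celery/tasks/pruning/tasks.py:236). A CCPair is due for pruning once its prune_freq interval has elapsed since the last successful prune or the initial indexing (backend/onyx/background/celery/tasks/pruning/tasks.py:164-175).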
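
The due check described above can be sketched as a pure function. This is an illustration under assumptions, not the code in tasks.py: the function name, the parameter names other than the prune_freq concept, and the choices to treat a missing frequency as "pruning disabled" and a never-indexed pair as "not due" are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_pruning_due(prune_freq_seconds, last_pruned, last_successful_index, now=None):
    """Return True when the prune interval has elapsed.

    Assumptions (not confirmed by the source): a missing or non-positive
    frequency disables pruning, and a pair that has never been indexed
    is never due.
    """
    now = now or datetime.now(timezone.utc)
    if not prune_freq_seconds or prune_freq_seconds <= 0:
        return False
    # Measure from the last prune if there was one, else from the initial index.
    reference = last_pruned or last_successful_index
    if reference is None:
        return False
    return now >= reference + timedelta(seconds=prune_freq_seconds)
```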

Pruning task flow
sequenceDiagram
    participant Beat as "check_for_pruning_task<br/>(OnyxCeleryTask)"
    participant Redis as "Redis<br/>(coordination)"
    participant Generator as "connector_pruning_generator_task"
    participant Subtask as "document_by_cc_pair_cleanup_task"
    participant DB as "PostgreSQL"

    Beat->>DB: _is_pruning_due(cc_pair)
    alt due
        Beat->>Redis: set prune fence (RedisConnectorPrune)
        Beat->>Generator: enqueue generator task
    end

    Generator->>Generator: instantiate_connector()
    Generator->>Generator: connector.load_from_state()
    Generator->>DB: get_documents_for_connector_credential_pair()
    Note over Generator: docs_to_prune = local docs - source docs

    loop for each document to prune
        Generator->>Redis: SADD to taskset
        Generator->>Subtask: enqueue cleanup task
    end

    Subtask->>DB: delete_document_by_id()
    Subtask->>Redis: SREM from taskset

Sources: backend/onyx/background/celery/tasks/pruning/tasks.py:392-450, backend/onyx/background/celery/tasks/pruning/tasks.py:488-530
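
The core of the flow above is a set difference between the document IDs known locally and the IDs still present at the source. The sketch below shows that step and a simplified, synchronous version of the taskset bookkeeping. The function names are hypothetical, and plain Python sets stand in for the Redis taskset and the index.

```python
def plan_prune(local_doc_ids, source_doc_ids):
    """Documents to prune are those indexed locally but missing at the source."""
    return sorted(set(local_doc_ids) - set(source_doc_ids))


def run_prune(local_doc_ids, source_doc_ids):
    """Simulate the generator and cleanup subtasks in a single process."""
    index = set(local_doc_ids)
    taskset = set()

    # Generator phase: register one cleanup subtask per stale document.
    for doc_id in plan_prune(local_doc_ids, source_doc_ids):
        taskset.add(doc_id)  # stands in for SADD

    # Subtask phase: delete the document, then drop it from the taskset.
    for doc_id in sorted(taskset):
        index.discard(doc_id)
        taskset.discard(doc_id)  # stands in for SREM

    return index, taskset
```

In this simplified model the operation is finished when the returned taskset is empty, which mirrors the completion rule for tasksets.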

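Hierarchy node management

When documents are removed during pruning, the system maintains the HierarchyNode structure:

  1. Orphan check: delete_orphaned_hierarchy_nodes identifies nodes that have no remaining document links (backend/onyx/db/hierarchy.py:53)
  2. Reparenting: orphaned nodes may be reparented through reparent_orphaned_hierarchy_nodes so that the tree structure stays valid (backend/onyx/db/hierarchy.py:55)
  3. Cleanup: stale entries in hierarchy_node_cc_pair_relationship are removed (backend/onyx/db/hierarchy.py:54)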
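
One way to picture these steps is on a small parent-pointer tree. The sketch below is only one possible reading of the description and is not the logic in hierarchy.py: it assumes a node is removed when it has no document links, and that a surviving node whose parent was removed is attached to its nearest surviving ancestor. The function name and data layout are hypothetical.

```python
def remove_orphans(parents, linked_nodes):
    """Drop unlinked nodes and reparent survivors.

    parents: dict mapping node_id -> parent_id (None for a root).
    linked_nodes: set of node ids that still have document links.
    Returns a new parent mapping containing only surviving nodes.
    """
    orphans = {node for node in parents if node not in linked_nodes}

    def surviving_ancestor(node):
        # Walk upward until a surviving node (or the top of the tree) is found.
        parent = parents[node]
        while parent is not None and parent in orphans:
            parent = parents[parent]
        return parent

    return {
        node: surviving_ancestor(node)
        for node in parents
        if node not in orphans
    }
```

For example, with the chain root → a → b → c where only root and c still have document links, the result keeps root and c, with c attached directly to root.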

---

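Deletion operations

Deletion is triggered when a CCPair is selected for deletion through the /manage/admin/deletion-attempt endpoint (backend/onyx/server/documents/cc_pair.py:150-159).

Task dependency blocking

Deletion is a "destructive" operation and is blocked while indexing or pruning is active. When it is blocked, the system calls revoke_tasks_blocking_deletion to cancel the active Celery tasks for that CCPair (backend/onyx/background/celery/tasks/connector_deletion/tasks.py:79-130).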

Deletion sequence

Once the dependencies are cleared, check_for_connector_deletion_task performs the following steps:

  1. Revocation: revokes active indexing, permission sync, or pruning tasks (backend/onyx/background/celery/tasks/connector_deletion/tasks.py:82-119)
  2. Document cleanup: identifies all documents of the CCPair through get_document_ids_for_connector_credential_pair and enqueues cleanup subtasks (backend/onyx/background/celery/tasks/connector_deletion/tasks.py:334-345)
  3. Relationship removal: removes the links to DocumentSet through delete_document_set_cc_pair_relationship__no_commit (backend/onyx/db/document_set.py:38)
  4. Metadata deletion: deletes IndexAttempt records (backend/onyx/db/index_attempt.py:44) and PermissionSyncAttempt records (backend/onyx/db/permission_sync_attempt.py:46-51)
  5. Final removal: deletes the ConnectorCredentialPair (backend/onyx/db/connector_credential_pair.py:30-31)

Sources: backend/onyx/background/celery/tasks/connector_deletion/tasks.py:368-555
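
The ordering of steps 2 through 5 matters because the dependent rows reference the CCPair. The sketch below demonstrates that ordering against an in-memory SQLite database with foreign keys enabled. It is a simplified model, not the Onyx schema: the table and column names are hypothetical, and the per-document cleanup subtasks are collapsed into a single DELETE.

```python
import sqlite3

# Hypothetical, heavily simplified schema: every child table references cc_pair.
SCHEMA = """
CREATE TABLE cc_pair (id INTEGER PRIMARY KEY);
CREATE TABLE document_by_cc_pair (
    doc_id TEXT, cc_pair_id INTEGER REFERENCES cc_pair(id));
CREATE TABLE document_set_cc_pair (
    document_set_id INTEGER, cc_pair_id INTEGER REFERENCES cc_pair(id));
CREATE TABLE index_attempt (
    id INTEGER PRIMARY KEY, cc_pair_id INTEGER REFERENCES cc_pair(id));
CREATE TABLE permission_sync_attempt (
    id INTEGER PRIMARY KEY, cc_pair_id INTEGER REFERENCES cc_pair(id));
"""

# Children are removed first, in the order the steps above list them.
DELETE_ORDER = (
    "document_by_cc_pair",
    "document_set_cc_pair",
    "index_attempt",
    "permission_sync_attempt",
)


def make_demo_db():
    """Create an in-memory database holding one CCPair and its dependents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    with conn:
        conn.execute("INSERT INTO cc_pair (id) VALUES (1)")
        conn.execute("INSERT INTO document_by_cc_pair VALUES ('doc-a', 1)")
        conn.execute("INSERT INTO document_set_cc_pair VALUES (10, 1)")
        conn.execute("INSERT INTO index_attempt (cc_pair_id) VALUES (1)")
        conn.execute("INSERT INTO permission_sync_attempt (cc_pair_id) VALUES (1)")
    return conn


def delete_cc_pair(conn, cc_pair_id):
    """Delete dependents first, then the CCPair row, in one transaction."""
    with conn:
        for table in DELETE_ORDER:
            conn.execute(f"DELETE FROM {table} WHERE cc_pair_id = ?", (cc_pair_id,))
        conn.execute("DELETE FROM cc_pair WHERE id = ?", (cc_pair_id,))
```

Deleting the cc_pair row before its dependents raises sqlite3.IntegrityError in this model, which is why the parent row is removed last.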

---

Permission sync operations

Permission sync copies external access control lists (ACLs) into the Onyx database.

Implementation details

The check_for_doc_permissions_sync task identifies the CCPairs for which _is_external_doc_permissions_sync_due returns true (backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py:139-165).

  • Initial sync: if initial_index_should_sync is set, the first sync waits until at least one indexing attempt has completed (backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py:160-164)
  • External group sync: a separate task, check_for_external_group_sync, manages the synchronization of group memberships (for example, Slack user groups) (backend/ee/onyx/background/celery/tasks/external_group_syncing/tasks.py:167-180)

Sources: backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py:139-242, backend/ee/onyx/background/celery/tasks/external_group_syncing/tasks.py:167-180
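
The initial-sync rule can be sketched as a small decision function. This is an assumption-laden illustration rather than the body of _is_external_doc_permissions_sync_due: only the initial_index_should_sync flag comes from the page, while the function name, the other parameters, and the frequency-based check for later syncs are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_doc_permission_sync_due(
    sync_freq_seconds,
    last_synced,
    initial_index_should_sync,
    has_completed_index_attempt,
    now=None,
):
    """Decide whether a document permission sync should be scheduled."""
    now = now or datetime.now(timezone.utc)

    if last_synced is None:
        # First sync: optionally wait until at least one indexing attempt is done.
        if initial_index_should_sync and not has_completed_index_attempt:
            return False
        return True

    # Later syncs (assumed): due once the sync interval has elapsed.
    return now >= last_synced + timedelta(seconds=sync_freq_seconds)
```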

---

Operation monitoring

Watchdog and fence validation

To guard against orphaned Redis fences (for example, after a worker crash), Onyx runs validation tasks:

  • Pruning validation: validate_connector_pruning_fences checks whether the celery_task_id stored in the fence is still active in the message broker (backend/onyx/background/celery/tasks/pruning/tasks.py:598-610)
  • Deletion validation: validate_connector_deletion_fences performs a similar check for deletion operations (backend/onyx/background/celery/tasks/connector_deletion/tasks.py:580-595)
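
The watchdog idea can be sketched as follows. This is a simplified model and not the Onyx validators: a dict stands in for Redis, a set of IDs stands in for the broker's view of active tasks, and the function name is hypothetical. Skipping fences whose payload has no task ID yet, and skipping all work while a block signal is set, are assumptions based on the coordination primitives described earlier.

```python
import json


def validate_fences(fences, active_task_ids, blocked=False):
    """Clear fences whose generator task is no longer active.

    fences: dict mapping fence key -> JSON payload string (mutated in place).
    active_task_ids: set of task ids the broker still knows about.
    blocked: True while a block signal is set, in which case nothing is cleared.
    Returns the list of fence keys that were cleared.
    """
    if blocked:
        return []

    cleared = []
    for key, raw in list(fences.items()):
        task_id = json.loads(raw).get("celery_task_id")
        if task_id is None:
            # Assumed: the generator has not been assigned an id yet, so leave it.
            continue
        if task_id in active_task_ids:
            continue
        del fences[key]
        cleared.append(key)
    return cleared
```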

Task priority

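Operations use dedicated Celery queues so that "light" tasks are not blocked behind "heavy" ones:

  • Queues: OnyxCeleryQueues.HEAVY is used for the pruning generator, and OnyxCeleryQueues.LIGHT is used for the individual document cleanup tasks (backend/onyx/configs/constants.py:34-36)
  • Timeouts: operations are bounded by JOB_TIMEOUT and by operation-specific lock timeouts such as CELERY_PRUNING_LOCK_TIMEOUT (backend/onyx/configs/constants.py:30)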
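
The queue split can be pictured with a router function of the shape Celery accepts in its task_routes setting. The sketch below does not import Celery and is not how Onyx wires its queues: the queue name strings are hypothetical placeholders for the values behind OnyxCeleryQueues.HEAVY and OnyxCeleryQueues.LIGHT, and the mapping covers only the two tasks named on this page.

```python
# Hypothetical queue names; the real values live in OnyxCeleryQueues.
HEAVY_QUEUE = "heavy"
LIGHT_QUEUE = "light"

# Task names taken from the pruning flow on this page.
TASK_QUEUES = {
    "connector_pruning_generator_task": HEAVY_QUEUE,
    "document_by_cc_pair_cleanup_task": LIGHT_QUEUE,
}


def route_task(name, args=None, kwargs=None, options=None, task=None, **kw):
    """Return routing options for known tasks, or None to use the default route."""
    queue = TASK_QUEUES.get(name)
    if queue is None:
        return None
    return {"queue": queue}
```

Keeping the long-running generator on its own queue means a backlog of generators cannot delay the many short cleanup tasks, and the reverse also holds.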

Sources: backend/onyx/background/celery/tasks/pruning/tasks.py:598-640, backend/onyx/background/celery/tasks/connector_deletion/tasks.py:580-620