Supported Data Sources (Chinese translation)
Original DeepWiki page: https://deepwiki.com/onyx-dot-app/onyx/3.4-supported-data-sources
Translated at: 2026-05-27T08:45:25.392Z
Translation model: deepseek-chat
Original character count: 31238
Project: Onyx (onyx)
---
Supported Data Sources
Relevant Source Files
The following files were used as context for generating this wiki page:
- backend/alembic/versions/3fc5d75723b3_add_doc_metadata_field_in_document_model.py
- backend/alembic/versions/47a07e1a38f1_fix_invalid_model_configurations_state.py
- backend/alembic/versions/7a70b7664e37_add_model_configuration_table.py
- backend/alembic/versions/9a0296d7421e_add_is_auto_mode_to_llm_provider.py
- backend/ee/onyx/connectors/perm_sync_valid.py
- backend/ee/onyx/external_permissions/confluence/constants.py
- backend/ee/onyx/external_permissions/confluence/doc_sync.py
- backend/ee/onyx/external_permissions/confluence/group_sync.py
- backend/ee/onyx/external_permissions/confluence/space_access.py
- backend/ee/onyx/external_permissions/github/utils.py
- backend/ee/onyx/external_permissions/gmail/doc_sync.py
- backend/ee/onyx/external_permissions/google_drive/doc_sync.py
- backend/ee/onyx/external_permissions/google_drive/permission_retrieval.py
- backend/ee/onyx/external_permissions/jira/doc_sync.py
- backend/ee/onyx/external_permissions/salesforce/postprocessing.py
- backend/ee/onyx/external_permissions/salesforce/utils.py
- backend/ee/onyx/external_permissions/sharepoint/doc_sync.py
- backend/ee/onyx/external_permissions/sharepoint/group_sync.py
- backend/ee/onyx/external_permissions/sharepoint/permission_utils.py
- backend/ee/onyx/external_permissions/slack/doc_sync.py
- backend/ee/onyx/external_permissions/slack/group_sync.py
- backend/ee/onyx/external_permissions/slack/utils.py
- backend/ee/onyx/external_permissions/sync_params.py
- backend/ee/onyx/external_permissions/teams/doc_sync.py
- backend/ee/onyx/external_permissions/utils.py
- backend/onyx/access/models.py
- backend/onyx/background/indexing/checkpointing_utils.py
- backend/onyx/connectors/airtable/airtable_connector.py
- backend/onyx/connectors/axero/connector.py
- backend/onyx/connectors/bookstack/client.py
- backend/onyx/connectors/clickup/connector.py
- backend/onyx/connectors/confluence/connector.py
- backend/onyx/connectors/confluence/onyx_confluence.py
- backend/onyx/connectors/confluence/utils.py
- backend/onyx/connectors/connector_runner.py
- backend/onyx/connectors/discord/__init__.py
- backend/onyx/connectors/discord/connector.py
- backend/onyx/connectors/discourse/connector.py
- backend/onyx/connectors/document360/connector.py
- backend/onyx/connectors/egnyte/connector.py
- backend/onyx/connectors/fireflies/connector.py
- backend/onyx/connectors/gitbook/__init__.py
- backend/onyx/connectors/gitbook/connector.py
- backend/onyx/connectors/google_drive/connector.py
- backend/onyx/connectors/google_drive/doc_conversion.py
- backend/onyx/connectors/google_drive/file_retrieval.py
- backend/onyx/connectors/google_drive/models.py
- backend/onyx/connectors/google_utils/resources.py
- backend/onyx/connectors/highspot/__init__.py
- backend/onyx/connectors/highspot/client.py
- backend/onyx/connectors/highspot/connector.py
- backend/onyx/connectors/highspot/utils.py
- backend/onyx/connectors/hubspot/connector.py
- backend/onyx/connectors/hubspot/rate_limit.py
- backend/onyx/connectors/interfaces.py
- backend/onyx/connectors/linear/connector.py
- backend/onyx/connectors/mock_connector/connector.py
- backend/onyx/connectors/productboard/connector.py
- backend/onyx/connectors/salesforce/connector.py
- backend/onyx/connectors/salesforce/doc_conversion.py
- backend/onyx/connectors/salesforce/onyx_salesforce.py
- backend/onyx/connectors/salesforce/salesforce_calls.py
- backend/onyx/connectors/salesforce/sqlite_functions.py
- backend/onyx/connectors/salesforce/utils.py
- backend/onyx/connectors/sharepoint/connector.py
- backend/onyx/connectors/sharepoint/connector_utils.py
- backend/onyx/connectors/slack/connector.py
- backend/onyx/connectors/slack/onyx_retry_handler.py
- backend/onyx/connectors/slack/onyx_slack_web_client.py
- backend/onyx/connectors/slack/utils.py
- backend/onyx/connectors/teams/connector.py
- backend/onyx/connectors/teams/models.py
- backend/onyx/connectors/teams/utils.py
- backend/onyx/connectors/zendesk/connector.py
- backend/onyx/onyxbot/slack/icons.py
- backend/onyx/server/documents/standard_oauth.py
- backend/onyx/tools/tool_implementations/mcp/mcp_client.py
- backend/onyx/utils/subclasses.py
- backend/onyx/utils/threadpool_concurrency.py
- backend/scripts/decrypt.py
- backend/tests/daily/connectors/airtable/test_airtable_basic.py
- backend/tests/daily/connectors/discord/test_discord_connector.py
- backend/tests/daily/connectors/fireflies/test_fireflies_connector.py
- backend/tests/daily/connectors/fireflies/test_fireflies_data.json
- backend/tests/daily/connectors/gitbook/test_gitbook_connector.py
- backend/tests/daily/connectors/google_drive/conftest.py
- backend/tests/daily/connectors/google_drive/consts_and_utils.py
- backend/tests/daily/connectors/google_drive/test_admin_oauth.py
- backend/tests/daily/connectors/google_drive/test_drive_perm_sync.py
- backend/tests/daily/connectors/google_drive/test_link_visibility_filter.py
- backend/tests/daily/connectors/google_drive/test_map_test_ids.py
- backend/tests/daily/connectors/google_drive/test_resolver.py
- backend/tests/daily/connectors/google_drive/test_sections.py
- backend/tests/daily/connectors/google_drive/test_service_acct.py
- backend/tests/daily/connectors/google_drive/test_user_1_oauth.py
- backend/tests/daily/connectors/highspot/test_highspot_connector.py
- backend/tests/daily/connectors/highspot/test_highspot_data.json
- backend/tests/daily/connectors/hubspot/test_hubspot_connector.py
- backend/tests/daily/connectors/salesforce/test_salesforce_connector.py
- backend/tests/daily/connectors/salesforce/test_salesforce_data.json
- backend/tests/daily/connectors/sharepoint/test_sharepoint_connector.py
- backend/tests/daily/connectors/slack/test_slack_connector.py
- backend/tests/daily/connectors/slack/test_slack_perm_sync.py
- backend/tests/daily/connectors/teams/test_teams_connector.py
- backend/tests/daily/connectors/utils.py
- backend/tests/daily/connectors/zendesk/test_zendesk_connector.py
- backend/tests/daily/connectors/zendesk/test_zendesk_data.json
- backend/tests/external_dependency_unit/connectors/confluence/conftest.py
- backend/tests/integration/connector_job_tests/sharepoint/conftest.py
- backend/tests/integration/connector_job_tests/slack/slack_api_utils.py
- backend/tests/unit/ee/onyx/external_permissions/confluence/test_space_access.py
- backend/tests/unit/ee/onyx/external_permissions/salesforce/test_postprocessing.py
- backend/tests/unit/ee/onyx/external_permissions/sharepoint/test_permission_utils.py
- backend/tests/unit/onyx/connectors/airtable/test_airtable_index_all.py
- backend/tests/unit/onyx/connectors/confluence/test_confluence_checkpointing.py
- backend/tests/unit/onyx/connectors/confluence/test_onyx_confluence.py
- backend/tests/unit/onyx/connectors/discord/test_discord_validation.py
- backend/tests/unit/onyx/connectors/google_drive/__init__.py
- backend/tests/unit/onyx/connectors/google_drive/test_slim_retrieval.py
- backend/tests/unit/onyx/connectors/google_utils/test_impersonation_guard.py
- backend/tests/unit/onyx/connectors/hubspot/test_hubspot_inline_associations.py
- backend/tests/unit/onyx/connectors/jira/test_jira_permission_sync.py
- backend/tests/unit/onyx/connectors/linear/test_linear_load_credentials.py
- backend/tests/unit/onyx/connectors/salesforce/test_salesforce_custom_config.py
- backend/tests/unit/onyx/connectors/salesforce/test_salesforce_sqlite.py
- backend/tests/unit/onyx/connectors/salesforce/test_yield_doc_batches.py
- backend/tests/unit/onyx/connectors/sharepoint/test_delta_checkpointing.py
- backend/tests/unit/onyx/connectors/sharepoint/test_drive_matching.py
- backend/tests/unit/onyx/connectors/sharepoint/test_fetch_site_pages.py
- backend/tests/unit/onyx/connectors/sharepoint/test_hierarchy_helpers.py
- backend/tests/unit/onyx/connectors/sharepoint/test_rest_client_context_caching.py
- backend/tests/unit/onyx/connectors/teams/test_collect_teams.py
- backend/tests/unit/onyx/connectors/test_connector_factory.py
- backend/tests/unit/onyx/connectors/utils.py
- backend/tests/unit/onyx/connectors/zendesk/test_zendesk_checkpointing.py
- backend/tests/unit/onyx/connectors/zendesk/test_zendesk_rate_limit.py
- web/src/app/craft/components/ConnectDataBanner.tsx
- web/src/app/craft/components/ConnectorBannersRow.tsx
- web/src/app/craft/v1/configure/components/ComingSoonConnectors.tsx
- web/src/lib/connectors/AutoSyncOptionFields.tsx
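Purpose and Scope
This document lists all data sources that Onyx can connect to for document indexing and retrieval. It covers the data source enumeration, metadata, configuration requirements, authentication methods, and backend implementation details. For the connector framework and lifecycle, see Connector Framework Overview. For details on credential management, see Credential Management. For the admin UI used to configure these connectors, see Connector Management UI.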
---
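Data Source Enumeration
All supported data sources are defined in the ValidSources enum. This enum is the single source of truth for connector types across the system.
Sources: web/src/lib/types.ts:466-526
The ValidSources enum contains more than 60 data sources, grouped into the following categories:
- Knowledge bases and wikis (Confluence, Notion, BookStack, etc.)
- Cloud storage (Google Drive, Dropbox, S3, etc.)
- Ticketing and task management (Jira, Zendesk, Linear, etc.)
- Messaging platforms (Slack, Teams, Gmail, etc.)
- Code repositories (GitHub, GitLab, Bitbucket)
- Sales platforms (Salesforce, HubSpot, Gong)
- Generic data sources (Web, File, Ingestion API)
- Special data sources (FederatedSlack, CraftFile, UserFile)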
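Data Source Registration Flow
The diagram below shows how a data source string from the UI maps to a connector implementation class in the backend.
Data Source Mapping Logic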
graph TB
subgraph "Frontend Type System"
ValidSources["ValidSources enum (web/src/lib/types.ts)"]
ConfigurableSources["ConfigurableSources type (web/src/lib/types.ts)"]
end
subgraph "Metadata Layer"
SOURCE_METADATA_MAP["SOURCE_METADATA_MAP (web/src/lib/sources.ts)"]
PartialSourceMetadata["PartialSourceMetadata icon, display name, category, docs"]
end
subgraph "Configuration Layer"
connectorConfigs["connectorConfigs (web/src/lib/connectors/connectors.tsx)"]
ConnectionConfiguration["ConnectionConfiguration values, advanced_values"]
credentialTemplates["credentialTemplates (web/src/lib/connectors/credentials.ts)"]
end
subgraph "Backend Implementation"
DocumentSource["DocumentSource enum (backend/onyx/configs/constants.py)"]
CONNECTOR_CLASS_MAP["CONNECTOR_CLASS_MAP (backend/onyx/connectors/registry.py)"]
identify_connector_class["identify_connector_class() (backend/onyx/connectors/factory.py)"]
BaseConnector["BaseConnector class (backend/onyx/connectors/interfaces.py)"]
end
ValidSources --> SOURCE_METADATA_MAP
ValidSources --> ConfigurableSources
ConfigurableSources --> connectorConfigs
ConfigurableSources --> credentialTemplates
SOURCE_METADATA_MAP --> PartialSourceMetadata
connectorConfigs --> ConnectionConfiguration
ValidSources -.sync.-> DocumentSource
DocumentSource --> CONNECTOR_CLASS_MAP
CONNECTOR_CLASS_MAP --> identify_connector_class
identify_connector_class --> BaseConnector
Sources: web/src/lib/types.ts:466-559, web/src/lib/sources.ts:77-451, web/src/lib/connectors/connectors.tsx:145-148, backend/onyx/connectors/factory.py:91-101
---
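Data Source Categories
Data sources are organized by the categories defined in the SourceCategory enum. SOURCE_METADATA_MAP associates each data source with its category, icon, display name, and documentation.
Category Breakdown
The system groups connectors by functional area to simplify setup for administrators.
Data Source Category Mapping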
graph LR
subgraph "Wiki and Knowledge Base"
confluence["confluence ConfluenceIcon"]
sharepoint["sharepoint SharepointIcon"]
notion["notion NotionIcon"]
bookstack["bookstack BookstackIcon"]
outline["outline OutlineIcon"]
slab["slab SlabIcon"]
guru["guru GuruIcon"]
gitbook["gitbook GitbookIcon"]
document360["document360 Document360Icon"]
discourse["discourse DiscourseIcon"]
coda["coda CodaIcon"]
mediawiki["mediawiki MediaWikiIcon"]
wikipedia["wikipedia WikipediaIcon"]
axero["axero AxeroIcon"]
google_sites["google_sites GoogleSitesIcon"]
drupal_wiki["drupal_wiki DrupalWikiIcon"]
end
subgraph "Cloud Storage"
google_drive["google_drive GoogleDriveIcon"]
dropbox["dropbox DropboxIcon"]
s3["s3 S3Icon"]
r2["r2 R2Icon"]
google_cloud_storage["google_cloud_storage GoogleStorageIcon"]
oci_storage["oci_storage OCIStorageIcon"]
egnyte["egnyte EgnyteIcon"]
end
subgraph "Ticketing and Tasks"
jira["jira JiraIcon"]
zendesk["zendesk ZendeskIcon"]
linear["linear LinearIcon"]
asana["asana AsanaIcon"]
clickup["clickup ClickupIcon"]
freshdesk["freshdesk FreshdeskIcon"]
airtable["airtable AirtableIcon"]
productboard["productboard ProductboardIcon"]
testrail["testrail TestRailIcon"]
end
subgraph "Messaging"
slack["slack ColorSlackIcon"]
teams["teams TeamsIcon"]
gmail["gmail GmailIcon"]
discord["discord ColorDiscordIcon"]
zulip["zulip ZulipIcon"]
imap["imap EmailIcon"]
xenforo["xenforo XenforoIcon"]
end
subgraph "Code Repositories"
github["github GithubIcon"]
gitlab["gitlab GitlabIcon"]
bitbucket["bitbucket BitbucketIcon"]
end
subgraph "Sales"
salesforce["salesforce SalesforceIcon"]
hubspot["hubspot HubSpotIcon"]
gong["gong GongIcon"]
fireflies["fireflies FirefliesIcon"]
highspot["highspot HighspotIcon"]
loopio["loopio LoopioIcon"]
end
subgraph "Other"
web["web SvgGlobe"]
file["file SvgFileText"]
end
Sources: web/src/lib/sources.ts:95-451, web/src/components/icons/icons.tsx:1-97
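The real metadata map is TypeScript, but the idea of associating each source with a category and then grouping by that category can be sketched in a few lines of Python. This is only an illustration: the `SourceMetadata` class, the `group_by_category` helper, the display names, and the English category labels are assumptions, and only a handful of the sources from the diagram are included.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceMetadata:
    """Hypothetical stand-in for one entry of the metadata map."""
    display_name: str  # assumed value, not taken from the source file
    icon: str          # icon name as shown in the diagram
    category: str      # assumed English label for the category


# A small subset of the sources listed in the category diagram.
SOURCE_METADATA = {
    "confluence": SourceMetadata("Confluence", "ConfluenceIcon", "Wiki and Knowledge Base"),
    "notion": SourceMetadata("Notion", "NotionIcon", "Wiki and Knowledge Base"),
    "google_drive": SourceMetadata("Google Drive", "GoogleDriveIcon", "Cloud Storage"),
    "jira": SourceMetadata("Jira", "JiraIcon", "Ticketing and Tasks"),
    "slack": SourceMetadata("Slack", "ColorSlackIcon", "Messaging"),
}


def group_by_category(metadata: dict) -> dict:
    """Return {category: sorted list of source keys} for an admin-style listing."""
    grouped = defaultdict(list)
    for source, meta in metadata.items():
        grouped[meta.category].append(source)
    return {category: sorted(sources) for category, sources in grouped.items()}
```

Keeping the category on the metadata entry, rather than in a separate list per category, means a new source only has to be registered in one place.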
---
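Connector Configuration
Each configurable data source has an entry in connectorConfigs that defines the fields required to set up the connector. Configurations follow a type-safe schema and are validated with Yup.
Configuration Schema
The ConnectionConfiguration interface defines how the admin form is generated for each data source.
Configuration Object Structure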
graph TB
subgraph "ConnectionConfiguration Schema"
description["description: string"]
subtext["subtext?: string"]
initialConnectorName["initialConnectorName?: string"]
values["values: Option[]"]
advanced_values["advanced_values: Option[]"]
overrideDefaultFreq["overrideDefaultFreq?: number"]
end
subgraph "Option Types"
TextOption["TextOption type: 'text'"]
SelectOption["SelectOption type: 'select'"]
MultiSelectOption["MultiSelectOption type: 'multiselect'"]
ListOption["ListOption type: 'list'"]
BooleanOption["BooleanOption type: 'checkbox'"]
NumberOption["NumberOption type: 'number'"]
FileOption["FileOption type: 'file'"]
TabOption["TabOption type: 'tab'"]
end
end
values --> TextOption
values --> SelectOption
values --> ListOption
values --> BooleanOption
values --> NumberOption
values --> TabOption
values --> FileOption
Sources: web/src/lib/connectors/connectors.tsx:114-143, web/src/lib/connectors/connectors.tsx:17-112
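To make the schema more concrete, here is a rough Python rendering of the structure in the diagram, together with a required-field check of the kind a form validator performs. The actual code is TypeScript validated with Yup; the `name` and `optional` attributes on `Option` and the `missing_required_fields` helper are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Option:
    name: str               # hypothetical: key of the form field
    type: str               # one of the option types above, e.g. "text" or "checkbox"
    optional: bool = False  # hypothetical: whether the field may be left empty


@dataclass
class ConnectionConfiguration:
    description: str
    values: list[Option]
    advanced_values: list[Option] = field(default_factory=list)
    subtext: Optional[str] = None
    initial_connector_name: Optional[str] = None
    override_default_freq: Optional[int] = None


def missing_required_fields(config: ConnectionConfiguration, form_data: dict[str, Any]) -> list[str]:
    """Return the names of required options that are absent or empty in form_data."""
    required = [o for o in config.values + config.advanced_values if not o.optional]
    # None, empty string, and empty list count as "not provided"; False and 0 do not.
    return [o.name for o in required if form_data.get(o.name) in (None, "", [])]
```

A form could call `missing_required_fields` before submission and highlight the returned field names.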
---
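Implementation Details: Google Drive
The Google Drive connector supports recursive folder traversal, permission syncing, and multiple file types.
Data Flow: File Retrieval
The connector uses crawl_folders_for_files (backend/onyx/connectors/google_drive/file_retrieval.py:36) to traverse the folder hierarchy and, depending on the credential type, uses either get_all_files_for_oauth (backend/onyx/connectors/google_drive/file_retrieval.py:38) or get_all_files_in_my_drive_and_shared (backend/onyx/connectors/google_drive/file_retrieval.py:40-41).
Google Drive Traversal Logic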
graph TD
Connector["GoogleDriveConnector (backend/onyx/connectors/google_drive/connector.py)"]
Crawl["crawl_folders_for_files (file_retrieval.py)"]
DriveService["GoogleDriveService (google_utils/resources.py)"]
subgraph "Retrieval Pipeline"
FetchFiles["execute_paginated_retrieval (google_utils/google_utils.py)"]
Convert["convert_drive_item_to_document (doc_conversion.py)"]
Sections["get_document_sections (section_extraction.py)"]
end
Connector --> Crawl
Crawl --> FetchFiles
FetchFiles --> DriveService
FetchFiles --> Convert
Convert --> Sections
Sources: backend/onyx/connectors/google_drive/connector.py:36-49, backend/onyx/connectors/google_drive/doc_conversion.py:33-34, backend/onyx/connectors/google_drive/file_retrieval.py:106-128
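The recursive traversal can be pictured as a breadth-first walk over the folder tree. The sketch below is not the signature or logic of crawl_folders_for_files: `crawl_folders` and the `list_children` callback are hypothetical, and the callback stands in for a paginated Drive listing call. The folder MIME type is the one Google Drive uses for folders.

```python
from collections import deque
from typing import Callable, Iterator

# MIME type that the Drive API reports for folders.
FOLDER_MIME = "application/vnd.google-apps.folder"


def crawl_folders(root_id: str, list_children: Callable[[str], list[dict]]) -> Iterator[dict]:
    """Yield every non-folder item reachable from root_id, breadth first.

    list_children is a hypothetical callback returning the direct children
    of a folder as dicts with at least "id" and "mimeType" keys.
    """
    queue = deque([root_id])
    seen = {root_id}  # a folder can be reachable more than once; visit it only once
    while queue:
        folder_id = queue.popleft()
        for item in list_children(folder_id):
            if item["mimeType"] == FOLDER_MIME:
                if item["id"] not in seen:
                    seen.add(item["id"])
                    queue.append(item["id"])
            else:
                yield item
```

Using a generator keeps memory flat on large drives, since files are handed to the conversion step as they are discovered.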
Document Conversion
Files are converted into Onyx Document objects by convert_drive_item_to_document (backend/onyx/connectors/google_drive/doc_conversion.py:33). For Google Docs, the connector extracts sections with get_document_sections (backend/onyx/connectors/google_drive/doc_conversion.py:27). For binary files (PDF, DOCX, PPTX), it downloads the content with MediaIoBaseDownload (backend/onyx/connectors/google_drive/doc_conversion.py:10) and processes it with local extractors such as read_pdf_file (backend/onyx/connectors/google_drive/doc_conversion.py:43).
---
Implementation Details: Confluence
The Confluence connector supports both Cloud and Server/Data Center deployments. It uses OnyxConfluence (backend/onyx/connectors/confluence/onyx_confluence.py:110), a wrapper around the atlassian-python-api library.
Checkpointing
ConfluenceConnector implements CheckpointedConnector (backend/onyx/connectors/confluence/connector.py:121). It stores next_page_url in ConfluenceCheckpoint (backend/onyx/connectors/confluence/connector.py:108-109) so that indexing can resume from where it was interrupted.
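The resume behaviour can be illustrated with a small, self-contained sketch. It assumes a paginated API whose responses carry a `next` URL; the `Checkpoint` class, its `has_more` flag, and the `run_one_batch` and `fetch_page` names are hypothetical and are not the connector's actual interface. Only the idea of persisting `next_page_url` comes from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Checkpoint:
    next_page_url: Optional[str]  # where the next run should continue
    has_more: bool = True         # hypothetical flag: False once the last page is read


def run_one_batch(
    checkpoint: Checkpoint,
    fetch_page: Callable[[str], dict],
    start_url: str,
) -> tuple[list, Checkpoint]:
    """Fetch a single page and return its results plus an updated checkpoint.

    fetch_page is a hypothetical callback returning {"results": [...], "next": url or None}.
    """
    url = checkpoint.next_page_url or start_url
    page = fetch_page(url)
    next_url = page.get("next")
    return page["results"], Checkpoint(next_page_url=next_url, has_more=next_url is not None)
```

Because the checkpoint is returned after every page, a caller can persist it between batches; if the process stops, the next run passes the saved checkpoint back in and continues from `next_page_url` instead of starting over.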
CQL Filtering
The connector builds complex CQL (Confluence Query Language) strings to filter by space, page ID, or label (backend/onyx/connectors/confluence/connector.py:170-181).
Sources: backend/onyx/connectors/confluence/connector.py:120-154, backend/onyx/connectors/confluence/onyx_confluence.py:110-157
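As an illustration of the kind of query string involved in CQL filtering, the following sketch composes a CQL expression from optional filters. The `build_cql` function and its parameters are hypothetical and are not taken from the connector; the quoting helper assumes that backslash-escaping double quotes is sufficient for the values passed in.

```python
from typing import Optional, Sequence


def _quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_cql(
    space: Optional[str] = None,
    page_id: Optional[str] = None,
    labels: Sequence[str] = (),
) -> str:
    """Combine the given filters into one CQL expression joined with 'and'."""
    clauses = ["type = page"]
    if space:
        clauses.append(f"space = {_quote(space)}")
    if page_id:
        clauses.append(f"id = {_quote(page_id)}")
    if labels:
        joined = ", ".join(_quote(label) for label in labels)
        clauses.append(f"label in ({joined})")
    return " and ".join(clauses)
```

For example, `build_cql(space="ENG", labels=["public", "faq"])` produces `type = page and space = "ENG" and label in ("public", "faq")`.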
---
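Implementation Details: SharePoint
The SharePoint connector uses the Microsoft Graph API and the office365-rest-python-client library. It supports document retrieval for indexing as well as permission syncing.
Authentication
SharePoint supports several authentication methods, including client secrets and certificate-based authentication. The load_credentials method (backend/onyx/connectors/sharepoint/connector.py:221-255) initializes msal.ConfidentialClientApplication (backend/onyx/connectors/sharepoint/connector.py:21) and GraphClient (backend/onyx/connectors/sharepoint/connector.py:26).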
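The choice between the two authentication methods comes down to what is passed as the client credential. In MSAL for Python, `ConfidentialClientApplication` accepts either a secret string or a dictionary with `private_key` and `thumbprint` for certificates. The helper below is a hypothetical sketch of that selection only; its name, parameters, and precedence rule are assumptions, and it does not reproduce load_credentials.

```python
from typing import Optional, Union


def build_client_credential(
    client_secret: Optional[str] = None,
    private_key_pem: Optional[str] = None,
    thumbprint: Optional[str] = None,
) -> Union[str, dict]:
    """Pick the credential form for MSAL: certificate dict if available, else secret.

    Preferring the certificate when both are supplied is an assumption of this sketch.
    """
    if private_key_pem and thumbprint:
        return {"private_key": private_key_pem, "thumbprint": thumbprint}
    if client_secret:
        return client_secret
    raise ValueError("Provide either a client secret or a certificate key and thumbprint")


# The result would then be handed to MSAL roughly like this (not executed here;
# client_id and tenant_id are placeholders):
#
#   app = msal.ConfidentialClientApplication(
#       client_id,
#       authority=f"https://login.microsoftonline.com/{tenant_id}",
#       client_credential=build_client_credential(client_secret="..."),
#   )
#   token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
```

Isolating the selection in a pure function makes the two credential paths easy to unit test without contacting Azure.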
Document Processing
The connector iterates over SharePoint sites and drives, building items with DriveItemData.from_graph_json (backend/onyx/connectors/sharepoint/connector.py:174). It processes file content with extract_text_and_images (backend/onyx/connectors/sharepoint/connector.py:76) and maps permissions with get_sharepoint_external_access (backend/onyx/connectors/sharepoint/connector.py:74).
Sources: backend/onyx/connectors/sharepoint/connector.py:21-34, backend/onyx/connectors/sharepoint/connector.py:155-172, backend/onyx/connectors/sharepoint/connector.py:221-255
---
Implementation Details: Slack
The Slack connector indexes messages and threads in public and private channels.
Message Retrieval
It uses OnyxSlackWebClient (backend/onyx/connectors/slack/connector.py:63) to interact with the Slack API. The get_channel_messages function (backend/onyx/connectors/slack/connector.py:146) makes paginated calls to conversations_history, while get_thread (backend/onyx/connectors/slack/connector.py:176) retrieves the replies to a specific message.
Permission Sync
Slack permissions are synced through get_channel_access (backend/onyx/connectors/slack/connector.py:58), which maps Slack user IDs to external access records.
Sources: backend/onyx/connectors/slack/connector.py:146-174, backend/onyx/connectors/slack/connector.py:176-184
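Paginating conversations_history follows Slack's cursor convention: each response may carry `response_metadata.next_cursor`, and an empty or missing cursor means there are no more pages. The sketch below shows that loop in isolation. `iter_channel_messages` and the `fetch_history` callback are hypothetical names; the callback stands in for a bound `conversations_history` method and is not the connector's get_channel_messages.

```python
from typing import Callable, Iterator


def iter_channel_messages(
    fetch_history: Callable[..., dict],
    channel_id: str,
    limit: int = 200,
) -> Iterator[dict]:
    """Yield all messages of a channel by following next_cursor until it is empty."""
    cursor = None
    while True:
        response = fetch_history(channel=channel_id, cursor=cursor, limit=limit)
        yield from response.get("messages", [])
        # Slack signals the last page with a missing or empty next_cursor.
        cursor = (response.get("response_metadata") or {}).get("next_cursor")
        if not cursor:
            break
```

Passing the fetch function in as a parameter keeps the loop testable with a fake and leaves retry and rate-limit handling to the client that is injected.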
---
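Implementation Details: Salesforce
The Salesforce connector works by exporting data into a local SQLite database, and it performs both full and incremental syncs.
Sync Strategy
The connector uses OnyxSalesforce (backend/onyx/connectors/salesforce/onyx_salesforce.py:30) for API interaction. During the initial sync, it bulk-exports object types to CSV via fetch_all_csvs_in_parallel (backend/onyx/connectors/salesforce/connector.py:31) and loads them into OnyxSalesforceSQLite (backend/onyx/connectors/salesforce/connector.py:32).
Document Generation
Documents are created by joining parent objects with their child objects (for example, Accounts with Opportunities) in the local database (backend/onyx/connectors/salesforce/connector.py:172-180).
Sources: backend/onyx/connectors/salesforce/connector.py:163-182, backend/onyx/connectors/salesforce/doc_conversion.py:27-28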
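The CSV-to-SQLite approach can be demonstrated with the standard library alone. In the sketch below the table names, column names, sample rows, and both helper functions are invented for illustration and do not reflect the schema used by OnyxSalesforceSQLite; it only shows loading exported CSV text into tables and joining parents with children locally.

```python
import csv
import io
import sqlite3

# Invented sample exports standing in for bulk CSV downloads.
ACCOUNTS_CSV = "Id,Name\n001A,Acme\n001B,Globex\n"
OPPS_CSV = "Id,AccountId,Name\n006X,001A,Renewal\n006Y,001A,Upsell\n"


def load_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> None:
    """Create a TEXT-only table from the CSV header and insert every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(row[c] for c in columns) for row in reader]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)


def accounts_with_children(conn: sqlite3.Connection) -> dict:
    """Join each account with its opportunities and fold them into one record."""
    query = """
        SELECT a.Id, a.Name, o.Name
        FROM account AS a
        LEFT JOIN opportunity AS o ON o.AccountId = a.Id
        ORDER BY a.Id, o.Id
    """
    docs: dict = {}
    for account_id, account_name, opp_name in conn.execute(query):
        doc = docs.setdefault(account_id, {"name": account_name, "opportunities": []})
        if opp_name is not None:  # LEFT JOIN yields NULL for accounts without children
            doc["opportunities"].append(opp_name)
    return docs
```

Doing the join locally avoids issuing one API query per parent record, which is the main appeal of staging the export in SQLite.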
---
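Backend Connector Registry
The backend uses the factory pattern to instantiate the correct connector class based on DocumentSource.
Sources: backend/onyx/connectors/factory.py:1-185
Connector Class Loading
The identify_connector_class function (backend/onyx/connectors/factory.py:91-101) retrieves the class from CONNECTOR_CLASS_MAP, which is defined in registry.py. It uses _load_connector_class (backend/onyx/connectors/factory.py:36-54) to import the module dynamically and cache the class object.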
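Dynamic import with caching is a common pattern and can be sketched with `importlib` and `functools.lru_cache`. The registry below is hypothetical: it maps made-up source keys to standard-library classes so the example runs anywhere, and `load_class` is not the signature of _load_connector_class.

```python
import importlib
from functools import lru_cache

# Hypothetical registry: source key -> (module path, class name).
# Standard-library classes stand in for connector classes.
CLASS_MAP = {
    "ordered": ("collections", "OrderedDict"),
    "decimal": ("decimal", "Decimal"),
}


@lru_cache(maxsize=None)
def load_class(source: str) -> type:
    """Import the module for a source on first use and cache the resolved class."""
    try:
        module_path, class_name = CLASS_MAP[source]
    except KeyError:
        raise ValueError(f"Unknown source: {source}") from None
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

Storing module paths as strings means no connector module is imported until it is first requested, which keeps startup cheap when most connectors, and their third-party dependencies, are never used in a given process.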
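Input Type Validation
Before instantiation, the factory verifies that the connector class implements the interface required by its InputType (for example, LoadConnector for LOAD_STATE) (backend/onyx/connectors/factory.py:57-88).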
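A check of this kind can be expressed as a lookup from input type to required base class followed by `issubclass`. The sketch below is a simplified assumption rather than the factory's code: the classes are minimal stand-ins, the `POLL` and `PollConnector` pairing is included only as a second illustrative case, and `validate_connector_class` is a hypothetical name.

```python
from enum import Enum


class InputType(str, Enum):
    LOAD_STATE = "load_state"
    POLL = "poll"  # illustrative second case


class LoadConnector:
    """Minimal stand-in for the interface required by LOAD_STATE."""
    def load_from_state(self):
        raise NotImplementedError


class PollConnector:
    """Minimal stand-in for the interface assumed to be required by POLL."""
    def poll_source(self, start, end):
        raise NotImplementedError


REQUIRED_INTERFACE = {
    InputType.LOAD_STATE: LoadConnector,
    InputType.POLL: PollConnector,
}


def validate_connector_class(connector_cls: type, input_type: InputType) -> None:
    """Raise TypeError if connector_cls lacks the interface needed for input_type."""
    required = REQUIRED_INTERFACE[input_type]
    if not issubclass(connector_cls, required):
        raise TypeError(
            f"{connector_cls.__name__} does not implement {required.__name__}, "
            f"which is required for input type {input_type.value}"
        )


class FileConnector(LoadConnector):
    """Example connector that only supports full loads."""
    def load_from_state(self):
        return []
```

Failing before instantiation means a misconfigured connector and input type pair is reported immediately, rather than surfacing later as a missing-method error in the middle of an indexing run.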