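3.3 · Storage Backends and Vector Database Configuration

Application orchestration and external knowledge integration · This is a standalone chapter page from the translated Dify DeepWiki; it preserves the original links, source anchors, module tags, and chapter hierarchy.

Project: Dify · Chapter: 3.3 · Status: full-text translation · Modules: Retrieval, Recall and Indexing; System Architecture; Testing, Release and Operations; Storage and Persistence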
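Module tags
  • Retrieval, Recall and Indexing
  • System Architecture
  • Testing, Release and Operations
  • Storage and Persistence
  • Configuration Governance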

Original DeepWiki page: https://deepwiki.com/langgenius/dify/3.3-storage-backends-and-vector-database-configuration
Translated at: 2026-05-27T08:44:38.690Z
Translation model: deepseek-chat
Source length: 17622 characters
Project: Dify (dify)

---

Storage Backends and Vector Database Configuration

Relevant source files

The following files were used as context to generate this wiki page:

  • api/.env.example
  • api/app.py
  • api/app_factory.py
  • api/configs/feature/__init__.py
  • api/configs/middleware/__init__.py
  • api/configs/observability/__init__.py
  • api/configs/observability/otel/otel_config.py
  • api/configs/packaging/__init__.py
  • api/controllers/console/datasets/datasets.py
  • api/core/plugin/backwards_invocation/model.py
  • api/core/rag/datasource/keyword/jieba/jieba.py
  • api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py
  • api/core/rag/datasource/vdb/vector_factory.py
  • api/core/rag/datasource/vdb/vector_type.py
  • api/core/rag/retrieval/router/multi_dataset_function_call_router.py
  • api/core/rag/retrieval/router/multi_dataset_react_route.py
  • api/core/rag/splitter/fixed_text_splitter.py
  • api/core/rag/splitter/text_splitter.py
  • api/extensions/ext_compress.py
  • api/extensions/ext_otel.py
  • api/extensions/ext_storage.py
  • api/extensions/otel/instrumentation.py
  • api/extensions/storage/storage_type.py
  • api/factories/variable_factory.py
  • api/providers/vdb/vdb-couchbase/src/dify_vdb_couchbase/couchbase_vector.py
  • api/providers/vdb/vdb-elasticsearch/src/dify_vdb_elasticsearch/elasticsearch_vector.py
  • api/providers/vdb/vdb-huawei-cloud/src/dify_vdb_huawei_cloud/huawei_cloud_vector.py
  • api/providers/vdb/vdb-lindorm/src/dify_vdb_lindorm/lindorm_vector.py
  • api/providers/vdb/vdb-milvus/src/dify_vdb_milvus/milvus_vector.py
  • api/providers/vdb/vdb-opensearch/src/dify_vdb_opensearch/opensearch_vector.py
  • api/providers/vdb/vdb-oracle/src/dify_vdb_oracle/oraclevector.py
  • api/providers/vdb/vdb-pgvector/src/dify_vdb_pgvector/pgvector.py
  • api/providers/vdb/vdb-relyt/src/dify_vdb_relyt/relyt_vector.py
  • api/providers/vdb/vdb-tablestore/src/dify_vdb_tablestore/tablestore_vector.py
  • api/providers/vdb/vdb-tidb-vector/src/dify_vdb_tidb_vector/tidb_vector.py
  • api/providers/vdb/vdb-upstash/src/dify_vdb_upstash/upstash_vector.py
  • api/providers/vdb/vdb-vastbase/src/dify_vdb_vastbase/vastbase_vector.py
  • api/pyproject.toml
  • api/tests/unit_tests/configs/test_dify_config.py
  • api/tests/unit_tests/core/rag/splitter/__init__.py
  • api/tests/unit_tests/core/rag/splitter/test_text_splitter.py
  • api/tests/unit_tests/core/workflow/graph_engine/test_table_runner.py
  • api/uv.lock
  • docker/.env.example
  • docker/README.md
  • docker/docker-compose-template.yaml
  • docker/docker-compose.middleware.yaml
  • docker/docker-compose.yaml
  • docker/envs/core-services/shared.env.example
  • docker/envs/infrastructure/nginx.env.example
  • docker/envs/security.env.example
  • docker/nginx/conf.d/default.conf.template
  • web/app/components/app/configuration/config-var/index.tsx
  • web/app/components/app/configuration/config-var/var-item.tsx
  • web/app/components/workflow/nodes/_base/components/variable/__tests__/output-var-list.spec.tsx
  • web/app/components/workflow/nodes/_base/components/variable/output-var-list.tsx
  • web/app/components/workflow/nodes/_base/components/variable/var-list.tsx
  • web/app/components/workflow/nodes/_base/hooks/use-output-var-list.ts
  • web/app/components/workflow/nodes/loop/components/loop-variables/item.tsx
  • web/app/components/workflow/nodes/start/components/var-item.tsx
  • web/app/components/workflow/nodes/start/components/var-list.tsx
  • web/app/components/workflow/nodes/variable-assigner/components/var-group-item.tsx
  • web/app/components/workflow/nodes/variable-assigner/components/var-list/index.tsx
  • web/app/components/workflow/panel/chat-variable-panel/components/variable-modal.tsx
  • web/app/components/workflow/panel/chat-variable-panel/type.ts
  • web/app/components/workflow/panel/env-panel/variable-modal.tsx
  • web/package.json
  • web/utils/var.ts

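Purpose and Scope

This document describes Dify's storage backend configuration (for file storage) and vector database configuration (for knowledge base embeddings). It covers the system architecture, the supported backends (more than 23 vector databases and more than 12 storage providers), the configuration methods, and the factory pattern used to initialize these systems at runtime.

Dify uses a pluggable architecture for both file storage and vector search, so developers can switch providers by changing environment variables (docker/.env.example:151-207).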

---

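Storage Backend Architecture

Overview

Dify uses a pluggable storage backend system to store user-uploaded files, documents, and generated assets. The system supports multiple cloud providers and local storage behind a unified interface, with Apache OpenDAL as the primary abstraction layer (api/.env.example:111-115).

Storage backend selection flow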

graph TB
    EnvConfig["STORAGE_TYPE<br/>environment variable"]

    MiddlewareConfig["StorageConfig<br/>[api/configs/middleware/__init__.py]"]

    StorageFactory["Storage.get_storage_factory()<br/>[api/extensions/ext_storage.py]"]

    StorageType{StorageType enum}

    OpenDALStorage["OpenDALStorage<br/>[api/extensions/storage/opendal_storage.py]"]
    S3Storage["AwsS3Storage"]
    AzureBlobStorage["AzureBlobStorage"]
    AliyunOssStorage["AliyunOssStorage"]
    TencentCosStorage["TencentCosStorage"]
    GoogleCloudStorage["GoogleCloudStorage"]
    HuaweiObsStorage["HuaweiObsStorage"]
    BaiduObsStorage["BaiduObsStorage"]
    VolcengineTosStorage["VolcengineTosStorage"]
    OciStorage["OracleOCIStorage"]
    SupabaseStorage["SupabaseStorage"]
    ClickZettaVolumeStorage["ClickZettaVolumeStorage"]

    EnvConfig --> MiddlewareConfig
    MiddlewareConfig --> StorageFactory
    StorageFactory --> StorageType

    StorageType -->|"StorageType.OPENDAL"| OpenDALStorage
    StorageType -->|"StorageType.S3"| S3Storage
    StorageType -->|"StorageType.AZURE_BLOB"| AzureBlobStorage
    StorageType -->|"StorageType.ALIYUN_OSS"| AliyunOssStorage
    StorageType -->|"StorageType.LOCAL"| OpenDALStorage
    StorageType -->|"StorageType.GOOGLE_STORAGE"| GoogleCloudStorage
    StorageType -->|"StorageType.TENCENT_COS"| TencentCosStorage
    StorageType -->|"StorageType.CLICKZETTA_VOLUME"| ClickZettaVolumeStorage

    OpenDALStorage --> OpenDALScheme["OpenDAL scheme selection"]
    OpenDALScheme --> FS["opendal.Operator(scheme='fs')"]
    OpenDALScheme --> S3Op["opendal.Operator(scheme='s3')"]
    OpenDALScheme --> OtherSchemes["Other OpenDAL schemes (oss, cos, obs, etc.)"]

Sources: api/extensions/ext_storage.py:22-86, api/extensions/storage/storage_type.py:4-19, api/configs/middleware/__init__.py:70-77

Storage Configuration and Types

The storage system uses Pydantic-based configuration models (under api/configs/middleware/storage/) to validate and parse environment variables. The Storage class in api/extensions/ext_storage.py is the entry point and instantiates the concrete provider through a factory pattern.

| STORAGE_TYPE value | Provider | Implementation class | Config file |
| --- | --- | --- | --- |
| opendal | Apache OpenDAL | OpenDALStorage | opendal_storage_config.py |
| s3 | AWS S3 | AwsS3Storage | amazon_s3_storage_config.py |
| azure-blob | Azure | AzureBlobStorage | azure_blob_storage_config.py |
| aliyun-oss | Alibaba Cloud | AliyunOssStorage | aliyun_oss_storage_config.py |
| google-storage | Google Cloud | GoogleCloudStorage | google_cloud_storage_config.py |
| tencent-cos | Tencent Cloud | TencentCosStorage | tencent_cos_storage_config.py |
| huawei-obs | Huawei Cloud | HuaweiObsStorage | huawei_obs_storage_config.py |
| baidu-obs | Baidu Cloud | BaiduObsStorage | baidu_obs_storage_config.py |
| volcengine-tos | Volcengine | VolcengineTosStorage | volcengine_tos_storage_config.py |
| oci-storage | Oracle Cloud | OracleOCIStorage | oci_storage_config.py |
| supabase | Supabase | SupabaseStorage | supabase_storage_config.py |
| clickzetta-volume | ClickZetta | ClickZettaVolumeStorage | clickzetta_volume_storage_config.py |
| local | Local file system (deprecated) | OpenDALStorage(scheme='fs') | - |

Sources: api/extensions/ext_storage.py:22-86, api/extensions/storage/storage_type.py:4-19, api/configs/middleware/__init__.py:53-67
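
The selection flow described in this section can be sketched as an enum plus a factory function. This is a minimal, dependency-free illustration and not Dify's actual code: the StorageType members mirror a few STORAGE_TYPE values from the table, while the BaseStorage method set, InMemoryStorage, get_storage, and the "opendal" fallback when STORAGE_TYPE is unset are all assumptions made for the example.

```python
import os
from abc import ABC, abstractmethod
from collections.abc import Callable
from enum import Enum
from typing import Optional


class StorageType(str, Enum):
    # Values mirror a few STORAGE_TYPE settings from the table.
    OPENDAL = "opendal"
    S3 = "s3"
    LOCAL = "local"


class BaseStorage(ABC):
    # Hypothetical minimal interface; the real base class is not shown on this page.
    @abstractmethod
    def save(self, filename: str, data: bytes) -> None: ...

    @abstractmethod
    def load(self, filename: str) -> bytes: ...


class InMemoryStorage(BaseStorage):
    """Stand-in backend so the sketch runs without any cloud SDK."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._files: dict[str, bytes] = {}

    def save(self, filename: str, data: bytes) -> None:
        self._files[filename] = data

    def load(self, filename: str) -> bytes:
        return self._files[filename]


# Each enum member maps to a zero-argument constructor. LOCAL reuses the
# OpenDAL-style backend with the 'fs' scheme, as the table indicates.
_FACTORIES: dict[StorageType, Callable[[], BaseStorage]] = {
    StorageType.OPENDAL: lambda: InMemoryStorage("opendal"),
    StorageType.S3: lambda: InMemoryStorage("s3"),
    StorageType.LOCAL: lambda: InMemoryStorage("opendal:fs"),
}


def get_storage(storage_type: Optional[str] = None) -> BaseStorage:
    """Resolve a STORAGE_TYPE value to a backend; unknown values raise ValueError."""
    # Assumption: fall back to "opendal" when nothing is configured.
    raw = storage_type or os.environ.get("STORAGE_TYPE", "opendal")
    return _FACTORIES[StorageType(raw)]()
```

Keeping the mapping in one dictionary means that adding a provider touches only the enum and the mapping, which is the practical benefit of the factory approach.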

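OpenDAL Integration

OpenDAL provides a unified interface to more than 40 storage services. When STORAGE_TYPE=opendal, the scheme is selected through OPENDAL_SCHEME (api/.env.example:111-115).

OpenDAL initialization pattern

The OpenDALStorage class initializes an opendal.Operator with a retry layer and with dynamic keyword arguments extracted from environment variables: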

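# api/extensions/storage/opendal_storage.py
class OpenDALStorage(BaseStorage):
    def __init__(self, scheme: str, **kwargs):
        # Initialization logic for the OpenDAL Operator (elided on this page)
        # Uses opendal.layers.RetryLayer
        ...

The system parses environment variables that start with OPENDAL_<SCHEME>_ and converts them to lowercase key names, which are then passed to the OpenDAL operator.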

Sources: api/pyproject.toml:191, api/configs/middleware/storage/opendal_storage_config.py, api/.env.example:113-115
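
The prefix-based parsing of OPENDAL_<SCHEME>_ variables might look like the sketch below. The helper name opendal_kwargs is hypothetical, and the example assumes that the prefix is stripped before the remainder is lowercased; OPENDAL_FS_ROOT and OPENDAL_S3_BUCKET are used only as illustrative variable names.

```python
import os
from collections.abc import Mapping
from typing import Optional


def opendal_kwargs(scheme: str, environ: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Collect OPENDAL_<SCHEME>_* variables as lowercase keyword arguments.

    Assumption: the prefix is removed before lowercasing, so
    OPENDAL_FS_ROOT=storage becomes {"root": "storage"}.
    """
    env = os.environ if environ is None else environ
    prefix = f"OPENDAL_{scheme.upper()}_"
    return {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix) and len(key) > len(prefix)
    }


# Only variables for the selected scheme are picked up.
example_env = {
    "OPENDAL_SCHEME": "fs",
    "OPENDAL_FS_ROOT": "storage",
    "OPENDAL_S3_BUCKET": "ignored-for-fs",
}
print(opendal_kwargs(example_env["OPENDAL_SCHEME"], example_env))  # {'root': 'storage'}
```

The resulting dictionary is the kind of value that could be passed as **kwargs to the OpenDALStorage constructor shown above.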

---

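Vector Database Architecture

Overview

Dify supports more than 23 vector database implementations. Each implementation is registered as an entry point and instantiated through VectorFactory (api/core/rag/datasource/vdb/vector_factory.py).

Vector database initialization flow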

graph TB
    Dataset["Dataset model<br/>[api/models/dataset.py]"]

    VectorFactory["VectorFactory.get_vector()<br/>[api/core/rag/datasource/vdb/vector_factory.py]"]

    VectorType{VectorType enum}

    Weaviate["WeaviateVector"]
    Milvus["MilvusVector"]
    PGVector["PGVector"]
    Qdrant["QdrantVector"]
    Elasticsearch["ElasticSearchVector"]
    TiDB["TiDBVector"]
    MyScale["MyScaleVector"]
    Tencent["TencentVector"]
    Oracle["OracleVector"]
    Relyt["RelytVector"]

    Dataset --> VectorFactory
    VectorFactory --> VectorType

    VectorType -->|"VectorType.WEAVIATE"| Weaviate
    VectorType -->|"VectorType.MILVUS"| Milvus
    VectorType -->|"VectorType.PGVECTOR"| PGVector
    VectorType -->|"VectorType.QDRANT"| Qdrant
    VectorType -->|"VectorType.ELASTICSEARCH"| Elasticsearch
    VectorType -->|"VectorType.TIDB_VECTOR"| TiDB
    VectorType -->|"VectorType.MYSCALE"| MyScale
    VectorType -->|"VectorType.TENCENT"| Tencent
    VectorType -->|"VectorType.ORACLE"| Oracle
    VectorType -->|"VectorType.RELYT"| Relyt

Sources: api/core/rag/datasource/vdb/vector_type.py:4-37, api/configs/middleware/__init__.py:86-101, api/pyproject.toml:203-241
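
To make the registration-then-lookup idea concrete, the sketch below uses a class decorator as a stand-in for entry-point registration. It is not the VectorFactory implementation: the enum string values, the register decorator, the Fake* classes, and get_vector are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from enum import Enum


class VectorType(str, Enum):
    # A few members named in the diagram; the string values are assumptions.
    PGVECTOR = "pgvector"
    QDRANT = "qdrant"
    MILVUS = "milvus"


class BaseVector(ABC):
    def __init__(self, collection_name: str) -> None:
        self.collection_name = collection_name

    @abstractmethod
    def get_type(self) -> str: ...


_REGISTRY: dict[VectorType, type[BaseVector]] = {}


def register(vector_type: VectorType):
    """Class decorator standing in for entry-point registration."""
    def decorator(cls: type[BaseVector]) -> type[BaseVector]:
        _REGISTRY[vector_type] = cls
        return cls
    return decorator


@register(VectorType.PGVECTOR)
class FakePGVector(BaseVector):
    def get_type(self) -> str:
        return VectorType.PGVECTOR.value


@register(VectorType.QDRANT)
class FakeQdrantVector(BaseVector):
    def get_type(self) -> str:
        return VectorType.QDRANT.value


def get_vector(vector_store: str, collection_name: str) -> BaseVector:
    """Map a VECTOR_STORE value to a registered implementation."""
    vector_type = VectorType(vector_store)  # ValueError for unknown names
    try:
        cls = _REGISTRY[vector_type]
    except KeyError:
        # Known type, but no provider package registered it.
        raise LookupError(f"no provider installed for {vector_type.value!r}") from None
    return cls(collection_name)
```

Separating "unknown name" (ValueError) from "known but not installed" (LookupError) is a design choice of this sketch; it reflects the situation where each provider ships as its own package and may be absent.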

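Supported Implementations

Dify manages vector databases with a workspace-based plugin architecture: each provider is a separate package under providers/vdb/* (api/pyproject.toml:56-58).

| Database | Package name | Config class |
| --- | --- | --- |
| Weaviate | dify-vdb-weaviate | WeaviateConfig |
| Milvus | dify-vdb-milvus | MilvusConfig |
| PGVector | dify-vdb-pgvector | PGVectorConfig |
| Qdrant | dify-vdb-qdrant | QdrantConfig |
| Elasticsearch | dify-vdb-elasticsearch | ElasticsearchConfig |
| TiDB Vector | dify-vdb-tidb-vector | TiDBVectorConfig |
| OceanBase | dify-vdb-oceanbase | OceanBaseVectorConfig |
| Chroma | dify-vdb-chroma | ChromaConfig |
| Oracle | dify-vdb-oracle | OracleConfig |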

Sources: api/pyproject.toml:62-91, api/configs/middleware/__init__.py:22-51

Configuration Pattern

The vector database is selected through the VECTOR_STORE environment variable (api/.env.example:205). Each database has its own configuration block:

  • Weaviate: WEAVIATE_ENDPOINT, WEAVIATE_API_KEY (api/.env.example:210-211)
  • Milvus: MILVUS_URI, MILVUS_TOKEN (docker/.env.example:186)
  • PGVector: PGVECTOR_HOST, PGVECTOR_PORT (docker/docker-compose.yaml:32)
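
The "one selector, one block per database" pattern can be sketched as follows. Dify's configuration models are Pydantic-based; plain dataclasses are swapped in here to keep the example dependency-free. The class names, load_vector_settings, and the treatment of the API key and token as optional are assumptions.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WeaviateSettings:
    endpoint: str
    api_key: Optional[str]


@dataclass(frozen=True)
class MilvusSettings:
    uri: str
    token: Optional[str]


def load_vector_settings(environ: Optional[Mapping[str, str]] = None):
    """Read VECTOR_STORE, then only the block that belongs to that store."""
    env = os.environ if environ is None else environ
    store = env.get("VECTOR_STORE", "").lower()
    if store == "weaviate":
        return WeaviateSettings(env["WEAVIATE_ENDPOINT"], env.get("WEAVIATE_API_KEY"))
    if store == "milvus":
        return MilvusSettings(env["MILVUS_URI"], env.get("MILVUS_TOKEN"))
    raise ValueError(f"unsupported or missing VECTOR_STORE: {store!r}")


# Variables for other stores can stay in the environment; they are ignored.
example_env = {
    "VECTOR_STORE": "milvus",
    "MILVUS_URI": "http://127.0.0.1:19530",
    "WEAVIATE_ENDPOINT": "http://127.0.0.1:8080",
}
print(load_vector_settings(example_env))
```

Because only the selected block is read, switching databases amounts to changing VECTOR_STORE and filling in the matching variables.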

---

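Data Flow: From Document to Vector Store

The diagram below connects high-level document ingestion to the concrete code entities responsible for persistence.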

graph LR
    subgraph "Natural language space"
        Doc["User document (PDF/TXT)"]
    end

    subgraph "Code entity space"
        DS["Dataset entity<br/>[api/models/dataset.py]"]
        IR["IndexingRunner<br/>[api/core/indexing_runner.py]"]
        VF["VectorFactory<br/>[api/core/rag/datasource/vdb/vector_factory.py]"]
        VImpl["BaseVector implementation<br/>(e.g. PGVector)"]
    end

    subgraph "Infrastructure space"
        VDB_Server["Vector database instance<br/>(Milvus/Qdrant, etc.)"]
    end

    Doc --> IR
    IR --> DS
    DS --> VF
    VF --> VImpl
    VImpl -->|"insert()"| VDB_Server

Sources: api/core/rag/datasource/vdb/vector_factory.py, api/controllers/console/datasets/datasets.py:25, api/models/dataset.py
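
As a rough end-to-end illustration of this flow, the sketch below splits a document, computes placeholder embeddings, and hands both to a store through insert(). Everything here is hypothetical: InMemoryVectorStore stands in for a BaseVector implementation, toy_embed is not a real embedding model, and the fixed-size splitter is far simpler than the splitters under api/core/rag/splitter/.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryVectorStore:
    """Stand-in for a BaseVector implementation backed by a real database."""
    rows: list[tuple[str, list[float]]] = field(default_factory=list)

    def insert(self, texts: list[str], embeddings: list[list[float]]) -> None:
        # Persist each chunk together with its vector.
        self.rows.extend(zip(texts, embeddings))


def split_text(text: str, chunk_size: int) -> list[str]:
    """Fixed-size character chunks; real splitters are separator-aware."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def toy_embed(chunk: str) -> list[float]:
    """Placeholder embedding: [length, vowel count]. Not a real model."""
    return [float(len(chunk)), float(sum(c in "aeiou" for c in chunk.lower()))]


def index_document(text: str, store: InMemoryVectorStore, chunk_size: int = 16) -> int:
    """Split, embed, and persist a document; returns the number of chunks stored."""
    chunks = split_text(text, chunk_size)
    store.insert(chunks, [toy_embed(c) for c in chunks])
    return len(chunks)


store = InMemoryVectorStore()
count = index_document("Dify stores embeddings in a vector database.", store)
print(count, len(store.rows))
```

The indexing function depends only on the insert() method, so the in-memory store could be replaced by any implementation exposing the same call.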

---

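Configuration Summary Table

| Middleware | Config class | Default port | Key environment variables |
| --- | --- | --- | --- |
| PostgreSQL | DatabaseConfig | 5432 | DB_HOST, DB_USERNAME, DB_PASSWORD, DB_DATABASE |
| Redis | RedisConfig | 6379 | REDIS_HOST, REDIS_PORT, REDIS_PASSWORD |
| S3 | S3StorageConfig | 443 | S3_ENDPOINT, S3_BUCKET_NAME, S3_ACCESS_KEY |
| Milvus | MilvusConfig | 19530 | MILVUS_URI, MILVUS_TOKEN |
| Weaviate | WeaviateConfig | 8080 | WEAVIATE_ENDPOINT, WEAVIATE_API_KEY |
| Qdrant | QdrantConfig | 6333 | QDRANT_URL, QDRANT_API_KEY |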

Sources: api/configs/middleware/__init__.py:123-153, api/configs/middleware/cache/redis_config.py, api/.env.example:46-101, api/.env.example:117-124, api/.env.example:209-211
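
The summary table can also be used as a pre-deployment checklist. The helper below is a hypothetical sketch that reports which of the listed variables are missing or empty for one middleware; the variable names are copied from the table, and an unset variable is not necessarily an error, since the configuration classes may supply defaults.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Key variables per middleware, copied from the summary table.
KEY_VARIABLES: dict[str, tuple[str, ...]] = {
    "PostgreSQL": ("DB_HOST", "DB_USERNAME", "DB_PASSWORD", "DB_DATABASE"),
    "Redis": ("REDIS_HOST", "REDIS_PORT", "REDIS_PASSWORD"),
    "S3": ("S3_ENDPOINT", "S3_BUCKET_NAME", "S3_ACCESS_KEY"),
    "Milvus": ("MILVUS_URI", "MILVUS_TOKEN"),
    "Weaviate": ("WEAVIATE_ENDPOINT", "WEAVIATE_API_KEY"),
    "Qdrant": ("QDRANT_URL", "QDRANT_API_KEY"),
}


def unset_variables(middleware: str, environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the table's key variables that are missing or empty for one middleware."""
    env = os.environ if environ is None else environ
    return [name for name in KEY_VARIABLES[middleware] if not env.get(name)]


example_env = {"REDIS_HOST": "localhost", "REDIS_PORT": "6379"}
print(unset_variables("Redis", example_env))  # ['REDIS_PASSWORD']
```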