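Vector Database Integration Architecture (Chinese translation)
Original DeepWiki page: https://deepwiki.com/langgenius/dify/4.4-vector-database-integration-architecture
Translated at: 2026-05-27T08:44:32.517Z
Translation model: deepseek-chat
Original character count: 14543
Project: Dify (dify)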
---
Vector Database Integration Architecture
Relevant source files
The following files were used as context for generating this wiki page:
api/.env.example
api/app.py
api/app_factory.py
api/configs/feature/__init__.py
api/configs/middleware/__init__.py
api/configs/observability/__init__.py
api/configs/observability/otel/otel_config.py
api/configs/packaging/__init__.py
api/controllers/console/datasets/datasets.py
api/core/plugin/backwards_invocation/model.py
api/core/rag/datasource/keyword/jieba/jieba.py
api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py
api/core/rag/datasource/vdb/vector_factory.py
api/core/rag/datasource/vdb/vector_type.py
api/core/rag/retrieval/router/multi_dataset_function_call_router.py
api/core/rag/retrieval/router/multi_dataset_react_route.py
api/core/rag/splitter/fixed_text_splitter.py
api/core/rag/splitter/text_splitter.py
api/extensions/ext_compress.py
api/extensions/ext_otel.py
api/extensions/otel/instrumentation.py
api/providers/vdb/vdb-couchbase/src/dify_vdb_couchbase/couchbase_vector.py
api/providers/vdb/vdb-elasticsearch/src/dify_vdb_elasticsearch/elasticsearch_vector.py
api/providers/vdb/vdb-huawei-cloud/src/dify_vdb_huawei_cloud/huawei_cloud_vector.py
api/providers/vdb/vdb-lindorm/src/dify_vdb_lindorm/lindorm_vector.py
api/providers/vdb/vdb-milvus/src/dify_vdb_milvus/milvus_vector.py
api/providers/vdb/vdb-opensearch/src/dify_vdb_opensearch/opensearch_vector.py
api/providers/vdb/vdb-oracle/src/dify_vdb_oracle/oraclevector.py
api/providers/vdb/vdb-pgvector/src/dify_vdb_pgvector/pgvector.py
api/providers/vdb/vdb-relyt/src/dify_vdb_relyt/relyt_vector.py
api/providers/vdb/vdb-tablestore/src/dify_vdb_tablestore/tablestore_vector.py
api/providers/vdb/vdb-tidb-vector/src/dify_vdb_tidb_vector/tidb_vector.py
api/providers/vdb/vdb-upstash/src/dify_vdb_upstash/upstash_vector.py
api/providers/vdb/vdb-vastbase/src/dify_vdb_vastbase/vastbase_vector.py
api/pyproject.toml
api/tests/unit_tests/configs/test_dify_config.py
api/tests/unit_tests/core/rag/splitter/__init__.py
api/tests/unit_tests/core/rag/splitter/test_text_splitter.py
api/tests/unit_tests/core/workflow/graph_engine/test_table_runner.py
api/uv.lock
docker/.env.example
docker/README.md
docker/docker-compose-template.yaml
docker/docker-compose.middleware.yaml
docker/docker-compose.yaml
docker/envs/core-services/shared.env.example
docker/envs/infrastructure/nginx.env.example
docker/envs/security.env.example
docker/nginx/conf.d/default.conf.template
web/package.json
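Purpose and Scope
This document describes Dify's vector database integration architecture: the VectorFactory abstraction that supports more than 23 different vector database implementations, the configuration-driven mechanism that selects among them, and how vector databases plug into the document indexing and retrieval pipelines.
For how documents are indexed into a vector database, see Document Indexing Pipeline. For how vector search is used during retrieval, see Retrieval Strategies and Metadata Filtering. For deployment-level storage and vector database configuration, see Storage Backends and Vector Database Configuration.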
---
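Architecture Overview
The vector database integration architecture uses the factory pattern to hide the details of each vector database implementation behind a common interface, so the system can switch between vector databases through configuration alone.
System Context Diagram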
graph TB
subgraph "Application Layer"
IndexingRunner["IndexingRunner<br/>(core.indexing_runner)"]
RetrievalService["RetrievalService<br/>(core.rag.retrieval)"]
DatasetService["DatasetService<br/>(services.dataset_service)"]
end
subgraph "Vector Abstraction Layer"
VectorFactory["VectorFactory<br/>(core.rag.datasource.vdb.vector_factory)"]
BaseVector["BaseVector<br/>(abstract base class)"]
end
subgraph "Configuration Layer"
DifyConfig["dify_config<br/>(VECTOR_STORE)"]
VDBConfigs["Vector DB-specific configs<br/>(WeaviateConfig, MilvusConfig, etc.)"]
end
subgraph "Vector Database Implementations"
Weaviate["WeaviateVector"]
Milvus["MilvusVector"]
PGVector["PGVector"]
Qdrant["QdrantVector"]
TidbVector["TiDBVector"]
Others["... 18+ more"]
end
subgraph "External Services"
WeaviateDB[("Weaviate<br/>service")]
MilvusDB[("Milvus<br/>service")]
PostgresDB[("PostgreSQL<br/>with pgvector")]
QdrantDB[("Qdrant<br/>service")]
end
IndexingRunner --> VectorFactory
RetrievalService --> VectorFactory
DatasetService --> VectorFactory
VectorFactory --> DifyConfig
VectorFactory --> VDBConfigs
Weaviate -.->|implements| BaseVector
Milvus -.->|implements| BaseVector
PGVector -.->|implements| BaseVector
Qdrant -.->|implements| BaseVector
TidbVector -.->|implements| BaseVector
Weaviate --> WeaviateDB
Milvus --> MilvusDB
PGVector --> PostgresDB
Qdrant --> QdrantDB
Sources: api/core/rag/datasource/vdb/vector_factory.py:1-30, api/core/rag/datasource/vdb/vector_type.py:4-30, api/configs/middleware/__init__.py:86-101
---
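Vector Factory Pattern
VectorFactory is the entry point for creating vector database instances. It decides which implementation to instantiate based on the VECTOR_STORE environment variable or on the dataset's own configuration.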
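A minimal sketch of that selection rule, assuming (as a simplification) that a dataset records the vector store type it was indexed with and that this stored type takes precedence over the global VECTOR_STORE setting. `Dataset`, its `index_struct` field, and `resolve_vector_type` are hypothetical stand-ins, not Dify's actual API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset record."""
    id: str
    # Assumption: index metadata stores the backend the dataset was built with.
    index_struct: dict = field(default_factory=dict)


def resolve_vector_type(dataset: Dataset, global_vector_store: Optional[str]) -> str:
    """Pick a backend: the dataset's stored type first, then VECTOR_STORE."""
    stored_type = dataset.index_struct.get("type")
    if stored_type:
        # An existing dataset keeps using the store its vectors already live in.
        return stored_type
    if not global_vector_store:
        raise ValueError("VECTOR_STORE is not set and the dataset has no stored type")
    return global_vector_store


new_dataset = Dataset(id="new")
legacy_dataset = Dataset(id="legacy", index_struct={"type": "milvus"})
print(resolve_vector_type(new_dataset, "weaviate"))     # weaviate
print(resolve_vector_type(legacy_dataset, "weaviate"))  # milvus
```

Under this assumption, changing VECTOR_STORE affects only datasets that have not been indexed yet; moving an existing dataset to a new backend would require re-indexing it.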
Factory Class Structure
classDiagram
class VectorFactory {
+get_vector(dataset, embeddings) BaseVector
}
class BaseVector {
<<abstract>>
+create(texts, embeddings)
+add_texts(documents, embeddings)
+delete_by_ids(ids)
+delete_by_metadata_field(key, value)
+search_by_vector(query_vector) list
+search_by_full_text(query) list
+get_type() str
}
class VectorType {
<<enumeration>>
WEAVIATE
QDRANT
MILVUS
PGVECTOR
ELASTICSEARCH
CHROMA
OPENSEARCH
TIDB_VECTOR
+15 more types
}
class WeaviateVector {
-client WeaviateClient
+create(texts, embeddings)
+search_by_vector(query_vector)
}
class MilvusVector {
-client MilvusClient
+create(texts, embeddings)
+add_texts(documents, embeddings)
}
VectorFactory ..> BaseVector : creates
BaseVector <|-- WeaviateVector
BaseVector <|-- MilvusVector
Sources: api/core/rag/datasource/vdb/vector_factory.py:1-30, api/core/rag/datasource/vdb/vector_type.py:4-30, api/providers/vdb/vdb-milvus/src/dify_vdb_milvus/milvus_vector.py:1-100
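To make the contract in the class diagram concrete, here is a minimal sketch of an abstract base class with the listed methods plus a toy in-memory backend that does brute-force cosine search. The signatures are simplified (for example, `top_k` as a plain keyword argument), and `Document`, `InMemoryVector`, and the `doc_id` metadata key are assumptions for illustration; Dify's real `BaseVector` may differ in names and parameters.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in for a Document model: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class BaseVector(ABC):
    """Interface mirroring the class diagram (signatures simplified)."""

    @abstractmethod
    def get_type(self) -> str: ...

    @abstractmethod
    def create(self, texts: list[Document], embeddings: list[list[float]]) -> None: ...

    @abstractmethod
    def add_texts(self, documents: list[Document], embeddings: list[list[float]]) -> list[str]: ...

    @abstractmethod
    def delete_by_ids(self, ids: list[str]) -> None: ...

    @abstractmethod
    def delete_by_metadata_field(self, key: str, value: str) -> None: ...

    @abstractmethod
    def search_by_vector(self, query_vector: list[float], top_k: int = 4) -> list[Document]: ...

    @abstractmethod
    def search_by_full_text(self, query: str, top_k: int = 4) -> list[Document]: ...


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVector(BaseVector):
    """Toy backend (not a Dify class): rows kept in a dict, searched by brute force."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[Document, list[float]]] = {}

    def get_type(self) -> str:
        return "in_memory"

    def create(self, texts, embeddings):
        # A real backend would create its collection/index here before inserting.
        self.add_texts(texts, embeddings)

    def add_texts(self, documents, embeddings):
        ids = []
        for doc, emb in zip(documents, embeddings):
            doc_id = doc.metadata["doc_id"]  # assumed: every chunk carries a doc_id
            self._rows[doc_id] = (doc, emb)
            ids.append(doc_id)
        return ids

    def delete_by_ids(self, ids):
        for doc_id in ids:
            self._rows.pop(doc_id, None)

    def delete_by_metadata_field(self, key, value):
        self._rows = {k: row for k, row in self._rows.items() if row[0].metadata.get(key) != value}

    def search_by_vector(self, query_vector, top_k=4):
        ranked = sorted(self._rows.values(), key=lambda row: _cosine(query_vector, row[1]), reverse=True)
        return [doc for doc, _ in ranked[:top_k]]

    def search_by_full_text(self, query, top_k=4):
        hits = [doc for doc, _ in self._rows.values() if query.lower() in doc.page_content.lower()]
        return hits[:top_k]
```

Because callers only touch the `BaseVector` methods, replacing `InMemoryVector` with a real backend leaves the calling code unchanged, which is the property the factory relies on.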
---
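Configuration System
Vector database selection and configuration are driven by Pydantic models and environment variables, so the backend is chosen at deployment time without any code changes.
Configuration Models
Each vector database implementation has a corresponding configuration model that validates its settings.
| Implementation | Configuration class | Key fields | Source |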
|---|---|---|---|
| Weaviate | WeaviateConfig | WEAVIATE_ENDPOINT, WEAVIATE_API_KEY | api/configs/middleware/vdb/weaviate_config.py:1-15 |
| Milvus | MilvusConfig | MILVUS_URI, MILVUS_TOKEN, MILVUS_DATABASE | api/configs/middleware/__init__.py:34-34 |
| PGVector | PGVectorConfig | PGVECTOR_HOST, PGVECTOR_PORT, PGVECTOR_USER | api/configs/middleware/__init__.py:40-40 |
| Qdrant | QdrantConfig | QDRANT_URL, QDRANT_API_KEY | api/configs/middleware/__init__.py:42-42 |
| TiDB Vector | TiDBVectorConfig | TIDB_VECTOR_HOST, TIDB_VECTOR_PORT | api/configs/middleware/__init__.py:47-47 |
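In Dify these models are Pydantic classes. To keep the example dependency-free, the sketch below uses a stdlib dataclass to show the same idea: read one backend's keys from the environment and fail fast on invalid values. `PGVectorSettings`, the 5432 default (PostgreSQL's standard port), and the port-range check are assumptions for illustration, not Dify's `PGVectorConfig`.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PGVectorSettings:
    """Hypothetical stand-in for a per-backend config model (not Dify's PGVectorConfig)."""
    host: Optional[str]
    port: int
    user: Optional[str]

    def __post_init__(self) -> None:
        # Validation that a Pydantic model would normally express declaratively.
        if not 1 <= self.port <= 65535:
            raise ValueError(f"PGVECTOR_PORT out of range: {self.port}")

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "PGVectorSettings":
        # int() raises ValueError for non-numeric ports, so bad input fails early too.
        return cls(
            host=env.get("PGVECTOR_HOST"),
            port=int(env.get("PGVECTOR_PORT", "5432")),  # assumed default
            user=env.get("PGVECTOR_USER"),
        )


settings = PGVectorSettings.from_env({"PGVECTOR_HOST": "db", "PGVECTOR_PORT": "5433"})
print(settings.port)  # 5433
```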
Selection Logic
Selection is controlled primarily by the VECTOR_STORE environment variable.
# Supported values for VECTOR_STORE
# weaviate, oceanbase, qdrant, milvus, myscale, relyt, pgvector, pgvecto-rs, chroma, opensearch, oracle, tencent, elasticsearch, elasticsearch-ja, analyticdb, couchbase, vikingdb, opengauss, tablestore, vastbase, tidb, tidb_on_qdrant, baidu, lindorm, huawei_cloud, upstash, matrixone, hologres
Sources: api/.env.example:204-205, api/configs/middleware/__init__.py:86-101
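A small sketch of how the VECTOR_STORE value could be normalized and checked against the values listed above before any backend is constructed. `SUPPORTED_VECTOR_STORES` is copied from that list; `read_vector_store` is a hypothetical helper, and the trimming and lower-casing are assumptions rather than documented Dify behavior.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Copied from the list of supported values above (28 entries).
SUPPORTED_VECTOR_STORES = frozenset(
    "weaviate oceanbase qdrant milvus myscale relyt pgvector pgvecto-rs chroma "
    "opensearch oracle tencent elasticsearch elasticsearch-ja analyticdb couchbase "
    "vikingdb opengauss tablestore vastbase tidb tidb_on_qdrant baidu lindorm "
    "huawei_cloud upstash matrixone hologres".split()
)


def read_vector_store(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured backend name, or raise a descriptive error."""
    env = os.environ if env is None else env
    raw = env.get("VECTOR_STORE", "")
    value = raw.strip().lower()  # assumed normalization
    if not value:
        raise ValueError("VECTOR_STORE is not set")
    if value not in SUPPORTED_VECTOR_STORES:
        raise ValueError(f"Unsupported VECTOR_STORE value: {raw!r}")
    return value
```

Rejecting unknown values at startup surfaces a typo in the environment immediately instead of at the first indexing or retrieval call.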
---
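Supported Vector Database Implementations
Dify supports more than 23 vector database implementations, organized as a plugin-like uv workspace in which each backend is its own package.
Implementation Matrix (partial list)
| VectorType | Code identifier | Workspace package |
|---|---|---|
| WEAVIATE | weaviate | dify-vdb-weaviate |
| MILVUS | milvus | dify-vdb-milvus |
| PGVECTOR | pgvector | dify-vdb-pgvector |
| QDRANT | qdrant | dify-vdb-qdrant |
| ORACLE | oracle | dify-vdb-oracle |
| COUCHBASE | couchbase | dify-vdb-couchbase |
| TIDB_VECTOR | tidb_vector | dify-vdb-tidb-vector |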
Sources: api/pyproject.toml:62-91, api/uv.lock:22-51
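The rows above follow a regular naming pattern: the workspace package is `dify-vdb-` plus the code identifier with underscores turned into hyphens, and the import package seen in the source paths (for example `dify_vdb_milvus`, `dify_vdb_tidb_vector`) is `dify_vdb_` plus the identifier. The sketch below encodes that pattern and uses `importlib.util.find_spec` to check whether a backend is installed without importing it. The pattern is inferred from the listed rows, not a documented contract, and the helper names are hypothetical.

```python
import importlib.util


def workspace_package(identifier: str) -> str:
    """Distribution name, e.g. 'tidb_vector' -> 'dify-vdb-tidb-vector' (inferred pattern)."""
    return "dify-vdb-" + identifier.replace("_", "-")


def import_package(identifier: str) -> str:
    """Top-level module name, e.g. 'milvus' -> 'dify_vdb_milvus' (inferred pattern)."""
    # Assumption: hyphens are mapped to underscores, since module names cannot contain '-'.
    return "dify_vdb_" + identifier.replace("-", "_")


def is_backend_installed(identifier: str) -> bool:
    """True if the backend's import package is findable in the current environment."""
    return importlib.util.find_spec(import_package(identifier)) is not None


for ident in ("weaviate", "milvus", "tidb_vector"):
    print(ident, workspace_package(ident), is_backend_installed(ident))
```

Checking with `find_spec` rather than importing keeps the probe cheap and avoids pulling in heavy client libraries for backends that are not selected.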
---
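Vector Database Lifecycle
Data Flow: Indexing Documents into Vector Space
This diagram shows the indexing phase handled by IndexingRunner: document text is turned into embedding vectors, which are then handed to a concrete vector store implementation for storage.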
sequenceDiagram
participant IR as "IndexingRunner (core.indexing_runner)"
participant Embed as "Embeddings (core)"
participant VF as "VectorFactory (core.rag.datasource.vdb.vector_factory)"
participant VDB as "MilvusVector/PGVector"
IR->>Embed: embed_documents(texts)
Embed-->>IR: list[list[float]]
IR->>VF: get_vector(dataset, embeddings)
VF->>VDB: create(texts, embeddings)
Note over VDB: Prepare batch_insert_list
VDB->>VDB: add_texts(documents, embeddings)
VDB-->>IR: list[str] (primary keys)
Sources: api/core/rag/datasource/vdb/vector_factory.py:10-25, api/providers/vdb/vdb-milvus/src/dify_vdb_milvus/milvus_vector.py:115-130
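The indexing sequence can be sketched end to end as below. A deterministic hash-based function stands in for a real embedding model, and a recording store stands in for a backend; the batching mirrors the `batch_insert_list` step in the diagram, but `fake_embed`, `RecordingStore`, `index_documents`, and the default `batch_size` are hypothetical, not taken from Dify.

```python
import hashlib
import uuid


def fake_embed(texts: list[str], dim: int = 4) -> list[list[float]]:
    """Deterministic stand-in for embed_documents(): hash bytes scaled to [0, 1]."""
    vectors = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        vectors.append([b / 255 for b in digest[:dim]])
    return vectors


class RecordingStore:
    """Hypothetical backend that records each insert batch it receives."""

    def __init__(self) -> None:
        self.batches: list[list[dict]] = []

    def insert(self, rows: list[dict]) -> list[str]:
        self.batches.append(rows)
        return [row["id"] for row in rows]


def index_documents(store: RecordingStore, texts: list[str], batch_size: int = 2) -> list[str]:
    """Embed the texts, then insert (id, text, vector) rows in fixed-size batches."""
    embeddings = fake_embed(texts)
    rows = [{"id": str(uuid.uuid4()), "text": t, "vector": v} for t, v in zip(texts, embeddings)]
    primary_keys: list[str] = []
    for start in range(0, len(rows), batch_size):
        batch_insert_list = rows[start:start + batch_size]
        primary_keys.extend(store.insert(batch_insert_list))
    return primary_keys
```

Batching keeps each insert request bounded in size; a suitable batch size depends on the backend's request limits.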
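Data Flow: From Natural-Language Query to Vector Search
This flow shows how the retrieval logic converts a natural-language query into a query vector and runs a similarity search against the vector store.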
sequenceDiagram
participant User as "User query"
participant RS as "Retrieval logic"
participant VDB as "BaseVector (implementation)"
participant Client as "Database client (pymilvus/psycopg2)"
User->>RS: "What is Dify?"
RS->>RS: Generate query_vector
RS->>VDB: search_by_vector(query_vector, top_k=4)
VDB->>Client: Execute search (e.g. Milvus search())
Client-->>VDB: Raw results (IDs, scores)
VDB->>VDB: Map to Document model
VDB-->>RS: list[Document]
Sources: api/providers/vdb/vdb-milvus/src/dify_vdb_milvus/milvus_vector.py:196-220, api/providers/vdb/vdb-oracle/src/dify_vdb_oracle/oraclevector.py:150-180
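The last steps of that flow, where raw `(id, score)` hits from the client are mapped onto `Document` objects, can be sketched as follows. The `Document` dataclass, the `payloads` lookup, and the `score_threshold` filter are assumptions for illustration. Whether a larger score means "more similar" depends on the backend's distance metric; here higher is assumed to be better.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)


def to_documents(
    raw_hits: list[tuple[str, float]],
    payloads: dict[str, dict],
    top_k: int = 4,
    score_threshold: float = 0.0,
) -> list[Document]:
    """Turn (id, score) pairs into Documents, best first, dropping weak or orphaned hits."""
    ranked = sorted(raw_hits, key=lambda hit: hit[1], reverse=True)
    docs: list[Document] = []
    for doc_id, score in ranked:
        if score < score_threshold or doc_id not in payloads:
            continue  # below threshold, or no stored payload for this id
        payload = payloads[doc_id]
        metadata = {**payload.get("metadata", {}), "doc_id": doc_id, "score": score}
        docs.append(Document(payload["text"], metadata))
        if len(docs) == top_k:
            break
    return docs
```

Attaching the score to each document's metadata lets later stages (reranking, threshold filtering in the UI) work without going back to the vector store.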
---
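Infrastructure and Deployment
Vector databases are typically deployed either as sidecar containers alongside the Dify services or as external services.
Docker Compose Integration
The docker-compose.yaml and docker-compose-template.yaml files define container services for several of the supported vector databases.
| Service name | Image | Default port |
|---|---|---|
| weaviate | semitechnologies/weaviate:1.19.0 | 8080 |
| milvus-standalone | milvusdb/milvus:v2.3.1 | 19530 |
| qdrant | qdrant/qdrant:v1.7.3 | 6333 |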
Sources: docker/docker-compose.yaml:22-55, docker/docker-compose-template.yaml:16-49
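When a vector database runs as its own container, the API service can only use it once its port accepts connections. A minimal readiness probe, assuming the default ports from the table above and hypothetical helpers `port_open` and `wait_for_port` (not part of Dify), could look like this:

```python
import socket
import time

# Default ports from the table above.
DEFAULT_PORTS = {"weaviate": 8080, "milvus-standalone": 19530, "qdrant": 6333}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, deadline_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll until the port accepts connections or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if port_open(host, port):
            return True
        time.sleep(interval_s)
    return False
```

Inside the Compose network, service names resolve as hostnames, so a call such as `wait_for_port("qdrant", DEFAULT_PORTS["qdrant"])` would be the intended use. An open port does not guarantee the database is fully ready; a backend's own health endpoint, where one exists, is a stronger signal.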
Configuration Environment Files
Dify uses a structured set of environment files, one per vector store, to manage vector store configuration:
envs/vectorstores/weaviate.env
envs/vectorstores/qdrant.env
envs/vectorstores/milvus.env
envs/vectorstores/pgvector.env
Sources: docker/docker-compose.yaml:22-55, docker/docker-compose-template.yaml:16-49
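The per-store files suggest a simple layout: one `envs/vectorstores/<name>.env` file per backend. A sketch of loading the file that matches the selected store, assuming plain `KEY=VALUE` lines with `#` comments, is below. The parser is deliberately simplified (no quoting rules, no variable expansion, so it is not full dotenv syntax), and `parse_env_file` and `load_vectorstore_env` are hypothetical helpers.

```python
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments (simplified, not full dotenv)."""
    values: dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip()
    return values


def load_vectorstore_env(docker_dir: Path, vector_store: str) -> dict[str, str]:
    """Load envs/vectorstores/<vector_store>.env relative to the docker directory."""
    env_path = docker_dir / "envs" / "vectorstores" / f"{vector_store}.env"
    if not env_path.is_file():
        raise FileNotFoundError(f"No env file for vector store {vector_store!r}: {env_path}")
    return parse_env_file(env_path)
```

Keying the file name on the same value as VECTOR_STORE means selecting a backend and loading its settings use a single switch.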