Search System (Chinese translation)
Original DeepWiki page: https://deepwiki.com/mayan-edms/Mayan-EDMS/5.1-search-system
Translation time: 2026-05-27T08:44:38.105Z
Translation model: deepseek-chat
Source character count: 19819
Project: Mayan EDMS (mayan-edms)
---
Search System
Relevant source files
The following files were used as context for generating this wiki page:
docs/releases/4.1.9.txt
mayan/apps/cabinets/tests/test_widgets.py
mayan/apps/databases/literals.py
mayan/apps/dynamic_search/api_views.py
mayan/apps/dynamic_search/backends/django.py
mayan/apps/dynamic_search/backends/whoosh.py
mayan/apps/dynamic_search/classes.py
mayan/apps/dynamic_search/filters.py
mayan/apps/dynamic_search/literals.py
mayan/apps/dynamic_search/management/commands/search_index_objects.py
mayan/apps/dynamic_search/queues.py
mayan/apps/dynamic_search/serializers.py
mayan/apps/dynamic_search/tasks.py
mayan/apps/dynamic_search/templates/dynamic_search/app/list_toolbar.html
mayan/apps/dynamic_search/templates/dynamic_search/search_box.html
mayan/apps/dynamic_search/tests/backends.py
mayan/apps/dynamic_search/tests/mixins.py
mayan/apps/dynamic_search/tests/test_backends.py
mayan/apps/dynamic_search/tests/test_classes.py
mayan/apps/dynamic_search/tests/test_tasks.py
mayan/apps/dynamic_search/tests/test_views.py
mayan/apps/dynamic_search/urls.py
mayan/apps/dynamic_search/utils.py
mayan/apps/dynamic_search/view_mixins.py
mayan/apps/dynamic_search/views.py
mayan/apps/metadata/tests/test_widgets.py
mayan/apps/tags/tests/test_widgets.py
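Purpose and Scope
The search system gives Mayan EDMS dynamic search capabilities, letting users find documents, metadata, tags, and other content through multiple search backends. It supports simple and advanced search interfaces, automatic content indexing, and pluggable backend implementations, including Whoosh, ElasticSearch, and the Django ORM.
For document organization features, see Document Organization. For access control integration, see Access Control and Permissions.
Architecture Overview
The search system uses a pluggable backend architecture with automatic indexing and multiple interface options: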
graph TB
subgraph "Search Interfaces"
WebUI["SearchView<br/>AdvancedSearchView"]
RestAPI["APISearchView<br/>APIAdvancedSearchView"]
Filter["RESTAPISearchFilter"]
end
subgraph "Core Search Framework"
Backend["SearchBackend"]
Model["SearchModel"]
Field["SearchField"]
end
subgraph "Backend Implementations"
Whoosh["WhooshSearchBackend"]
Django["DjangoSearchBackend"]
Elastic["ElasticSearchBackend"]
end
subgraph "Indexing System"
Signals["Django signals"]
Tasks["Celery tasks<br/>task_index_instance<br/>task_deindex_instance"]
Handlers["Signal handlers"]
end
subgraph "Data Sources"
Documents["Document models"]
Metadata["Metadata models"]
Tags["Tag models"]
Content["OCR content"]
end
WebUI --> Backend
RestAPI --> Backend
Filter --> Backend
Backend --> Whoosh
Backend --> Django
Backend --> Elastic
Model --> Field
Backend --> Model
Documents --> Signals
Metadata --> Signals
Tags --> Signals
Content --> Signals
Signals --> Handlers
Handlers --> Tasks
Tasks --> Backend
Sources: mayan/apps/dynamic_search/classes.py:38-508, mayan/apps/dynamic_search/views.py:32-179, mayan/apps/dynamic_search/api_views.py:13-103
Core Components
SearchBackend Class
The SearchBackend class provides the abstract interface for all search implementations:
classDiagram
class SearchBackend {
+get_instance() SearchBackend
+search(query, search_model, user) QuerySet
+index_instance(instance) void
+deindex_instance(instance) void
+reset() void
+_search(query, search_model, user) QuerySet*
+cleanup_query(query, search_model) dict
+decode_query(query) dict
}
class WhooshSearchBackend {
+index_path Path
+writer_kwargs dict
+_search(query, search_model, user) QuerySet
+get_or_create_index(search_model) Index
}
class DjangoSearchBackend {
+_search(query, search_model, user) QuerySet
+get_search_query(query, search_model) SearchQuery
}
SearchBackend <|-- WhooshSearchBackend
SearchBackend <|-- DjangoSearchBackend
The backend is selected through the setting_backend setting, which makes implementations pluggable; Whoosh is the default.
Sources: mayan/apps/dynamic_search/classes.py:38-508, mayan/apps/dynamic_search/backends/whoosh.py:28-292, mayan/apps/dynamic_search/backends/django.py:15-243
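The split between the public search() method and the backend-specific _search() hook in the class diagram can be sketched in plain Python. This is a minimal illustration and not Mayan's code: the cleanup rule (dropping empty values) and the InMemorySearchBackend class are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Illustrative stand-in for the abstract backend, not Mayan's class."""

    def search(self, query, search_model, user):
        # Public entry point: normalize the query, then delegate to the
        # backend-specific _search() hook.
        cleaned = self.cleanup_query(query=query, search_model=search_model)
        return self._search(
            query=cleaned, search_model=search_model, user=user
        )

    def cleanup_query(self, query, search_model):
        # Assumed cleanup rule for this sketch: drop empty values.
        return {key: value for key, value in query.items() if value}

    @abstractmethod
    def _search(self, query, search_model, user):
        """Return the matching objects for an already cleaned query."""


class InMemorySearchBackend(SearchBackend):
    """Hypothetical backend that scans a list of dictionaries."""

    def __init__(self, rows):
        self.rows = rows

    def _search(self, query, search_model, user):
        # A row matches when every queried field contains its search value.
        return [
            row for row in self.rows
            if all(
                str(value).lower() in str(row.get(field, '')).lower()
                for field, value in query.items()
            )
        ]


backend = InMemorySearchBackend(
    rows=[{'label': 'Invoice 2021'}, {'label': 'Contract'}]
)
results = backend.search(
    query={'label': 'invoice', 'description': ''},
    search_model=None, user=None
)
```

Because callers only ever use search(), a different backend can be swapped in by subclassing and overriding _search() alone.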
SearchModel Registration
SearchModel instances define which Django models are searchable and configure their search fields:
graph LR
subgraph "SearchModel Registry"
Registry["SearchModel._registry<br/>dict"]
All["SearchModel.all()<br/>returns all models"]
Get["SearchModel.get(name)<br/>retrieve by name"]
end
subgraph "Field Configuration"
DirectFields["fields_direct<br/>fields on the model itself"]
RelatedFields["fields_related<br/>foreign key fields"]
SearchFields["search_fields_dict<br/>field definitions"]
end
subgraph "Model Relationships"
ProxyModels["Proxy models<br/>_proxies list"]
RelatedModels["get_related_models()<br/>reverse relations"]
ThroughModels["get_through_models()<br/>many-to-many relations"]
end
Registry --> DirectFields
Registry --> RelatedFields
DirectFields --> SearchFields
RelatedFields --> SearchFields
SearchFields --> ProxyModels
SearchFields --> RelatedModels
RelatedModels --> ThroughModels
Search models are loaded automatically from each app's search.py module through AppsModuleLoaderMixin.
Sources: mayan/apps/dynamic_search/classes.py:564-862
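A class-level registry with all() and get(name) accessors, as shown in the diagram, might look roughly like the sketch below. The constructor arguments, the full_name property, and registration at construction time are assumptions for illustration; the real class carries considerably more state.

```python
class SearchModel:
    """Illustrative registry; names mirror the diagram, behavior is assumed."""

    _registry = {}

    def __init__(self, app_label, model_name):
        self.app_label = app_label
        self.model_name = model_name
        self.search_fields = []
        # Registering at construction time is an assumption of this sketch.
        self.__class__._registry[self.full_name] = self

    @property
    def full_name(self):
        return '{}.{}'.format(self.app_label, self.model_name)

    @classmethod
    def all(cls):
        # Sorted so the listing order does not depend on import order.
        return sorted(
            cls._registry.values(), key=lambda entry: entry.full_name
        )

    @classmethod
    def get(cls, name):
        try:
            return cls._registry[name]
        except KeyError:
            raise KeyError('Unknown search model: {}'.format(name)) from None

    def add_model_field(self, field):
        self.search_fields.append(field)


# What an app's search.py module could contain under these assumptions.
search_model_document = SearchModel(
    app_label='documents', model_name='Document'
)
search_model_document.add_model_field(field='label')
search_model_document.add_model_field(field='document_type__label')
```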
SearchField Types
Search fields support direct model fields and fields reached through foreign key relations:
| Field type | Syntax | Example |
|---|---|---|
| Direct field | field_name | label, description |
| Foreign key field | fk__field | document_type__label |
| Reverse foreign key | related__field | files__filename |
| Many-to-many | m2m__field | tags__label |
| Deep relation | fk__fk__field | document__document_type__label |
Sources: mayan/apps/dynamic_search/classes.py:510-563
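To make the double-underscore syntax concrete, the sketch below walks such a path over plain Python objects and collects every value it reaches. The resolve_field_path helper is hypothetical, and lists stand in for Django's related managers; the real implementation works on model metadata instead.

```python
from types import SimpleNamespace


def resolve_field_path(instance, path):
    """Collect every value reached by a double-underscore path."""
    values = [instance]
    for part in path.split('__'):
        next_values = []
        for value in values:
            attribute = getattr(value, part, None)
            if attribute is None:
                continue  # Broken path: contribute nothing.
            if isinstance(attribute, (list, tuple)):
                # Reverse and many-to-many relations fan out to many objects.
                next_values.extend(attribute)
            else:
                next_values.append(attribute)
        values = next_values
    return values


document = SimpleNamespace(
    label='Annual report',
    document_type=SimpleNamespace(label='Reports'),
    tags=[SimpleNamespace(label='finance'), SimpleNamespace(label='2021')],
)

direct = resolve_field_path(document, 'label')
foreign_key = resolve_field_path(document, 'document_type__label')
many_to_many = resolve_field_path(document, 'tags__label')
```

A single path can therefore produce several indexable values, which is why many-to-many changes need their own reindexing path.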
Search Backend Implementations
Whoosh Backend
The Whoosh backend provides file-based full-text search indexing:
graph TB
subgraph "Whoosh Backend Components"
WhooshBackend["WhooshSearchBackend"]
IndexPath["index_path<br/>file storage location"]
Schema["get_search_model_schema()<br/>field mapping"]
Storage["FileStorage<br/>Whoosh storage"]
end
subgraph "Index Operations"
CreateIndex["get_or_create_index()"]
IndexInstance["index_instance()"]
DeindexInstance["deindex_instance()"]
IndexInstances["index_instances()"]
end
subgraph "Search Operations"
QueryParser["qparser.QueryParser"]
SearchString["Search string construction"]
Results["Whoosh results"]
QuerySet["Django QuerySet"]
end
WhooshBackend --> IndexPath
WhooshBackend --> Schema
IndexPath --> Storage
Storage --> CreateIndex
CreateIndex --> IndexInstance
CreateIndex --> DeindexInstance
CreateIndex --> IndexInstances
WhooshBackend --> QueryParser
QueryParser --> SearchString
SearchString --> Results
Results --> QuerySet
Whoosh uses file-based indexes, stored by default in MEDIA_ROOT/whoosh_index/, and supports concurrent access through a locking mechanism.
Sources: mayan/apps/dynamic_search/backends/whoosh.py:28-292
Django Backend
The Django backend searches with native ORM queries and needs no external dependencies:
graph TB
subgraph "Django Search Components"
DjangoBackend["DjangoSearchBackend"]
SearchQuery["SearchQuery<br/>builds Django Q objects"]
FieldQuery["FieldQuery<br/>per-field processing"]
TermCollection["SearchTermCollection<br/>parses search terms"]
end
subgraph "Query Construction"
Terms["SearchTerm<br/>single term"]
QObjects["Q() objects<br/>Django ORM"]
Operators["AND/OR logic"]
Negation["NOT logic"]
end
subgraph "Term Processing"
Quotes["Quoted strings"]
Spaces["Space separated"]
Meta["Meta terms (OR)"]
Negated["Negated terms (-)"]
end
DjangoBackend --> SearchQuery
SearchQuery --> FieldQuery
FieldQuery --> TermCollection
TermCollection --> Terms
Terms --> QObjects
QObjects --> Operators
QObjects --> Negation
Terms --> Quotes
Terms --> Spaces
Terms --> Meta
Terms --> Negated
The Django backend supports an advanced query syntax that includes quoted strings, negated terms, and boolean operators.
Sources: mayan/apps/dynamic_search/backends/django.py:15-243
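The term handling named in the diagram (quoted strings, space separation, OR meta terms, and a leading minus for negation) can be approximated with a small tokenizer. This sketch is an assumption about the behavior, not the backend's parser: it matches against a plain string instead of building Q objects, and SearchTerm, parse_terms, and matches are illustrative names.

```python
import re
from dataclasses import dataclass

# A token is an optionally negated quoted phrase, or a run of non-spaces.
TOKEN_PATTERN = re.compile(r'-?"[^"]*"|\S+')


@dataclass(frozen=True)
class SearchTerm:
    text: str
    negated: bool = False


def parse_terms(query_string):
    """Return groups of terms: AND inside a group, OR between groups."""
    groups = [[]]
    for token in TOKEN_PATTERN.findall(query_string):
        if token == 'OR':
            groups.append([])  # The OR meta term starts a new group.
            continue
        negated = token.startswith('-') and len(token) > 1
        if negated:
            token = token[1:]
        if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
            token = token[1:-1]  # Keep the quoted phrase as one term.
        groups[-1].append(SearchTerm(text=token, negated=negated))
    return [group for group in groups if group]


def matches(text, groups):
    """Check a string against the parsed groups, ignoring case."""
    lowered = text.lower()
    return any(
        all((term.text.lower() in lowered) != term.negated for term in group)
        for group in groups
    )


groups = parse_terms('"annual report" -draft OR invoice')
```

With an ORM, each term would become a Q object, negated terms would be inverted with ~, and the groups would be combined with | instead of any().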
Indexing System
Automatic Indexing
The search system keeps its indexes up to date automatically through Django signals:
sequenceDiagram
participant Model as "Django model"
participant Signal as "Django signal"
participant Handler as "Signal handler"
participant Task as "Celery task"
participant Backend as "Search backend"
Model->>Signal: post_save/pre_delete
Signal->>Handler: handler_index_instance
Handler->>Task: task_index_instance.apply_async()
Task->>Backend: backend.index_instance()
Backend->>Backend: Update search index
Note over Model,Backend: Automatic indexing on model changes
Model->>Signal: m2m_changed (many-to-many fields)
Signal->>Handler: handler_factory_index_related_instance_m2m
Handler->>Task: task_index_related_instance_m2m.apply_async()
Task->>Backend: backend.index_related_instance_m2m()
Signal handlers are connected and disconnected automatically through SearchBackend._enable() and SearchBackend._disable().
Sources: mayan/apps/dynamic_search/classes.py:85-146, mayan/apps/dynamic_search/tasks.py:22-166
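The signal, handler, task, and backend chain can be imitated without Django or Celery to show why the handler only enqueues work. Everything in this sketch is a stand-in: the Signal class mimics a small part of django.dispatch.Signal, a deque plays the task queue, and a dictionary plays the search index.

```python
from collections import deque


class Signal:
    """Tiny stand-in for django.dispatch.Signal."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        if receiver not in self.receivers:
            self.receivers.append(receiver)

    def disconnect(self, receiver):
        if receiver in self.receivers:
            self.receivers.remove(receiver)

    def send(self, instance):
        for receiver in list(self.receivers):
            receiver(instance=instance)


post_save = Signal()
task_queue = deque()  # Stands in for the Celery queue.
index = {}            # Stands in for the search index.


def handler_index_instance(instance):
    # Only enqueue here; the indexing itself runs later in the worker.
    task_queue.append(instance)


def run_worker():
    # Drain the queue the way a worker process would.
    while task_queue:
        instance = task_queue.popleft()
        index[instance['pk']] = instance['label']


post_save.connect(handler_index_instance)     # Role of _enable().
post_save.send(instance={'pk': 1, 'label': 'Invoice'})
run_worker()

post_save.disconnect(handler_index_instance)  # Role of _disable().
post_save.send(instance={'pk': 2, 'label': 'Ignored'})
run_worker()
```

Disconnecting the handlers is useful during bulk operations, where a single reindex afterwards is cheaper than one task per saved object.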
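Indexing Tasks
Search indexing is processed asynchronously by Celery tasks:
| Task | Purpose | Queue |
|---|---|---|
| task_index_instance | Index a single model instance | queue_search |
| task_deindex_instance | Remove an instance from the index | queue_search |
| task_index_instances | Index multiple instances in bulk | queue_search |
| task_reindex_backend | Reindex all content | queue_search_slow |
| task_index_related_instance_m2m | Handle many-to-many relation changes | queue_search |
Tasks include retry logic with exponential backoff to handle transient failures.
Sources: mayan/apps/dynamic_search/tasks.py:22-166, mayan/apps/dynamic_search/queues.py:6-42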
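Exponential backoff doubles the wait after each failed attempt, up to a ceiling. The sketch below shows the arithmetic and a retry loop in plain Python; the base delay, the cap, the retry count, and the choice of OSError as the transient failure are all assumptions, since the actual values and exception types live in the task definitions.

```python
def backoff_delay(retries, base=2.0, maximum=300.0):
    """Seconds to wait before the next attempt, given retries so far."""
    return min(maximum, base * (2 ** retries))


def run_with_retries(operation, max_retries=3, sleep=lambda seconds: None):
    """Call operation(), retrying transient failures with growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except OSError:
            if attempt == max_retries:
                raise  # Out of retries: surface the failure.
            sleep(backoff_delay(retries=attempt))


attempts = []
delays = []


def flaky_index():
    # Fails twice, as a locked index file might, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError('index is locked')
    return 'indexed'


outcome = run_with_retries(operation=flaky_index, sleep=delays.append)
```

In a Celery task the same effect is usually obtained by calling self.retry() with a computed countdown, so the worker is not blocked while waiting.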
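Query Processing
Query Syntax
The search system supports an advanced query syntax with scoped searches:
graph TB
subgraph "Query Structure"
RawQuery["Raw query dictionary"]
DecodedQuery["Decoded query<br/>scopes + operators"]
Scopes["Scopes<br/>isolated search contexts"]
Operators["Operators<br/>AND/OR/NOT between scopes"]
end
subgraph "Scope Components"
ScopeQuery["Scope query<br/>field: value pairs"]
MatchAll["match_all<br/>AND vs. OR within a scope"]
ScopeId["scope_id<br/>0, 1, a, b, etc."]
end
subgraph "Query Examples"
Simple["q=search term"]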
Advanced["label=document AND type=pdf"]
Scoped["__0_label=doc __1_type=pdf __operator_0_1=OR_2"]
end
RawQuery --> DecodedQuery
DecodedQuery --> Scopes
DecodedQuery --> Operators
Scopes --> ScopeQuery
Scopes --> MatchAll
Scopes --> ScopeId
Simple --> RawQuery
Advanced --> RawQuery
Scoped --> RawQuery
The query syntax supports scope markers (__), operators (__operator_), and result selection (__result).
Sources: mayan/apps/dynamic_search/classes.py:261-331
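One way to read the scoped example in the diagram is that `__0_label=doc` places `label=doc` in scope 0, and `__operator_0_1=OR_2` combines scopes 0 and 1 with OR into a new scope 2. The decoder below implements that reading as a sketch; the output layout, the default scope of `0`, and the handling of unscoped keys are assumptions and may differ from decode_query() in the source.

```python
SCOPE_MARKER = '__'
OPERATOR_MARKER = '__operator_'
RESULT_MARKER = '__result'
DEFAULT_SCOPE = '0'


def decode_query(query):
    """Split a flat query dict into scopes, operators and a result scope."""
    scopes = {}
    operators = {}
    result_scope = DEFAULT_SCOPE

    for key, value in query.items():
        if key == RESULT_MARKER:
            result_scope = value
        elif key.startswith(OPERATOR_MARKER):
            # '__operator_0_1': 'OR_2' joins scopes 0 and 1 into scope 2.
            operands = key[len(OPERATOR_MARKER):].split('_')
            operator, _, target = value.partition('_')
            operators[target] = {'operator': operator, 'scopes': operands}
        elif key.startswith(SCOPE_MARKER):
            # '__1_type': 'pdf' stores {'type': 'pdf'} in scope 1.
            scope_id, _, field = key[len(SCOPE_MARKER):].partition('_')
            scopes.setdefault(scope_id, {})[field] = value
        else:
            # Unscoped keys fall into the default scope.
            scopes.setdefault(DEFAULT_SCOPE, {})[key] = value

    return {
        'scopes': scopes, 'operators': operators,
        'result_scope': result_scope
    }


decoded = decode_query(
    {
        '__0_label': 'doc', '__1_type': 'pdf',
        '__operator_0_1': 'OR_2', '__result': '2'
    }
)
```

Splitting only on the first underscore after the scope identifier keeps related-field names such as `document_type__label` intact.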
Permission Integration
Search results are filtered through ACL permissions:
graph LR
subgraph "Search Flow"
Query["Search query"]
Backend["Backend search"]
Results["Raw results"]
ACL["ACL.restrict_queryset()"]
FilteredResults["Filtered results"]
Limit["Result limit"]
end
Query --> Backend
Backend --> Results
Results --> ACL
ACL --> FilteredResults
FilteredResults --> Limit
subgraph "Permission Checks"
SearchModel["SearchModel.permission"]
UserPerms["User permissions"]
ObjectPerms["Object-level ACLs"]
end
SearchModel --> ACL
UserPerms --> ACL
ObjectPerms --> ACL
Each SearchModel can specify a required permission, and results are filtered through the ACL system.
Sources: mayan/apps/dynamic_search/classes.py:438-444
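The order of operations in the flow diagram matters: results are restricted by permission first and truncated second, so the limit counts only objects the user may see. A minimal sketch of that ordering follows, with sets standing in for the ACL system and `document_view` as a hypothetical permission name.

```python
from itertools import islice


def restrict_results(results, user_permissions, object_acls, permission):
    """Yield only the results the user is allowed to see."""
    for obj in results:
        granted_globally = permission in user_permissions
        granted_on_object = permission in object_acls.get(obj, set())
        if granted_globally or granted_on_object:
            yield obj


def search_with_permissions(
    results, user_permissions, object_acls, permission, limit
):
    allowed = restrict_results(
        results=results, user_permissions=user_permissions,
        object_acls=object_acls, permission=permission
    )
    # Apply the limit after filtering, never before.
    return list(islice(allowed, limit))


raw_results = ['doc-1', 'doc-2', 'doc-3', 'doc-4']
object_acls = {
    'doc-2': {'document_view'},
    'doc-3': {'document_edit'},
    'doc-4': {'document_view'},
}

visible = search_with_permissions(
    results=raw_results, user_permissions=set(),
    object_acls=object_acls, permission='document_view', limit=10
)
```

Truncating before filtering would return fewer results than the limit allows whenever the first hits are objects the user cannot access.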
Web Interface
Search Views
The web interface provides both a simple and an advanced search form:
graph TB
subgraph "Search Interface Views"
SearchView["SearchView<br/>simple search form"]
AdvancedSearchView["AdvancedSearchView<br/>field-specific search"]
ResultsView["ResultsView<br/>paginated results"]
SearchAgainView["SearchAgainView<br/>redirect helper"]
end
subgraph "Form Processing"
SearchForm["SearchForm<br/>q parameter"]
AdvancedSearchForm["AdvancedSearchForm<br/>per-field inputs"]
QueryDict["request.GET.dict()"]
SearchBackend["SearchBackend.search()"]
end
subgraph "URL Patterns"
SearchURL["search/<search_model_pk>/"]
AdvancedURL["advanced/<search_model_pk>/"]
ResultsURL["results/<search_model_pk>/"]
end
SearchView --> SearchForm
AdvancedSearchView --> AdvancedSearchForm
SearchForm --> QueryDict
AdvancedSearchForm --> QueryDict
QueryDict --> SearchBackend
SearchBackend --> ResultsView
SearchURL --> SearchView
AdvancedURL --> AdvancedSearchView
ResultsURL --> ResultsView
The views use SearchModelViewMixin to resolve the search model from URL parameters.
Sources: mayan/apps/dynamic_search/views.py:32-179, mayan/apps/dynamic_search/view_mixins.py:90-114
List Filtering
The search system provides automatic list filtering through SearchEnabledListViewMixin:
graph LR
subgraph "List View Integration"
ListView["Generic ListView"]
SearchMixin["SearchEnabledListViewMixin"]
SearchModel["get_search_model()"]
FilteredQuery["Filtered queryset"]
end
subgraph "Filter Processing"
QueryParam["q parameter"]
SearchBackend["SearchBackend.search()"]
PKFilter["pk__in filter"]
end
ListView --> SearchMixin
SearchMixin --> SearchModel
SearchModel --> QueryParam
QueryParam --> SearchBackend
SearchBackend --> PKFilter
PKFilter --> FilteredQuery
This enables search filtering in any list view that includes the mixin.
Sources: mayan/apps/dynamic_search/view_mixins.py:12-88
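The mixin pattern described here can be shown with plain classes: the mixin overrides get_queryset(), runs the search when a q parameter is present, and keeps only rows whose primary key the search returned. The classes below are hypothetical stand-ins that use lists of dictionaries instead of Django querysets.

```python
class BaseListView:
    """Hypothetical stand-in for a generic list view."""

    rows = [
        {'pk': 1, 'label': 'Invoice 2021'},
        {'pk': 2, 'label': 'Contract'},
        {'pk': 3, 'label': 'Invoice 2022'},
    ]

    def __init__(self, params):
        self.params = params  # Plays the role of request.GET.

    def get_queryset(self):
        return list(self.rows)


class SearchFilterMixin:
    """Narrow the parent's queryset to the primary keys a search returns."""

    def search(self, term):
        # Stand-in for SearchBackend.search(); returns matching primary keys.
        return {
            row['pk'] for row in self.rows
            if term.lower() in row['label'].lower()
        }

    def get_queryset(self):
        queryset = super().get_queryset()
        term = self.params.get('q', '').strip()
        if not term:
            return queryset  # No search term: leave the list untouched.
        matching_pks = self.search(term)
        # Equivalent in spirit to queryset.filter(pk__in=matching_pks).
        return [row for row in queryset if row['pk'] in matching_pks]


class DocumentListView(SearchFilterMixin, BaseListView):
    pass


filtered = DocumentListView(params={'q': 'invoice'}).get_queryset()
unfiltered = DocumentListView(params={}).get_queryset()
```

Placing the mixin before the base view in the class definition is what lets its get_queryset() wrap the base implementation through super().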
REST API
API Views
The REST API provides programmatic search access:
graph TB
subgraph "API Search Views"
APISearchView["APISearchView<br/>simple search endpoint"]
APIAdvancedSearchView["APIAdvancedSearchView<br/>advanced search endpoint"]
APISearchModelList["APISearchModelList<br/>available search models"]
end
subgraph "API Features"
DynamicSerializer["Dynamic serializer<br/>search_model.serializer"]
Permissions["Permission integration<br/>search_model.permission"]
Filtering["Query processing<br/>same as the web interface"]
end
subgraph "URL Endpoints"
SearchEndpoint["/api/search/<model>/"]
AdvancedEndpoint["/api/search/advanced/<model>/"]
ModelsEndpoint["/api/search_models/"]
end
APISearchView --> DynamicSerializer
APIAdvancedSearchView --> DynamicSerializer
DynamicSerializer --> Permissions
Permissions --> Filtering
SearchEndpoint --> APISearchView
AdvancedEndpoint --> APIAdvancedSearchView
ModelsEndpoint --> APISearchModelList
The API views set the serializer class dynamically based on the search model configuration.
Sources: mayan/apps/dynamic_search/api_views.py:13-103
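A client would typically build the request URL from the endpoint pattern and the same query parameters the web forms use. The helper below is a sketch that follows the endpoint shapes shown in the diagram; the host, the `documents.Document` model name, and the build_search_url function are assumptions, and a real installation may prefix the path with an API version.

```python
from urllib.parse import quote, urlencode


def build_search_url(base_url, search_model, query, advanced=False):
    """Compose a search URL from the endpoint patterns in the diagram."""
    if advanced:
        path = '/api/search/advanced/{}/'
    else:
        path = '/api/search/{}/'
    return '{}{}?{}'.format(
        base_url.rstrip('/'),
        path.format(quote(search_model, safe='')),
        urlencode(query)
    )


simple_url = build_search_url(
    base_url='http://localhost:8000/',
    search_model='documents.Document',
    query={'q': 'annual report'}
)
advanced_url = build_search_url(
    base_url='http://localhost:8000',
    search_model='documents.Document',
    query={'label': 'report', 'document_type__label': 'pdf'},
    advanced=True
)
```

The resulting URL can be passed to any HTTP client along with the authentication the server requires.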
REST API Filtering
RESTAPISearchFilter enables search filtering on any API endpoint:
graph LR
subgraph "DRF Integration"
APIView["Generic API view"]
SearchFilter["RESTAPISearchFilter"]
QueryDict["request.GET parameters"]
BackendSearch["SearchBackend.search()"]
end
subgraph "Filter Logic"
SearchModel["Detect SearchModel"]
ValidFields["Validate field names"]
CleanQuery["Clean query dictionary"]
FilteredResults["pk__in filter"]
end
APIView --> SearchFilter
SearchFilter --> QueryDict
QueryDict --> SearchModel
SearchModel --> ValidFields
ValidFields --> CleanQuery
CleanQuery --> BackendSearch
BackendSearch --> FilteredResults
The filter is applied automatically to API views that have not disabled search filtering.
Sources: mayan/apps/dynamic_search/filters.py:16-82
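Configuration
Settings
The search system is configured through Django settings:
| Setting | Default | Purpose |
|---|---|---|
| SEARCH_BACKEND | WhooshSearchBackend | Backend implementation |
| SEARCH_BACKEND_ARGUMENTS | {} | Backend-specific options |
| SEARCH_RESULTS_LIMIT | 100 | Maximum number of results returned |
| SEARCH_INDEXING_CHUNK_SIZE | 25 | Batch size for indexing |
| SEARCH_DISABLE_SIMPLE_SEARCH | False | Hides the simple search interface |
Backend arguments configure a specific implementation (for example, the Whoosh index path or the ElasticSearch connection).
Sources: mayan/apps/dynamic_search/literals.py:3-48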
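Written out as a settings fragment, the defaults from the table would look roughly as follows. The dotted path for SEARCH_BACKEND is inferred from the backends/whoosh.py file listed in the sources and is an assumption; the table itself gives only the class name. No backend argument keys are shown because the page does not name them.

```python
# Settings fragment mirroring the defaults in the table above.
# The dotted module path is an assumption based on the source file layout.
SEARCH_BACKEND = (
    'mayan.apps.dynamic_search.backends.whoosh.WhooshSearchBackend'
)
SEARCH_BACKEND_ARGUMENTS = {}  # Backend-specific options; empty by default.
SEARCH_RESULTS_LIMIT = 100  # Maximum number of results returned.
SEARCH_INDEXING_CHUNK_SIZE = 25  # Batch size for indexing.
SEARCH_DISABLE_SIMPLE_SEARCH = False  # Set to True to hide simple search.
```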
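Management Commands
The system provides management commands for maintenance:
# Index a specific range of objects
./manage.py search_index_objects <model_name> <id_range>
# Full backend reindex (via task)
# Triggered through SearchBackendReindexView
The reindex process clears the existing index and rebuilds it from the current database state.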
Sources: mayan/apps/dynamic_search/management/commands/search_index_objects.py:9-55
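Indexing a range of objects in batches, as the SEARCH_INDEXING_CHUNK_SIZE setting suggests, comes down to slicing the list of primary keys into fixed-size chunks. The sketch below shows only that slicing step; the chunked helper is hypothetical, and how the command parses its id range argument is not covered here.

```python
def chunked(ids, chunk_size=25):
    """Yield consecutive batches of at most chunk_size primary keys."""
    ids = list(ids)
    for start in range(0, len(ids), chunk_size):
        yield ids[start:start + chunk_size]


# Sixty objects with the default chunk size give batches of 25, 25 and 10.
batches = list(chunked(range(1, 61)))
batch_sizes = [len(batch) for batch in batches]
```

Each batch could then be handed to a bulk task such as task_index_instances, which keeps individual tasks short and lets a failed batch be retried on its own.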