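Page: Mayan EDMS · 5.1 Search System · DeepWiki full Chinese translation

5.1 · Search System

Enterprise electronic document governance · This chapter is a standalone section page of the Chinese translation of the Mayan EDMS DeepWiki. It preserves the original links, source-code anchors, module tags, and section hierarchy.

Project: Mayan EDMS · Section: 5.1 · Status: full translation · Modules: Document objects and metadata; System architecture; Testing, release, and operations; Search, recall, and indexing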
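Module tags
  • Document objects and metadata
  • System architecture
  • Testing, release, and operations
  • Search, recall, and indexing
  • Interface and interaction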

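Chinese translation

Search System (Chinese translation)

Original DeepWiki page: https://deepwiki.com/mayan-edms/Mayan-EDMS/5.1-search-system
Translated: 2026-05-27T08:44:38.105Z
Translation model: deepseek-chat
Source character count: 19819
Project: Mayan EDMS (mayan-edms)

---

Search System

Relevant source files

The following files were used as context for generating this wiki page: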

  • docs/releases/4.1.9.txt
  • mayan/apps/cabinets/tests/test_widgets.py
  • mayan/apps/databases/literals.py
  • mayan/apps/dynamic_search/api_views.py
  • mayan/apps/dynamic_search/backends/django.py
  • mayan/apps/dynamic_search/backends/whoosh.py
  • mayan/apps/dynamic_search/classes.py
  • mayan/apps/dynamic_search/filters.py
  • mayan/apps/dynamic_search/literals.py
  • mayan/apps/dynamic_search/management/commands/search_index_objects.py
  • mayan/apps/dynamic_search/queues.py
  • mayan/apps/dynamic_search/serializers.py
  • mayan/apps/dynamic_search/tasks.py
  • mayan/apps/dynamic_search/templates/dynamic_search/app/list_toolbar.html
  • mayan/apps/dynamic_search/templates/dynamic_search/search_box.html
  • mayan/apps/dynamic_search/tests/backends.py
  • mayan/apps/dynamic_search/tests/mixins.py
  • mayan/apps/dynamic_search/tests/test_backends.py
  • mayan/apps/dynamic_search/tests/test_classes.py
  • mayan/apps/dynamic_search/tests/test_tasks.py
  • mayan/apps/dynamic_search/tests/test_views.py
  • mayan/apps/dynamic_search/urls.py
  • mayan/apps/dynamic_search/utils.py
  • mayan/apps/dynamic_search/view_mixins.py
  • mayan/apps/dynamic_search/views.py
  • mayan/apps/metadata/tests/test_widgets.py
  • mayan/apps/tags/tests/test_widgets.py

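Purpose and Scope

The search system gives Mayan EDMS dynamic search capabilities, letting users find documents, metadata, tags, and other content through multiple search backends. It supports simple and advanced search interfaces, automatic content indexing, and pluggable backend implementations, including Whoosh, ElasticSearch, and the Django ORM.

For document organization features, see Document Organization. For access control integration, see Access Control and Permissions.

Architecture Overview

The search system uses a pluggable backend architecture with automatic indexing and several interface options: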

graph TB
    subgraph "Search interfaces"
        WebUI["SearchView<br/>AdvancedSearchView"]
        RestAPI["APISearchView<br/>APIAdvancedSearchView"]
        Filter["RESTAPISearchFilter"]
    end

    subgraph "Core search framework"
        Backend["SearchBackend"]
        Model["SearchModel"]
        Field["SearchField"]
    end

    subgraph "Backend implementations"
        Whoosh["WhooshSearchBackend"]
        Django["DjangoSearchBackend"]
        Elastic["ElasticSearchBackend"]
    end

    subgraph "Indexing system"
        Signals["Django signals"]
        Tasks["Celery tasks<br/>task_index_instance<br/>task_deindex_instance"]
        Handlers["Signal handlers"]
    end

    subgraph "Data sources"
        Documents["Document models"]
        Metadata["Metadata models"]
        Tags["Tag models"]
        Content["OCR content"]
    end

    WebUI --> Backend
    RestAPI --> Backend
    Filter --> Backend

    Backend --> Whoosh
    Backend --> Django
    Backend --> Elastic

    Model --> Field
    Backend --> Model

    Documents --> Signals
    Metadata --> Signals
    Tags --> Signals
    Content --> Signals

    Signals --> Handlers
    Handlers --> Tasks
    Tasks --> Backend

Sources: mayan/apps/dynamic_search/classes.py:38-508, mayan/apps/dynamic_search/views.py:32-179, mayan/apps/dynamic_search/api_views.py:13-103

Core Components

SearchBackend Class

The SearchBackend class provides the abstract interface for all search implementations:

classDiagram
    class SearchBackend {
        +get_instance() SearchBackend
        +search(query, search_model, user) QuerySet
        +index_instance(instance) void
        +deindex_instance(instance) void
        +reset() void
        +_search(query, search_model, user) QuerySet*
        +cleanup_query(query, search_model) dict
        +decode_query(query) dict
    }

    class WhooshSearchBackend {
        +index_path Path
        +writer_kwargs dict
        +_search(query, search_model, user) QuerySet
        +get_or_create_index(search_model) Index
    }

    class DjangoSearchBackend {
        +_search(query, search_model, user) QuerySet
        +get_search_query(query, search_model) SearchQuery
    }

    SearchBackend <|-- WhooshSearchBackend
    SearchBackend <|-- DjangoSearchBackend

The backend system supports pluggable implementations through the setting_backend setting and uses Whoosh by default.

Sources: mayan/apps/dynamic_search/classes.py:38-508, mayan/apps/dynamic_search/backends/whoosh.py:28-292, mayan/apps/dynamic_search/backends/django.py:15-243
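
The split between a public search() and a backend-specific _search() can be illustrated with a small standalone sketch. It does not import Mayan EDMS: InMemorySearchBackend, its index layout, and the simplified method signatures are assumptions made for illustration, and the real classes do considerably more.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical base class: public search() wraps the backend's _search()."""

    def search(self, query, search_model, user=None):
        # Shared pre-processing lives in the base class.
        cleaned = self.cleanup_query(query)
        return self._search(cleaned, search_model, user)

    def cleanup_query(self, query):
        # Drop empty values so backends only see meaningful terms.
        return {key: value for key, value in query.items() if value}

    @abstractmethod
    def _search(self, query, search_model, user):
        """Return the primary keys of matching objects."""


class InMemorySearchBackend(SearchBackend):
    """Toy backend for illustration; not one of the Mayan backends."""

    def __init__(self):
        self._index = {}  # (search_model, pk) -> dict of field values

    def index_instance(self, search_model, pk, fields):
        self._index[(search_model, pk)] = fields

    def deindex_instance(self, search_model, pk):
        self._index.pop((search_model, pk), None)

    def _search(self, query, search_model, user):
        matches = []
        for (model, pk), fields in self._index.items():
            if model != search_model:
                continue
            # Every query field must appear as a substring (case-insensitive).
            if all(
                str(term).lower() in str(fields.get(field, '')).lower()
                for field, term in query.items()
            ):
                matches.append(pk)
        return sorted(matches)


backend = InMemorySearchBackend()
backend.index_instance('documents.Document', 1, {'label': 'Invoice 2023'})
backend.index_instance('documents.Document', 2, {'label': 'Contract'})
print(backend.search({'label': 'invoice'}, 'documents.Document'))
```

Callers only ever use search(), so swapping the backend class does not change the calling code.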

SearchModel Registration

SearchModel instances define which Django models are searchable and configure their search fields:

graph LR
    subgraph "SearchModel registry"
        Registry["SearchModel._registry<br/>dict"]
        All["SearchModel.all()<br/>returns all models"]
        Get["SearchModel.get(name)<br/>lookup by name"]
    end

    subgraph "Field configuration"
        DirectFields["fields_direct<br/>fields on the model itself"]
        RelatedFields["fields_related<br/>foreign key fields"]
        SearchFields["search_fields_dict<br/>field definitions"]
    end

    subgraph "Model relationships"
        ProxyModels["Proxy models<br/>_proxies list"]
        RelatedModels["get_related_models()<br/>reverse relations"]
        ThroughModels["get_through_models()<br/>many-to-many relations"]
    end

    Registry --> DirectFields
    Registry --> RelatedFields
    DirectFields --> SearchFields
    RelatedFields --> SearchFields
    SearchFields --> ProxyModels
    SearchFields --> RelatedModels
    RelatedModels --> ThroughModels

Search models are loaded automatically from each app's search.py module through AppsModuleLoaderMixin.

Sources: mayan/apps/dynamic_search/classes.py:564-862
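
A class-level registry of this kind can be sketched in a few lines. The example below is a simplified stand-in rather than the Mayan class: only _registry, all(), and get() come from the description above, while the constructor arguments, full_name, and add_model_field() are assumptions for illustration.

```python
class SearchModel:
    """Simplified registry sketch; the real class has many more options."""

    _registry = {}

    def __init__(self, app_label, model_name):
        self.app_label = app_label
        self.model_name = model_name
        self.search_fields = {}
        # Register under a full name such as "documents.Document".
        self.__class__._registry[self.full_name] = self

    @property
    def full_name(self):
        return '{}.{}'.format(self.app_label, self.model_name)

    @classmethod
    def all(cls):
        # Sorted so the listing is stable regardless of import order.
        return sorted(cls._registry.values(), key=lambda model: model.full_name)

    @classmethod
    def get(cls, name):
        return cls._registry[name]

    def add_model_field(self, field, label=None):
        # Hypothetical helper: map a field path to a display label.
        self.search_fields[field] = label or field
        return self


# What an app's search.py module might contain (illustrative only).
documents = SearchModel('documents', 'Document')
documents.add_model_field('label')
documents.add_model_field('document_type__label', label='Document type')
```

Because instances register themselves on construction, importing a search.py module is enough to make its models discoverable through SearchModel.all().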

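SearchField Types

Search fields support both direct model fields and fields reached through relations:

| Field type | Syntax | Example |
| --- | --- | --- |
| Direct field | field_name | label, description |
| Foreign key | fk__field | document_type__label |
| Reverse foreign key | related__field | files__filename |
| Many-to-many | m2m__field | tags__label |
| Deep relation | fk__fk__field | document__document_type__label |

Sources: mayan/apps/dynamic_search/classes.py:510-563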
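
The double-underscore syntax in the table follows the Django lookup convention. As a rough illustration of how such a path maps to values, the sketch below walks plain Python objects; resolve_field_path() is a hypothetical helper, not a Mayan function, and it treats lists as stand-ins for reverse and many-to-many relations.

```python
from types import SimpleNamespace


def resolve_field_path(instance, path):
    """Collect every value reachable through an "a__b__c" style path."""
    current = [instance]
    for part in path.split('__'):
        next_level = []
        for obj in current:
            value = getattr(obj, part, None)
            if value is None:
                continue
            # Reverse and many-to-many relations yield several objects.
            if isinstance(value, (list, tuple, set)):
                next_level.extend(value)
            else:
                next_level.append(value)
        current = next_level
    return current


document = SimpleNamespace(
    label='Invoice',
    document_type=SimpleNamespace(label='Accounting'),
    tags=[SimpleNamespace(label='urgent'), SimpleNamespace(label='2023')],
)

print(resolve_field_path(document, 'document_type__label'))
print(resolve_field_path(document, 'tags__label'))
```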

Search Backend Implementations

Whoosh Backend

The Whoosh backend provides file-based full-text search indexing:

graph TB
    subgraph "Whoosh backend components"
        WhooshBackend["WhooshSearchBackend"]
        IndexPath["index_path<br/>file storage location"]
        Schema["get_search_model_schema()<br/>field mapping"]
        Storage["FileStorage<br/>Whoosh storage"]
    end

    subgraph "Index operations"
        CreateIndex["get_or_create_index()"]
        IndexInstance["index_instance()"]
        DeindexInstance["deindex_instance()"]
        IndexInstances["index_instances()"]
    end

    subgraph "Search operations"
        QueryParser["qparser.QueryParser"]
        SearchString["Search string construction"]
        Results["Whoosh results"]
        QuerySet["Django QuerySet"]
    end

    WhooshBackend --> IndexPath
    WhooshBackend --> Schema
    IndexPath --> Storage
    Storage --> CreateIndex

    CreateIndex --> IndexInstance
    CreateIndex --> DeindexInstance
    CreateIndex --> IndexInstances

    WhooshBackend --> QueryParser
    QueryParser --> SearchString
    SearchString --> Results
    Results --> QuerySet

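Whoosh uses a file-based index, stored by default in MEDIA_ROOT/whoosh_index/, and supports concurrent access through a locking mechanism.

Sources: mayan/apps/dynamic_search/backends/whoosh.py:28-292

Django Backend

The Django backend searches with native ORM queries and needs no external dependencies: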

graph TB
    subgraph "Django search components"
        DjangoBackend["DjangoSearchBackend"]
        SearchQuery["SearchQuery<br/>builds Django Q objects"]
        FieldQuery["FieldQuery<br/>per-field processing"]
        TermCollection["SearchTermCollection<br/>parses search terms"]
    end

    subgraph "Query construction"
        Terms["SearchTerm<br/>single term"]
        QObjects["Q() objects<br/>Django ORM"]
        Operators["AND/OR logic"]
        Negation["NOT logic"]
    end

    subgraph "Term processing"
        Quotes["Quoted strings"]
        Spaces["Space-separated"]
        Meta["Meta terms (OR)"]
        Negated["Negated terms (-)"]
    end

    DjangoBackend --> SearchQuery
    SearchQuery --> FieldQuery
    FieldQuery --> TermCollection
    TermCollection --> Terms

    Terms --> QObjects
    QObjects --> Operators
    QObjects --> Negation

    Terms --> Quotes
    Terms --> Spaces
    Terms --> Meta
    Terms --> Negated

The Django backend supports an advanced query syntax, including quoted strings, negated terms, and boolean operators.

Sources: mayan/apps/dynamic_search/backends/django.py:15-243
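
To make the term syntax concrete, here is a minimal standalone sketch of parsing quoted strings, a leading "-" for negation, and OR as a meta term. It is not the Mayan parser: parse_terms(), matches(), and this SearchTerm dataclass are assumptions, and the sketch evaluates terms against a plain string instead of building Q() objects.

```python
import shlex
from dataclasses import dataclass


@dataclass
class SearchTerm:
    text: str
    negated: bool = False
    is_meta: bool = False  # True for the OR keyword between terms


def parse_terms(query_string):
    """Split a query into terms, honouring quotes, a leading '-' and OR."""
    terms = []
    # shlex keeps "quoted phrases" together; it raises ValueError on
    # unbalanced quotes, which a real parser would need to handle.
    for token in shlex.split(query_string):
        if token == 'OR':
            terms.append(SearchTerm(text=token, is_meta=True))
        elif token.startswith('-') and len(token) > 1:
            terms.append(SearchTerm(text=token[1:], negated=True))
        else:
            terms.append(SearchTerm(text=token))
    return terms


def matches(terms, text):
    """Evaluate terms against text: AND within a group, OR between groups."""
    text = text.lower()
    groups, current = [], []
    for term in terms:
        if term.is_meta:
            groups.append(current)
            current = []
        else:
            current.append(term)
    groups.append(current)
    # A term passes when its presence in the text differs from its negation flag.
    return any(
        group and all((t.text.lower() in text) != t.negated for t in group)
        for group in groups
    )


terms = parse_terms('"annual report" -draft OR memo')
print([(t.text, t.negated, t.is_meta) for t in terms])
```

In an ORM-backed implementation each passing condition would instead become a Q() object, combined with & for AND, | for OR, and ~ for negation.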

Indexing System

Automatic Indexing

The search system keeps its index up to date automatically through Django signals:

sequenceDiagram
    participant Model as "Django model"
    participant Signal as "Django signal"
    participant Handler as "Signal handler"
    participant Task as "Celery task"
    participant Backend as "Search backend"

    Model->>Signal: post_save/pre_delete
    Signal->>Handler: handler_index_instance
    Handler->>Task: task_index_instance.apply_async()
    Task->>Backend: backend.index_instance()
    Backend->>Backend: Update search index

    Note over Model,Backend: Automatic indexing on model changes

    Model->>Signal: m2m_changed (many-to-many fields)
    Signal->>Handler: handler_factory_index_related_instance_m2m
    Handler->>Task: task_index_related_instance_m2m.apply_async()
    Task->>Backend: backend.index_related_instance_m2m()

Signal handlers are connected and disconnected automatically through SearchBackend._enable() and SearchBackend._disable().

Sources: mayan/apps/dynamic_search/classes.py:85-146, mayan/apps/dynamic_search/tasks.py:22-166
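
The key property of this flow is that the handler only enqueues work, and the index changes later when a worker runs the task. The sketch below mimics that with a plain deque in place of Celery and a dict in place of the backend. handler_index_instance is named in the diagram; handler_deindex_instance, send_signal(), and run_worker() are hypothetical stand-ins.

```python
from collections import deque

task_queue = deque()  # stands in for the Celery queue
search_index = {}     # stands in for the backend index


def handler_index_instance(instance, **kwargs):
    # The handler only enqueues work; indexing happens later in a worker.
    task_queue.append(('index', instance['pk'], dict(instance)))


def handler_deindex_instance(instance, **kwargs):
    # Hypothetical counterpart used here for the delete path.
    task_queue.append(('deindex', instance['pk'], None))


SIGNAL_HANDLERS = {
    'post_save': [handler_index_instance],
    'pre_delete': [handler_deindex_instance],
}


def send_signal(name, instance):
    """Call every handler connected to the named signal."""
    for handler in SIGNAL_HANDLERS.get(name, []):
        handler(instance=instance)


def run_worker():
    """Drain the queue, applying each task to the index."""
    while task_queue:
        action, pk, payload = task_queue.popleft()
        if action == 'index':
            search_index[pk] = payload
        else:
            search_index.pop(pk, None)
```

Saving a model therefore returns immediately, and the index is eventually consistent with the database once the queue has been processed.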

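Indexing Tasks

Search indexing runs asynchronously as Celery tasks:

| Task | Purpose | Queue |
| --- | --- | --- |
| task_index_instance | Index a single model instance | queue_search |
| task_deindex_instance | Remove an instance from the index | queue_search |
| task_index_instances | Index multiple instances in bulk | queue_search |
| task_reindex_backend | Reindex all content | queue_search_slow |
| task_index_related_instance_m2m | Handle many-to-many relationship changes | queue_search |

The tasks include retry logic with exponential backoff to handle transient failures.

Sources: mayan/apps/dynamic_search/tasks.py:22-166, mayan/apps/dynamic_search/queues.py:6-42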
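
Exponential backoff means each retry waits longer than the previous one, usually up to a cap. The generic sketch below shows the idea without Celery; the function names and the base, factor, and cap values are assumptions and do not reflect the values Mayan EDMS uses.

```python
import time


def backoff_delays(base=2.0, factor=2.0, max_retries=5, cap=60.0):
    """Yield the wait before each retry: base, base*factor, ... capped."""
    delay = base
    for _ in range(max_retries):
        yield min(delay, cap)
        delay *= factor


def run_with_retries(task, sleep=time.sleep, **backoff_kwargs):
    """Call task(); on failure wait and retry, re-raising after the last retry."""
    delays = list(backoff_delays(**backoff_kwargs))
    for attempt in range(len(delays) + 1):
        try:
            return task()
        except Exception:
            if attempt == len(delays):
                raise  # retries exhausted, surface the original error
            sleep(delays[attempt])
```

Passing the sleep function as a parameter keeps the helper easy to test, since a test can record the waits instead of actually sleeping.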

Query Processing

Query Syntax

The search system supports an advanced query syntax with scoped searches:

graph TB
    subgraph "Query structure"
        RawQuery["Raw query dict"]
        DecodedQuery["Decoded query<br/>scopes + operators"]
        Scopes["Scopes<br/>isolated search contexts"]
        Operators["Operators<br/>AND/OR/NOT between scopes"]
    end

    subgraph "Scope components"
        ScopeQuery["Scope query<br/>field: value pairs"]
        MatchAll["match_all<br/>AND vs. OR within a scope"]
        ScopeId["scope_id<br/>0, 1, a, b, etc."]
    end

    subgraph "Query examples"
        Simple["q=search term"]
        Advanced["label=document AND type=pdf"]
        Scoped["__0_label=doc __1_type=pdf __operator_0_1=OR_2"]
    end

    RawQuery --> DecodedQuery
    DecodedQuery --> Scopes
    DecodedQuery --> Operators

    Scopes --> ScopeQuery
    Scopes --> MatchAll
    Scopes --> ScopeId

    Simple --> RawQuery
    Advanced --> RawQuery
    Scoped --> RawQuery

The query syntax supports scope markers (__), operators (__operator_), and result selection (__result).

Sources: mayan/apps/dynamic_search/classes.py:261-331
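
Based only on the example "__0_label=doc __1_type=pdf __operator_0_1=OR_2", a scoped query could be decoded roughly as follows. This is an interpretation, not the Mayan implementation: the regular expression, the default scope "0", the reading of "OR_2" as "operator OR, result stored in scope 2", and the shape of the returned dict are all assumptions.

```python
import re

SCOPE_KEY = re.compile(r'^__(?P<scope>[0-9a-zA-Z]+)_(?P<field>.+)$')
OPERATOR_PREFIX = '__operator_'
RESULT_KEY = '__result'
DEFAULT_SCOPE = '0'


def decode_query(query):
    """Split a flat query dict into scopes, operators and a result scope."""
    scopes, operators, result_scope = {}, {}, DEFAULT_SCOPE
    for key, value in query.items():
        if key == RESULT_KEY:
            result_scope = value
        elif key.startswith(OPERATOR_PREFIX):
            # "__operator_0_1=OR_2": combine scopes 0 and 1 with OR into scope 2.
            operands = tuple(key[len(OPERATOR_PREFIX):].split('_'))
            operator, _, target = value.partition('_')
            operators[operands] = {'operator': operator, 'result': target}
        else:
            match = SCOPE_KEY.match(key)
            if match:
                scope, field = match.group('scope'), match.group('field')
            else:
                # Unscoped keys such as "q" or "label" fall into the default scope.
                scope, field = DEFAULT_SCOPE, key
            scopes.setdefault(scope, {})[field] = value
    return {'scopes': scopes, 'operators': operators, 'result_scope': result_scope}


decoded = decode_query({
    '__0_label': 'doc',
    '__1_type': 'pdf',
    '__operator_0_1': 'OR_2',
    '__result': '2',
})
print(decoded)
```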

Permission Integration

Search results are filtered through ACL permissions:

graph LR
    subgraph "Search flow"
        Query["Search query"]
        Backend["Backend search"]
        Results["Raw results"]
        ACL["ACL.restrict_queryset()"]
        FilteredResults["Filtered results"]
        Limit["Result limit"]
    end

    Query --> Backend
    Backend --> Results
    Results --> ACL
    ACL --> FilteredResults
    FilteredResults --> Limit

    subgraph "Permission checks"
        SearchModel["SearchModel.permission"]
        UserPerms["User permissions"]
        ObjectPerms["Object-level ACLs"]
    end

    SearchModel --> ACL
    UserPerms --> ACL
    ObjectPerms --> ACL

Each SearchModel can specify a required permission, and results are filtered through the ACL system.

Sources: mayan/apps/dynamic_search/classes.py:438-444
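
The order in the diagram matters: results are restricted by permission first and truncated to the limit afterwards, so hidden objects do not use up result slots. A minimal sketch of that ordering follows, using plain sets in place of the ACL system. restrict_results() and the 'document_view' permission name are hypothetical.

```python
from itertools import islice


def restrict_results(results, user_permissions, object_acls, permission, limit):
    """Keep objects the user may access, then cap the number returned."""

    def allowed(pk):
        # A global grant covers every object; otherwise check the object ACL.
        if permission in user_permissions:
            return True
        return permission in object_acls.get(pk, set())

    # Filtering before slicing means the limit counts only visible objects.
    return list(islice((pk for pk in results if allowed(pk)), limit))


acls = {2: {'document_view'}, 4: {'document_view'}, 5: {'document_view'}}
print(restrict_results([1, 2, 3, 4, 5], set(), acls, 'document_view', 2))
```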

Web Interface

Search Views

The web interface provides both a simple and an advanced search form:

graph TB
    subgraph "Search interface views"
        SearchView["SearchView<br/>simple search form"]
        AdvancedSearchView["AdvancedSearchView<br/>field-specific search"]
        ResultsView["ResultsView<br/>paginated results"]
        SearchAgainView["SearchAgainView<br/>redirect helper"]
    end

    subgraph "Form handling"
        SearchForm["SearchForm<br/>q parameter"]
        AdvancedSearchForm["AdvancedSearchForm<br/>per-field inputs"]
        QueryDict["request.GET.dict()"]
        SearchBackend["SearchBackend.search()"]
    end

    subgraph "URL patterns"
        SearchURL["search/<search_model_pk>/"]
        AdvancedURL["advanced/<search_model_pk>/"]
        ResultsURL["results/<search_model_pk>/"]
    end

    SearchView --> SearchForm
    AdvancedSearchView --> AdvancedSearchForm
    SearchForm --> QueryDict
    AdvancedSearchForm --> QueryDict
    QueryDict --> SearchBackend
    SearchBackend --> ResultsView

    SearchURL --> SearchView
    AdvancedURL --> AdvancedSearchView
    ResultsURL --> ResultsView

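The views use SearchModelViewMixin to resolve the search model from URL parameters.

Sources: mayan/apps/dynamic_search/views.py:32-179, mayan/apps/dynamic_search/view_mixins.py:90-114

List Filtering

The search system provides automatic list filtering through SearchEnabledListViewMixin: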

graph LR
    subgraph "List view integration"
        ListView["Generic ListView"]
        SearchMixin["SearchEnabledListViewMixin"]
        SearchModel["get_search_model()"]
        FilteredQuery["Filtered queryset"]
    end

    subgraph "Filter processing"
        QueryParam["q parameter"]
        SearchBackend["SearchBackend.search()"]
        PKFilter["pk__in filter"]
    end

    ListView --> SearchMixin
    SearchMixin --> SearchModel
    SearchModel --> QueryParam
    QueryParam --> SearchBackend
    SearchBackend --> PKFilter
    PKFilter --> FilteredQuery

This lets any list view that includes the mixin offer search filtering.

Sources: mayan/apps/dynamic_search/view_mixins.py:12-88
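
The mixin pattern can be shown without Django by letting a mixin narrow whatever its parent view returns. In the sketch below, BaseListView, SearchEnabledListMixin, and TagListView are hypothetical stand-ins, lists of dicts play the role of querysets, and the list comprehension corresponds in spirit to a pk__in filter.

```python
class BaseListView:
    """Minimal stand-in for a generic list view."""

    source = []

    def __init__(self, params=None):
        self.params = params or {}

    def get_queryset(self):
        return list(self.source)


class SearchEnabledListMixin:
    """Narrow the parent's queryset when a "q" parameter is present."""

    def search(self, term):
        # Hypothetical backend call: return the pks of matching rows.
        return {
            row['pk'] for row in self.source
            if term.lower() in row['label'].lower()
        }

    def get_queryset(self):
        queryset = super().get_queryset()
        term = self.params.get('q')
        if not term:
            return queryset  # no search term, leave the list untouched
        matching_pks = self.search(term)
        # Equivalent in spirit to queryset.filter(pk__in=matching_pks).
        return [row for row in queryset if row['pk'] in matching_pks]


class TagListView(SearchEnabledListMixin, BaseListView):
    source = [{'pk': 1, 'label': 'Urgent'}, {'pk': 2, 'label': 'Archive'}]


print(TagListView({'q': 'arch'}).get_queryset())
```

The mixin must come before the base view in the class definition so that its get_queryset() runs first and can call super().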

REST API

API Views

The REST API provides programmatic access to search:

graph TB
    subgraph "API search views"
        APISearchView["APISearchView<br/>simple search endpoint"]
        APIAdvancedSearchView["APIAdvancedSearchView<br/>advanced search endpoint"]
        APISearchModelList["APISearchModelList<br/>available search models"]
    end

    subgraph "API features"
        DynamicSerializer["Dynamic serializer<br/>search_model.serializer"]
        Permissions["Permission integration<br/>search_model.permission"]
        Filtering["Query processing<br/>same as the web interface"]
    end

    subgraph "URL endpoints"
        SearchEndpoint["/api/search/<model>/"]
        AdvancedEndpoint["/api/search/advanced/<model>/"]
        ModelsEndpoint["/api/search_models/"]
    end

    APISearchView --> DynamicSerializer
    APIAdvancedSearchView --> DynamicSerializer
    DynamicSerializer --> Permissions
    Permissions --> Filtering

    SearchEndpoint --> APISearchView
    AdvancedEndpoint --> APIAdvancedSearchView
    ModelsEndpoint --> APISearchModelList

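The API views set the serializer class dynamically, based on the search model configuration.

Sources: mayan/apps/dynamic_search/api_views.py:13-103

REST API Filtering

RESTAPISearchFilter enables search filtering on any API endpoint: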

graph LR
    subgraph "DRF integration"
        APIView["Generic API view"]
        SearchFilter["RESTAPISearchFilter"]
        QueryDict["request.GET parameters"]
        BackendSearch["SearchBackend.search()"]
    end

    subgraph "Filter logic"
        SearchModel["Detect SearchModel"]
        ValidFields["Validate field names"]
        CleanQuery["Clean query dict"]
        FilteredResults["pk__in filter"]
    end

    APIView --> SearchFilter
    SearchFilter --> QueryDict
    QueryDict --> SearchModel
    SearchModel --> ValidFields
    ValidFields --> CleanQuery
    CleanQuery --> BackendSearch
    BackendSearch --> FilteredResults

The filter is applied automatically to API views that have not disabled search filtering.

Sources: mayan/apps/dynamic_search/filters.py:16-82
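
The "validate field names" and "clean query dict" steps can be sketched as a single function that keeps only recognised, non-empty parameters. clean_query() and the reserved parameter names below are assumptions for illustration and are not taken from the Mayan source.

```python
def clean_query(params, valid_fields, reserved=('page', 'page_size', 'ordering')):
    """Keep only non-empty parameters that name a registered search field."""
    cleaned = {}
    for key, value in params.items():
        if key in reserved:
            continue  # pagination and ordering are not search terms
        if key not in valid_fields:
            continue  # silently ignore unknown field names
        if value is None or not str(value).strip():
            continue  # skip blank values
        cleaned[key] = str(value).strip()
    return cleaned


fields = {'q', 'label', 'document_type__label'}
print(clean_query({'label': ' invoice ', 'page': '2', 'bogus': 'x', 'q': ''}, fields))
```

When the cleaned dict is empty, a filter built this way can return the queryset unchanged instead of calling the backend at all.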

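Configuration

Settings

The search system is configured through Django settings:

| Setting | Default | Purpose |
| --- | --- | --- |
| SEARCH_BACKEND | WhooshSearchBackend | Backend implementation |
| SEARCH_BACKEND_ARGUMENTS | {} | Backend-specific options |
| SEARCH_RESULTS_LIMIT | 100 | Maximum number of results returned |
| SEARCH_INDEXING_CHUNK_SIZE | 25 | Batch size for indexing |
| SEARCH_DISABLE_SIMPLE_SEARCH | False | Hides the simple search interface |

Backend arguments configure a specific implementation (for example, the Whoosh index path or the ElasticSearch connection).

Sources: mayan/apps/dynamic_search/literals.py:3-48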
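
As a small illustration of how a chunk-size setting drives batching, the sketch below copies the defaults from the table into a dict and splits a list of ids into batches. SEARCH_DEFAULTS, get_setting(), and chunked() are hypothetical helpers, and the way Mayan EDMS actually loads and overrides settings is not shown here.

```python
SEARCH_DEFAULTS = {
    'SEARCH_BACKEND': 'WhooshSearchBackend',
    'SEARCH_BACKEND_ARGUMENTS': {},
    'SEARCH_RESULTS_LIMIT': 100,
    'SEARCH_INDEXING_CHUNK_SIZE': 25,
    'SEARCH_DISABLE_SIMPLE_SEARCH': False,
}


def get_setting(name, overrides=None):
    """Return an override when present, otherwise the documented default."""
    return (overrides or {}).get(name, SEARCH_DEFAULTS[name])


def chunked(ids, size):
    """Yield successive lists of at most `size` ids."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


size = get_setting('SEARCH_INDEXING_CHUNK_SIZE')
batches = list(chunked(range(1, 61), size))
print([len(batch) for batch in batches])
```

Each batch would then be handed to one bulk indexing task, which bounds the memory and run time of any single task.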

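Management Commands

The system provides management commands for maintenance:

# Index a specific range of objects
./manage.py search_index_objects <model_name> <id_range>

# Full backend reindex (runs as a task)
# Triggered through SearchBackendReindexView

The reindex process clears the existing index and rebuilds it from the current database state.

Sources: mayan/apps/dynamic_search/management/commands/search_index_objects.py:9-55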
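
The page does not specify the format of the id_range argument. Assuming a comma-separated list of ids and inclusive "start-end" ranges (an assumption, not confirmed by the source), a parser could look like the sketch below; parse_id_range() is a hypothetical helper.

```python
def parse_id_range(text):
    """Expand "1,5,10-12" into [1, 5, 10, 11, 12] (assumed format)."""
    ids = []
    for part in text.split(','):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if '-' in part:
            start, end = (int(piece) for piece in part.split('-', 1))
            if start > end:
                raise ValueError('Invalid range: {}'.format(part))
            ids.extend(range(start, end + 1))  # ranges are inclusive
        else:
            ids.append(int(part))
    return ids


print(parse_id_range('1,5,10-12'))
```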