9.4 · Response Generation and Citations

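Complex document understanding and citation retrieval · This page is a standalone chapter of the translated RAGFlow DeepWiki. It preserves the original links, source-code anchors, module tags, and chapter hierarchy.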


Source references

agent/canvas.py
agent/component/agent_with_tools.py
agent/component/base.py
agent/component/categorize.py
agent/component/llm.py
agent/tools/base.py
api/db/__init__.py
api/db/db_models.py
api/db/services/dialog_service.py
api/db/services/document_service.py

Module tags

UI and interaction
System architecture
Retrieval, recall, and indexing
Document objects and metadata
Agent runtime


Original DeepWiki page: https://deepwiki.com/infiniflow/ragflow/9.4-response-generation-and-citations

Translated: 2026-05-27T08:44:40.089Z

Translation model: deepseek-chat

Source character count: 10681

Project: RAGFlow (ragflow)

---

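Response Generation and Citations

Purpose and Scope

This document describes RAGFlow's response generation and citation system, which formats retrieved document chunks into a prompt, streams the large language model (LLM) response, and inserts inline citations that link the answer back to its source documents. The system handles context assembly, citation marker insertion, citation tracking, and empty-response handling.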

---

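System Overview

The response generation pipeline coordinates several components to produce cited answers from retrieved knowledge. It bridges the gap between raw document chunks and a coherent, attributable response.

Response generation data flow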

graph TB
    subgraph "Input processing"
        Query["User query<br/>+ conversation history"]
        RetrievedChunks["Retrieved chunks<br/>from Dealer.search()"]
        PromptConfig["Prompt configuration<br/>Dialog.prompt_config"]
    end

    subgraph "Context assembly"
        ChunkFormat["chunks_format()<br/>formats chunks with [ID]"]
        KBPrompt["kb_prompt()<br/>builds the context prompt"]
        MessageFit["message_fit_in()<br/>truncates to the token limit"]
    end

    subgraph "LLM generation"
        LLMBundle["LLMBundle.chat()<br/>unified model access"]
        Streaming["SSE streaming<br/>via Response()"]
    end

    subgraph "Citation processing"
        CitationPrompt["citation_prompt()<br/>instructs the LLM to cite"]
        UIMapping["MarkdownContent<br/>frontend citation rendering"]
    end

    subgraph "Reference assembly"
        DocMetadata["DocMetadataService<br/>fetches flattened metadata"]
        DocService["DocumentService<br/>fetches document details"]
        RefList["Reference list<br/>chunks + metadata"]
    end

    subgraph "Output"
        StreamedAnswer["Streamed answer<br/>with inline citations"]
        UIMessage["MessageItem<br/>React component"]
    end

    Query --> ChunkFormat
    RetrievedChunks --> ChunkFormat
    PromptConfig --> KBPrompt
    ChunkFormat --> KBPrompt
    KBPrompt --> MessageFit
    MessageFit --> LLMBundle

    LLMBundle --> Streaming

    RetrievedChunks --> RefList
    RefList --> DocMetadata
    DocMetadata --> DocService

    UIMapping --> UIMessage
    DocService --> UIMessage

Sources:

DialogService orchestration: api/db/services/dialog_service.py:98-100
Prompt generation functions: api/db/services/dialog_service.py:51-51
Search logic integration: rag/nlp/search.py:132-140
Dialog model fields: api/db/services/dialog_service.py:168-191

---

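Context Formatting and Token Management

The system uses kb_prompt() to assemble retrieved chunks into a structured prompt while respecting the LLM's token limit.

Chunk formatting implementation

The chunks_format() function normalizes retrieved chunks into a format that carries content, document ID, and similarity score api/db/services/dialog_service.py:51-51. In DialogService, these chunks are processed through _chunk_kb_id_for_doc so that they include metadata such as kb_id, which supports queries that span multiple datasets api/db/services/dialog_service.py:63-66.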
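The formatting step can be sketched as follows. This is a minimal illustration rather than RAGFlow's implementation: the helper names `format_chunks` and `build_knowledge_prompt`, the dictionary field names, and the `[ID:n]` header layout are all assumptions made for the example.

```python
from typing import Any


def format_chunks(raw_chunks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Normalize retrieved chunks to one uniform shape.

    The field names here are hypothetical, chosen only for illustration.
    """
    return [
        {
            "id": i,  # positional ID that the model will cite
            "content": c.get("content", ""),
            "doc_id": c.get("doc_id"),
            "kb_id": c.get("kb_id"),  # retained so results from several datasets stay distinguishable
            "similarity": float(c.get("similarity", 0.0)),
        }
        for i, c in enumerate(raw_chunks)
    ]


def build_knowledge_prompt(chunks: list[dict[str, Any]]) -> str:
    """Render each chunk under an [ID:n] header (assumed marker format)."""
    return "\n\n".join(f"[ID:{c['id']}]\n{c['content']}" for c in chunks)
```

Numbering chunks by position keeps the markers short and lets a later step map a cited ID straight back to an index in the reference list.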

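Token truncation

To prevent context overflow, message_fit_in() counts the tokens in each message of the conversation history and truncates content so that the total stays within the model's limit api/db/services/dialog_service.py:51-51. LLMBundle also applies token-based truncation to the input text, using num_tokens_from_string, before it calls the underlying model provider api/db/services/llm_service.py:101-106.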
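One way to picture this kind of budget fitting is the sketch below. It is not the logic of message_fit_in(): the function `fit_messages`, the whitespace-based `count_tokens` stand-in, and the policy of dropping the oldest non-system turns first are assumptions chosen for the example.

```python
def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): counts whitespace-separated words.
    return len(text.split())


def fit_messages(messages: list[dict[str, str]], max_tokens: int) -> list[dict[str, str]]:
    """Return a copy of `messages` trimmed toward `max_tokens`."""
    msgs = [dict(m) for m in messages]  # copy so the caller's list is untouched

    def total() -> int:
        return sum(count_tokens(m["content"]) for m in msgs)

    # Step 1: drop the oldest non-system turns, always keeping the final message.
    while total() > max_tokens:
        idx = next(
            (i for i, m in enumerate(msgs[:-1]) if m["role"] != "system"), None
        )
        if idx is None:
            break
        del msgs[idx]

    # Step 2: if still over budget, cut words from the end of the longest message.
    while msgs and total() > max_tokens:
        longest = max(msgs, key=lambda m: count_tokens(m["content"]))
        words = longest["content"].split()
        if not words:
            break
        overflow = total() - max_tokens
        longest["content"] = " ".join(words[: max(len(words) - overflow, 0)])
    return msgs
```

Dropping whole turns before cutting inside a message preserves the system prompt and the latest question for as long as possible. A real implementation would count tokens with the model's own tokenizer.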

Sources:

Prompt generation logic: api/db/services/dialog_service.py:51-51
LLMBundle token safety: api/db/services/llm_service.py:99-106
Dialog service citation parsing: api/db/services/dialog_service.py:57-58

---

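Citation Mechanism and UI Rendering

RAGFlow implements a citation system that connects the backend LLM output to document previews in the frontend.

Citation generation

The citation_prompt() function gives the LLM instructions on how to cite sources, typically with markers that correspond to the formatted context api/db/services/dialog_service.py:51-51. This directs the LLM to attribute specific facts to the retrieved chunks supplied in the system prompt.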
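As an illustration of how such an instruction might be attached to the context, consider the sketch below. The template wording, the `[ID:n]` marker, and the helper names `citation_instruction` and `assemble_system_prompt` are assumptions for the example; the actual text produced by citation_prompt() is not reproduced here.

```python
# Hypothetical instruction text, written for this example only.
CITATION_TEMPLATE = (
    "Cite the passages you rely on by appending markers such as [ID:0] "
    "right after the sentence they support. Only use IDs between 0 and {max_id}. "
    "Do not add a marker when no passage supports the statement."
)


def citation_instruction(num_chunks: int) -> str:
    """Return the citation instruction, or an empty string when there is nothing to cite."""
    if num_chunks <= 0:
        return ""
    return CITATION_TEMPLATE.format(max_id=num_chunks - 1)


def assemble_system_prompt(base_prompt: str, knowledge: str, num_chunks: int) -> str:
    """Join the base prompt, the formatted knowledge, and the citation rules."""
    parts = [base_prompt, knowledge, citation_instruction(num_chunks)]
    return "\n\n".join(p for p in parts if p)
```

Stating the valid ID range in the instruction gives the model an explicit bound, which makes out-of-range markers easier to detect afterwards.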

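Frontend citation handling

The frontend uses dedicated components to render cited responses. MarkdownContent and NextMarkdownContent process text fragments that contain citation markers web/src/components/markdown-content/index.tsx web/src/components/next-markdown-content/index.tsx.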
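The frontend components are written in TypeScript, but the marker handling itself is language-neutral. The Python sketch below shows one plausible way to find citation markers in an answer and discard those that point outside the reference list. The `[ID:n]` pattern and both helper names are assumptions, not code taken from the components above.

```python
import re

# Assumed marker format: [ID:<non-negative integer>]
CITATION_RE = re.compile(r"\[ID:(\d+)\]")


def extract_cited_ids(answer: str, num_chunks: int) -> list[int]:
    """Return cited chunk indices in order of first appearance, skipping out-of-range IDs."""
    seen: list[int] = []
    for match in CITATION_RE.finditer(answer):
        idx = int(match.group(1))
        if idx < num_chunks and idx not in seen:
            seen.append(idx)
    return seen


def strip_invalid_citations(answer: str, num_chunks: int) -> str:
    """Remove markers whose ID has no matching chunk in the reference list."""

    def keep_or_drop(match: re.Match) -> str:
        return match.group(0) if int(match.group(1)) < num_chunks else ""

    return CITATION_RE.sub(keep_or_drop, answer)
```

For example, with two retrieved chunks, a marker `[ID:9]` is removed from the text while `[ID:0]` and `[ID:1]` are left in place for the renderer.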

Component	Role	Code entity
Backend prompt	Instructs the LLM to cite	`citation_prompt` `api/db/services/dialog_service.py:51-51`
LLM wrapper	Unified chat interface	`LLMBundle` `api/db/services/llm_service.py:85-87`
Search handler	Provides chunk metadata	`Dealer.SearchResult` `rag/nlp/search.py:42-51`
Markdown rendering	Renders cited text	`NextMarkdownContent` `web/src/components/next-markdown-content/index.tsx`

Sources:

Citation prompt definition: api/db/services/dialog_service.py:51-51
LLM execution: api/db/services/llm_service.py:171-180
Markdown components: web/src/components/next-markdown-content/index.tsx

---

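Metadata Enrichment

Citations are backed by detailed document metadata retrieved through DocMetadataService.

Metadata flow

Retrieval: chunks are retrieved with the Dealer class; the fields include doc_id, kb_id, and img_id rag/nlp/search.py:149-153.
Metadata fetch: DocMetadataService.get_metadata_for_documents fetches metadata for the documents that appear on the current page api/db/services/document_service.py:107-107.
Enrichment: the enrich_chunks_with_document_metadata function maps these metadata fields back onto the chunks, giving citations their context api/db/services/dialog_service.py:60-61.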
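The three steps above can be sketched as a batch lookup followed by a merge. This is an illustrative outline only: `collect_doc_ids`, `enrich_chunks`, the `document_metadata` key, and the shape of the metadata mapping are assumptions and do not mirror the signatures of the RAGFlow functions named above.

```python
from typing import Any


def collect_doc_ids(chunks: list[dict[str, Any]]) -> list[str]:
    """Gather the distinct document IDs so metadata can be fetched in one batch."""
    return sorted({c["doc_id"] for c in chunks if c.get("doc_id")})


def enrich_chunks(
    chunks: list[dict[str, Any]],
    metadata_by_doc: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return new chunk dicts with the metadata of their parent document attached.

    `metadata_by_doc` is a hypothetical mapping of doc_id -> metadata fields.
    """
    enriched = []
    for chunk in chunks:
        meta = metadata_by_doc.get(chunk.get("doc_id"), {})
        # Copy instead of mutating, so the retrieval results stay unchanged.
        enriched.append({**chunk, "document_metadata": dict(meta)})
    return enriched
```

Fetching metadata once per distinct document, rather than once per chunk, avoids repeated lookups when several chunks come from the same file.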

Sources:

Search service fields: rag/nlp/search.py:149-153
Document metadata mapping: api/db/services/document_service.py:107-109
Reference enrichment: api/db/services/dialog_service.py:60-61

---

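Empty Responses and Error Handling

RAGFlow handles cases where retrieval fails or the user is not authorized.

Empty response handling

When the standard retrieval context is insufficient, the ASK_SUMMARY prompt serves as a fallback for summarization tasks api/db/services/dialog_service.py:51-51. If no knowledge is found, the system typically relies on the prompt configuration to define the fallback behavior.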
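A configuration-driven fallback of this kind could look like the sketch below. The function `plan_response`, the returned dictionary shape, and the `empty_response` configuration key are assumptions introduced for illustration; the text above does not specify which configuration fields are consulted.

```python
from typing import Any, Optional


def plan_response(
    chunks: list[dict[str, Any]],
    prompt_config: dict[str, Any],
) -> dict[str, Optional[str]]:
    """Decide how to answer based on whether retrieval produced any chunks."""
    if chunks:
        # Normal path: generate an answer grounded in the retrieved context.
        return {"mode": "generate", "answer": None}

    # Hypothetical config key holding a canned reply for empty retrieval.
    fallback = prompt_config.get("empty_response")
    if fallback:
        return {"mode": "fallback", "answer": fallback}

    # No canned reply configured: let the model answer without context.
    return {"mode": "generate_without_context", "answer": None}
```

Returning the canned reply without calling the model keeps the empty-retrieval path cheap and predictable.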

Task and document status

Before generation, the system uses KnowledgebaseService.is_parsed_done() to check whether the documents in the knowledge base have been fully parsed api/db/services/knowledgebase_service.py:106-136. If a task fails or is abandoned after 3 attempts, TaskService marks it as abandoned api/db/services/task_service.py:130-141.

Scenario	Handling mechanism	Code location
Parsing incomplete	`is_parsed_done` check	`api/db/services/knowledgebase_service.py:106-136`
Task abandoned	Progress set to -1 after 3 retries	`api/db/services/task_service.py:130-141`
Stale chunks	`_prune_deleted_chunks` in Dealer	`rag/nlp/search.py:76-118`
Model usage limit	`increase_usage_by_id` failure	`api/db/services/llm_service.py:111-112`
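The retry rule in the table (progress set to -1 after 3 retries) can be illustrated with a small state sketch. The `Task` dataclass, its field names, and `claim_task` are hypothetical and are not the TaskService model.

```python
from dataclasses import dataclass

MAX_RETRIES = 3  # retry limit stated in the text above


@dataclass
class Task:
    # Hypothetical task record; the field names are illustrative only.
    id: str
    retry_count: int = 0
    progress: float = 0.0
    progress_msg: str = ""


def claim_task(task: Task) -> bool:
    """Return True if the task may run again; otherwise mark it abandoned."""
    if task.retry_count >= MAX_RETRIES:
        task.progress = -1  # -1 signals an abandoned task
        task.progress_msg = f"Abandoned after {MAX_RETRIES} attempts"
        return False
    task.retry_count += 1
    return True
```

Claiming the same task four times in a row yields three successful claims, followed by a refusal that leaves `progress` at -1.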

Sources:

Prompt fallback: api/db/services/dialog_service.py:51-51
Task retry logic: api/db/services/task_service.py:130-141
Document health check: api/db/services/knowledgebase_service.py:106-136
Stale data cleanup: rag/nlp/search.py:76-118

---

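Streaming Implementation

RAGFlow supports real-time response streaming using Server-Sent Events (SSE).

Streaming data flow

The LLMBundle.chat method interacts with the model provider to receive a token stream. The backend forwards these chunks to the client.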

sequenceDiagram
    participant Frontend as "Chat UI"
    participant API as "Chat API handler"
    participant LLM as "LLMBundle (api/db/services/llm_service.py)"
    participant Provider as "Model provider"

    Frontend->>API: POST /chat/completions (SSE)
    API->>LLM: chat(stream=True)
    LLM->>Provider: Request stream
    loop For each chunk
        Provider-->>LLM: token
        LLM-->>API: yield chunk
        API-->>Frontend: "data: {answer: '...'}"
    end
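
The loop in the diagram can be sketched as a generator that wraps each model chunk in an SSE frame. This is a minimal illustration under stated assumptions: `sse_events` is a hypothetical helper, each frame carries only the newest chunk, and the closing `done` frame is an invented convention rather than RAGFlow's wire format.

```python
import json
from typing import Iterable, Iterator


def sse_events(token_stream: Iterable[str]) -> Iterator[str]:
    """Wrap each chunk from the model in a Server-Sent Events `data:` frame."""
    for token in token_stream:
        payload = json.dumps({"answer": token}, ensure_ascii=False)
        # An SSE frame is a `data:` line terminated by a blank line.
        yield f"data: {payload}\n\n"
    # Assumed end-of-stream signal so the client knows when to stop reading.
    yield "data: " + json.dumps({"done": True}) + "\n\n"
```

A web framework would typically return this generator as a streaming response with the `text/event-stream` content type, so the browser can render the answer as it arrives.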

Sources:

LLM chat execution: api/db/services/llm_service.py:171-180
Token usage update: api/db/services/llm_service.py:111-112