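7 · API Reference

Human review and feedback data · This chapter is a standalone section page of the Argilla DeepWiki Chinese translation. It preserves the original links, source-code anchors, module tags, and section hierarchy.

Project: Argilla · Section: 7 · Status: full translation · Modules: API and service contracts; UI and interaction; evaluation, feedback, and human review; document objects and metadata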
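Module tags
  • API and service contracts
  • UI and interaction
  • Evaluation, feedback, and human review
  • Document objects and metadata
  • Retrieval, recall, and indexing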

Translated text

API Reference (translated text)

Original DeepWiki page: https://deepwiki.com/argilla-io/argilla/7-api-reference
Translation time: 2026-05-27T08:44:48.808Z
Translation model: deepseek-chat
Source character count: 25442
Project: Argilla (argilla)

---

API Reference

Relevant source files

The following files were used as context when generating this wiki page:

  • argilla-frontend/CHANGELOG.md
  • argilla-frontend/components/features/annotation/container/questions/form/span/EntityLabelSelection.component.vue
  • argilla-frontend/components/features/annotation/settings/Validation.vue
  • argilla-frontend/components/features/dataset-creation/configuration/DatasetConfigurationForm.vue
  • argilla-frontend/components/features/dataset-creation/configuration/questions/DatasetConfigurationFieldSelector.vue
  • argilla-frontend/components/features/dataset-creation/configuration/questions/DatasetConfigurationLabels.vue
  • argilla-frontend/components/features/dataset-creation/configuration/questions/DatasetConfigurationQuestion.vue
  • argilla-frontend/components/features/dataset-creation/configuration/questions/DatasetConfigurationRating.vue
  • argilla-frontend/components/features/dataset-creation/configuration/questions/DatasetConfigurationSpan.vue
  • argilla-frontend/package.json
  • argilla-frontend/translation/de.js
  • argilla-frontend/translation/en.js
  • argilla-frontend/translation/es.js
  • argilla-frontend/v1/domain/entities/hub/DatasetCreation.test.ts
  • argilla-frontend/v1/domain/entities/hub/QuestionCreation.ts
  • argilla-frontend/v1/domain/entities/hub/Subset.ts
  • argilla-server/CHANGELOG.md
  • argilla-server/src/argilla_server/_version.py
  • argilla-server/src/argilla_server/alembic/versions/580a6553186f_add_datasets_users_table.py
  • argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py
  • argilla-server/src/argilla_server/api/schemas/v1/datasets.py
  • argilla-server/src/argilla_server/bulk/records_bulk.py
  • argilla-server/src/argilla_server/contexts/datasets.py
  • argilla-server/src/argilla_server/database.py
  • argilla-server/src/argilla_server/models/database.py
  • argilla-server/tests/factories.py
  • argilla-server/tests/unit/api/handlers/v1/datasets/records/records_bulk/test_create_dataset_records_bulk.py
  • argilla-server/tests/unit/api/handlers/v1/datasets/records/records_bulk/test_dataset_records_bulk_with_responses.py
  • argilla-server/tests/unit/api/handlers/v1/datasets/test_get_dataset_progress.py
  • argilla-server/tests/unit/api/handlers/v1/responses/test_create_current_user_responses_bulk.py
  • argilla-server/tests/unit/api/handlers/v1/test_datasets.py
  • argilla-server/tests/unit/api/handlers/v1/test_records.py
  • argilla-server/tests/unit/database/models/test_dataset_user_model.py
  • argilla-server/tests/unit/test_database.py
  • argilla-v1/src/argilla_v1/_version.py
  • argilla/CHANGELOG.md
  • argilla/src/argilla/__init__.py
  • argilla/src/argilla/_version.py
  • docs/_source/.readthedocs.yaml
  • docs/_source/_static/images/og-doc.png
  • docs/_source/_templates/page.html
  • docs/_source/conf.py
  • docs/_source/getting_started/quickstart.md
  • docs/_source/reference/python/python_client.rst
  • docs/_source/reference/python/python_training.rst
  • docs/_source/requirements.txt

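This reference documents Argilla's application programming interfaces (APIs): the Python SDK and the REST API endpoints. The Python SDK provides a high-level, Pythonic interface for working with Argilla, while the REST API gives custom integrations more direct access.

For information on deploying Argilla, see Deployment and Configuration. For usage guidance, see the User Guide.

Python SDK API

The Python SDK is the recommended way to interact with Argilla programmatically. It wraps the REST API calls in a more convenient interface.

Client initialization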
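import argilla as rg

# Option 1: functional style, sets a global client.
# rg.init comes from the legacy 1.x client (now the argilla_v1 package);
# the 2.x argilla package does not export it, so it is shown commented out.
# rg.init(api_url="http://localhost:6900", api_key="your-api-key")

# Option 2: object-oriented style
client = rg.Argilla(api_url="http://localhost:6900", api_key="your-api-key")

# Option 3: context manager (assumes rg.Argilla supports the `with` protocol)
with rg.Argilla(api_url="http://localhost:6900", api_key="your-api-key") as client:
    print(client.me)  # use the client...

Sources: argilla/src/argilla/__init__.py, argilla-server/src/argilla_server/database.py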

Dataset management
# Describe the dataset: in the 2.x SDK, fields and questions live in rg.Settings
settings = rg.Settings(
    fields=[
        rg.TextField(name="text", title="Text content")
    ],
    questions=[
        rg.LabelQuestion(
            name="sentiment",
            title="Sentiment",
            labels=["positive", "negative", "neutral"]
        )
    ]
)
dataset = rg.Dataset(
    name="sentiment-analysis",
    workspace="default",
    settings=settings,
    client=client
)

# Push it to the Argilla server
dataset.create()

# Load an existing dataset
dataset = client.datasets(name="sentiment-analysis", workspace="default")

# List all datasets
datasets = list(client.datasets)

# Delete a dataset
dataset.delete()

Sources: argilla/src/argilla/__init__.py, argilla-server/src/argilla_server/contexts/datasets.py

Field types

Argilla supports several field types for displaying different kinds of data:

# Text field
text_field = rg.TextField(name="text", title="Text field")

# Image field
image_field = rg.ImageField(name="image", title="Image field")

# Chat field (for conversational data)
chat_field = rg.ChatField(name="chat", title="Chat field")

# Custom field
custom_field = rg.CustomField(name="custom", title="Custom field")

Sources: argilla-frontend/translation/en.js:2-9, argilla-frontend/v1/domain/entities/hub/FieldCreation.ts

Question types

Questions define what annotators are asked to provide:

# Label question (single choice)
label_question = rg.LabelQuestion(
    name="category",
    title="Category",
    labels=["news", "sports", "entertainment"]
)

# Multi-label question
multi_label_question = rg.MultiLabelQuestion(
    name="topics",
    title="Topics",
    labels=["politics", "economy", "technology"]
)

# Rating question
rating_question = rg.RatingQuestion(
    name="quality",
    title="Quality rating",
    values=[0, 1, 2, 3, 4, 5]
)

# Ranking question
ranking_question = rg.RankingQuestion(
    name="preference",
    title="Preference ranking",
    values=["option_a", "option_b", "option_c"]
)

# Span question (for entity annotation)
span_question = rg.SpanQuestion(
    name="entities",
    title="Entity annotation",
    field="text",
    labels=["person", "organization", "location"]
)

# Text question (free text)
text_question = rg.TextQuestion(
    name="comment",
    title="Comment"
)

Sources: argilla-frontend/translation/en.js:2-9, argilla-frontend/v1/domain/entities/hub/QuestionCreation.ts:19-26

Record operations

A record is a single data point that users annotate:

# Create a record
record = rg.Record(
    fields={"text": "This is a sample text"},
    metadata={"source": "news", "length": 21}
)

# Add the record to the dataset
dataset.records.log([record])

# Add several records
records = [
    rg.Record(fields={"text": "Example 1"}),
    rg.Record(fields={"text": "Example 2"})
]
dataset.records.log(records)

# Search records by text
results = dataset.records(query=rg.Query(query="example")).to_list()

# Filter records by metadata
filtered = dataset.records(
    query=rg.Query(
        filter=rg.Filter(("metadata.source", "==", "news"))
    )
).to_list()

# Limit the number of records returned
limited = dataset.records(limit=100).to_list()

Sources: argilla-server/src/argilla_server/contexts/datasets.py:330-339, argilla-server/src/argilla_server/bulk/records_bulk.py:46-89

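Responses and suggestions

Responses are the annotations that users submit; suggestions are pre-filled annotations, usually produced by a model:

# Add a suggestion to a record
record.suggestions.add(
    rg.Suggestion(
        question_name="sentiment",
        value="positive",
        score=0.95,
        agent="gpt-4"
    )
)

# Add a response on behalf of the current user
record.responses.add(
    rg.Response(
        question_name="sentiment",
        value="negative",
        user_id=client.me.id
    )
)

# Get the record's responses
responses = record.responses

# Filter records by response status
submitted = dataset.records(
    query=rg.Query(filter=rg.Filter(("response.status", "==", "submitted")))
).to_list()

Sources: argilla-server/src/argilla_server/contexts/datasets.py:480-540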

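Vector operations

Vector embeddings enable similarity search:

# Declare a vector field in the dataset settings
settings = rg.Settings(
    fields=[rg.TextField(name="text")],
    questions=[rg.TextQuestion(name="comment")],
    vectors=[rg.VectorField(name="embeddings", dimensions=768)]
)

# Attach a vector to a record
record = rg.Record(
    fields={"text": "Example"},
    vectors={"embeddings": [0.1] * 768}  # one float per dimension (768 here)
)

# Find the records most similar to a query vector
similar = dataset.records(
    query=rg.Query(similar=rg.Similar(name="embeddings", value=[0.1] * 768)),
    limit=10
).to_list()

Sources: argilla-server/src/argilla_server/contexts/datasets.py:301-326, argilla-server/src/argilla_server/bulk/records_bulk.py:136-155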
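To make the idea behind similarity search concrete, here is a minimal, standard-library sketch of ranking records by cosine similarity. It is only an illustration of the concept: Argilla delegates vector search to its search engine, and the helper names (`cosine_similarity`, `most_similar`) and the toy three-dimensional vectors are assumptions made for this example, not part of the SDK.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(
    query: List[float],
    vectors: Dict[str, List[float]],
    limit: int = 10,
) -> List[Tuple[str, float]]:
    """Return up to `limit` (record_id, score) pairs, best match first."""
    scored = [(record_id, cosine_similarity(query, vector))
              for record_id, vector in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hypothetical record ids with toy 3-dimensional embeddings.
toy_vectors = {
    "rec-1": [1.0, 0.0, 0.0],
    "rec-2": [0.9, 0.1, 0.0],
    "rec-3": [0.0, 1.0, 0.0],
}
top = most_similar([1.0, 0.0, 0.0], toy_vectors, limit=2)
```

A real deployment would compare 768-dimensional vectors (matching the `dimensions` setting above) and would rely on the search engine's approximate nearest-neighbour index rather than a linear scan.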

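Metadata properties

Metadata properties store additional information about a record that can be used for filtering:

# Define metadata properties
term_metadata = rg.TermsMetadataProperty(
    name="source",
    title="Source",
    visible_for_annotators=True
)

int_metadata = rg.IntegerMetadataProperty(
    name="length",
    title="Text length",
    visible_for_annotators=False
)

float_metadata = rg.FloatMetadataProperty(
    name="score",
    title="Confidence score",
    visible_for_annotators=True
)

# Register the metadata properties in the dataset settings
settings = rg.Settings(
    fields=[rg.TextField(name="text")],
    questions=[rg.TextQuestion(name="comment")],
    metadata=[term_metadata, int_metadata, float_metadata]
)

Sources: argilla-server/src/argilla_server/contexts/datasets.py:246-270, argilla-server/src/argilla_server/contexts/datasets.py:273-282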

Hugging Face integration

Argilla integrates with the Hugging Face Hub to import and export datasets:

# Import from the Hugging Face Hub
dataset = rg.Dataset.from_hub("stanfordnlp/imdb")

# Export to the Hugging Face Hub
dataset.to_hub(
    repo_id="username/dataset-name",
    private=True,
    token="your-huggingface-token"
)

Sources: argilla-frontend/translation/en.js:332-345, argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py:444-467

REST API endpoints

The REST API provides direct programmatic access to Argilla's functionality.

Authentication endpoints
POST /api/v1/token                # Obtain an authentication token

Sources: argilla-server/src/argilla_server/database.py:83-97

Dataset endpoints
GET    /api/v1/me/datasets                     # List the current user's datasets
POST   /api/v1/datasets                        # Create a dataset
GET    /api/v1/datasets/{dataset_id}           # Get a specific dataset
PATCH  /api/v1/datasets/{dataset_id}           # Update a dataset
DELETE /api/v1/datasets/{dataset_id}           # Delete a dataset
POST   /api/v1/datasets/{dataset_id}/publish   # Publish a dataset
GET    /api/v1/datasets/{dataset_id}/progress  # Get dataset progress

Sources: argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py:75-96, argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py:136-179
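For custom integrations that call these endpoints directly, the sketch below shows one way to assemble a request from a path template. It only builds the request description and sends nothing. The `ApiRequest` and `build_request` names are hypothetical, and the `X-Argilla-Api-Key` header is an assumption about how the server expects the API key; check it against your deployment.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import quote

# Assumed name of the API-key header; verify against your server version.
API_KEY_HEADER = "X-Argilla-Api-Key"


@dataclass
class ApiRequest:
    """Plain description of an HTTP request (hypothetical helper type)."""
    method: str
    url: str
    headers: Dict[str, str] = field(default_factory=dict)


def build_request(base_url: str, api_key: str, method: str,
                  path_template: str, **path_params: str) -> ApiRequest:
    """Fill a path template such as /api/v1/datasets/{dataset_id}."""
    # Percent-encode each parameter so it cannot break out of its path segment.
    encoded = {name: quote(str(value), safe="")
               for name, value in path_params.items()}
    path = path_template.format(**encoded)
    return ApiRequest(
        method=method.upper(),
        url=base_url.rstrip("/") + path,
        headers={API_KEY_HEADER: api_key, "Accept": "application/json"},
    )


progress_request = build_request(
    "http://localhost:6900",
    "your-api-key",
    "get",
    "/api/v1/datasets/{dataset_id}/progress",
    dataset_id="00000000-0000-0000-0000-000000000000",  # placeholder id
)
```

The resulting `ApiRequest` can be handed to any HTTP client; keeping construction separate from sending makes the URL and header logic easy to unit test.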

Field endpoints
GET    /api/v1/datasets/{dataset_id}/fields      # List a dataset's fields
POST   /api/v1/datasets/{dataset_id}/fields      # Create a field
PATCH  /api/v1/fields/{field_id}                 # Update a field
DELETE /api/v1/fields/{field_id}                 # Delete a field

Sources: argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py:99-107

Question endpoints
GET    /api/v1/datasets/{dataset_id}/questions      # List a dataset's questions
POST   /api/v1/datasets/{dataset_id}/questions      # Create a question
PATCH  /api/v1/questions/{question_id}              # Update a question
DELETE /api/v1/questions/{question_id}              # Delete a question

Record endpoints
POST   /api/v1/datasets/{dataset_id}/records/bulk    # Create records in bulk
PUT    /api/v1/datasets/{dataset_id}/records/bulk    # Update records in bulk
GET    /api/v1/datasets/{dataset_id}/records         # List the records in a dataset
GET    /api/v1/records/{record_id}                   # Get a specific record
PATCH  /api/v1/records/{record_id}                   # Update a record
DELETE /api/v1/records/{record_id}                   # Delete a record
POST   /api/v1/datasets/{dataset_id}/records/search  # Search records

Sources: argilla-server/src/argilla_server/bulk/records_bulk.py:46-89, argilla-server/src/argilla_server/contexts/datasets.py:330-339
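When records are sent to the bulk endpoints from custom code, they usually have to be split into batches. The following is a minimal sketch of that batching step. Both the `items` wrapper key and the default batch size of 500 are assumptions made for illustration, as is the `chunked_bulk_payloads` helper itself; confirm the actual request schema and limit in `records_bulk.py` and the server schemas before relying on them.

```python
from typing import Any, Dict, Iterable, Iterator, List

RecordPayload = Dict[str, Any]


def chunked_bulk_payloads(
    records: Iterable[RecordPayload],
    batch_size: int = 500,  # assumed limit, not taken from the source
) -> Iterator[Dict[str, List[RecordPayload]]]:
    """Yield one request body per batch of at most `batch_size` records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[RecordPayload] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield {"items": batch}  # "items" is an assumed wrapper key
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield {"items": batch}


sample_records = [{"fields": {"text": f"Example {i}"}} for i in range(5)]
payloads = list(chunked_bulk_payloads(sample_records, batch_size=2))
```

Each yielded dictionary would be the JSON body of one `POST /api/v1/datasets/{dataset_id}/records/bulk` call; because the helper is a generator, a large input never has to be held in memory as a single request.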

Response endpoints
POST   /api/v1/me/responses/bulk        # Submit responses in bulk
GET    /api/v1/responses/{response_id}  # Get a specific response
PATCH  /api/v1/responses/{response_id}  # Update a response
DELETE /api/v1/responses/{response_id}  # Delete a response

Sources: argilla-server/src/argilla_server/contexts/datasets.py:480-540

Metadata property endpoints
GET    /api/v1/datasets/{dataset_id}/metadata-properties       # List metadata properties
POST   /api/v1/datasets/{dataset_id}/metadata-properties       # Create a metadata property
PATCH  /api/v1/metadata-properties/{metadata_property_id}      # Update a metadata property
DELETE /api/v1/metadata-properties/{metadata_property_id}      # Delete a metadata property
GET    /api/v1/metadata-properties/{metadata_property_id}/metrics  # Get metrics for a property

Sources: argilla-server/src/argilla_server/contexts/datasets.py:246-270, argilla-server/src/argilla_server/contexts/datasets.py:273-282

Vector settings endpoints
GET    /api/v1/datasets/{dataset_id}/vectors-settings       # List vector settings
POST   /api/v1/datasets/{dataset_id}/vectors-settings       # Create vector settings
PATCH  /api/v1/vectors-settings/{vector_settings_id}        # Update vector settings
DELETE /api/v1/vectors-settings/{vector_settings_id}        # Delete vector settings

Sources: argilla-server/src/argilla_server/contexts/datasets.py:301-326

User and workspace endpoints
GET    /api/v1/me                                    # Get the current user
GET    /api/v1/users                                 # List all users
POST   /api/v1/users                                 # Create a user
DELETE /api/v1/users/{user_id}                       # Delete a user
GET    /api/v1/workspaces                            # List all workspaces
POST   /api/v1/workspaces                            # Create a workspace
GET    /api/v1/workspaces/{workspace_id}/users       # List the users in a workspace
POST   /api/v1/workspaces/{workspace_id}/users       # Add a user to a workspace
DELETE /api/v1/workspaces/{workspace_id}/users/{user_id}  # Remove a user from a workspace

API architecture

Argilla API components
graph TD
    subgraph "Client"
        PySDK["Python SDK<br>(argilla package)"]
        DirectAPI["Direct REST API usage"]
    end

    subgraph "Server"
        REST["REST API layer<br>(FastAPI)"]
        Context["Context layer<br>(business logic)"]
        Models["Database models<br>(SQLAlchemy)"]
        DB[(Database<br>PostgreSQL/SQLite)]
        Search[(Search engine<br>Elasticsearch/OpenSearch)]
    end

    PySDK --> REST
    DirectAPI --> REST
    REST --> Context
    Context --> Models
    Models --> DB
    Context --> Search

Sources: argilla-server/src/argilla_server/contexts/datasets.py, argilla-server/src/argilla_server/database.py, argilla-server/src/argilla_server/models/database.py

API workflow for record creation and annotation
sequenceDiagram
    participant User as "User"
    participant SDK as "rg.Dataset"
    participant REST as "REST API"
    participant DB as "Database"
    participant Search as "Search engine"

    User->>SDK: Create dataset
    SDK->>REST: POST /api/v1/datasets
    REST->>DB: Store dataset information
    REST-->>SDK: Return dataset ID
    SDK-->>User: Return Dataset object

    User->>SDK: Add records (dataset.records.log)
    SDK->>REST: POST /api/v1/datasets/{id}/records/bulk
    REST->>DB: Store records
    REST->>Search: Index records
    REST-->>SDK: Success response
    SDK-->>User: Updated Dataset object

    User->>SDK: Submit annotation (record.responses.add)
    SDK->>REST: POST /api/v1/me/responses/bulk
    REST->>DB: Store response
    REST->>Search: Update record responses
    REST-->>SDK: Success response
    SDK-->>User: Updated Record object

    User->>SDK: Search records (dataset.records with a query)
    SDK->>REST: POST /api/v1/datasets/{id}/records/search
    REST->>Search: Run search query
    Search-->>REST: Return matching records
    REST-->>SDK: Record data
    SDK-->>User: Record objects

Sources: argilla-server/src/argilla_server/contexts/datasets.py:480-540, argilla-server/src/argilla_server/bulk/records_bulk.py:46-89
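The write path in the sequence above (store in the database, then index in the search engine, then query the index) can be mimicked with a toy in-memory model. This is purely illustrative: `ToyServer` and its methods are hypothetical names, the "search engine" is a substring match over a dictionary, and none of it reflects how the Argilla server is actually implemented.

```python
from typing import Any, Dict, List
from uuid import UUID, uuid4


class ToyServer:
    """In-memory stand-in for the database plus search engine pair."""

    def __init__(self) -> None:
        self.database: Dict[UUID, Dict[str, Any]] = {}
        self.search_index: Dict[UUID, str] = {}

    def create_records_bulk(self, items: List[Dict[str, Any]]) -> List[UUID]:
        """Store each record first, then index its text for search."""
        record_ids = []
        for item in items:
            record_id = uuid4()
            self.database[record_id] = {"fields": item["fields"], "responses": []}
            text = " ".join(str(value) for value in item["fields"].values())
            self.search_index[record_id] = text.lower()
            record_ids.append(record_id)
        return record_ids

    def create_response(self, record_id: UUID, values: Dict[str, Any]) -> None:
        """Attach a submitted response to a stored record."""
        self.database[record_id]["responses"].append(
            {"values": values, "status": "submitted"}
        )

    def search(self, text: str) -> List[UUID]:
        """Queries go to the index, never to the database directly."""
        needle = text.lower()
        return [rid for rid, doc in self.search_index.items() if needle in doc]


server = ToyServer()
first_id, second_id = server.create_records_bulk([
    {"fields": {"text": "I really love this product!"}},
    {"fields": {"text": "This did not work well."}},
])
server.create_response(first_id, {"sentiment": "positive"})
matches = server.search("product")
```

The point of the split is that reads served by `search` depend on the indexing step having run, which is why the diagram shows the REST layer writing to both stores for every record and response.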

Data model

Argilla's core data model is built around datasets, records, responses, and suggestions:

classDiagram
    class Dataset {
        +UUID id
        +String name
        +String guidelines
        +Boolean allow_extra_metadata
        +DatasetStatus status
        +UUID workspace_id
        +DateTime inserted_at
        +DateTime updated_at
        +DateTime last_activity_at
    }

    class Field {
        +UUID id
        +String name
        +String title
        +Boolean required
        +JSON settings
        +UUID dataset_id
    }

    class Question {
        +UUID id
        +String name
        +String title
        +Boolean required
        +JSON settings
        +UUID dataset_id
    }

    class Record {
        +UUID id
        +String external_id
        +JSON fields
        +JSON metadata
        +RecordStatus status
        +UUID dataset_id
        +DateTime inserted_at
        +DateTime updated_at
    }

    class Response {
        +UUID id
        +JSON values
        +ResponseStatus status
        +UUID record_id
        +UUID user_id
        +DateTime inserted_at
        +DateTime updated_at
    }

    class Suggestion {
        +UUID id
        +JSON value
        +Float score
        +String agent
        +UUID record_id
        +UUID question_id
        +DateTime inserted_at
        +DateTime updated_at
    }

    class Vector {
        +UUID id
        +List~Float~ value
        +UUID record_id
        +UUID vector_settings_id
    }

    class VectorSettings {
        +UUID id
        +String name
        +String title
        +Integer dimensions
        +UUID dataset_id
    }

    class MetadataProperty {
        +UUID id
        +String name
        +String title
        +JSON settings
        +List~UserRole~ allowed_roles
        +UUID dataset_id
    }

    Dataset "1" -- "*" Field
    Dataset "1" -- "*" Question
    Dataset "1" -- "*" Record
    Dataset "1" -- "*" MetadataProperty
    Dataset "1" -- "*" VectorSettings
    Record "1" -- "*" Response
    Record "1" -- "*" Suggestion
    Record "1" -- "*" Vector
    Question "1" -- "*" Suggestion
    VectorSettings "1" -- "*" Vector

Sources: argilla-server/src/argilla_server/models/database.py:73-104, argilla-server/src/argilla_server/models/database.py:220-277, argilla-server/src/argilla_server/models/database.py:280-320
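As a reading aid for the class diagram, the sketch below mirrors a subset of its entities and foreign keys as plain Python dataclasses. The attribute names follow the diagram; everything else (defaults, the in-memory `responses` and `suggestions` lists, the omission of timestamps and of the other entities) is a simplification for illustration and not the SQLAlchemy models in `models/database.py`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
from uuid import UUID, uuid4


@dataclass
class Response:
    record_id: UUID
    user_id: UUID
    values: Dict[str, Any]
    status: str = "submitted"
    id: UUID = field(default_factory=uuid4)


@dataclass
class Suggestion:
    record_id: UUID
    question_id: UUID
    value: Any
    score: Optional[float] = None
    agent: Optional[str] = None
    id: UUID = field(default_factory=uuid4)


@dataclass
class Record:
    dataset_id: UUID
    fields: Dict[str, Any]
    metadata: Dict[str, Any] = field(default_factory=dict)
    external_id: Optional[str] = None
    id: UUID = field(default_factory=uuid4)
    # One record has many responses and many suggestions.
    responses: List[Response] = field(default_factory=list)
    suggestions: List[Suggestion] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    workspace_id: UUID
    guidelines: Optional[str] = None
    allow_extra_metadata: bool = True
    status: str = "draft"
    id: UUID = field(default_factory=uuid4)


dataset = Dataset(name="sentiment-analysis", workspace_id=uuid4())
record = Record(dataset_id=dataset.id, fields={"text": "Example"})
record.suggestions.append(
    Suggestion(record_id=record.id, question_id=uuid4(),
               value="positive", score=0.95, agent="model-v1")
)
record.responses.append(
    Response(record_id=record.id, user_id=uuid4(),
             values={"sentiment": {"value": "positive"}})
)
```

Note that a `Suggestion` points at both a record and a question, while a `Response` points at a record and a user; that asymmetry is the main structural detail the diagram encodes.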

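Response status and dataset status

Argilla uses separate status values to track the progress of responses and of datasets:

Response status
Status      Description
pending     No response has been provided yet
draft       The response has been saved as a draft but not submitted
submitted   The response has been submitted
discarded   The record has been marked as discarded

Dataset status
Status      Description
draft       The dataset is in draft mode and can still be modified
ready       The dataset has been published and is available for annotation

Sources: argilla-frontend/translation/en.js:74-81, argilla-server/src/argilla_server/api/schemas/v1/datasets.py:15-46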
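The two status tables map naturally onto enumerations. The sketch below is a hedged illustration of how client code might model them; the class and function names (`ResponseStatus`, `DatasetStatus`, `can_annotate`, `count_by_status`) are chosen for this example and are not claimed to match the server's or the SDK's own definitions.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class ResponseStatus(str, Enum):
    pending = "pending"      # no response provided yet
    draft = "draft"          # saved but not submitted
    submitted = "submitted"  # submitted by the annotator
    discarded = "discarded"  # record marked as discarded


class DatasetStatus(str, Enum):
    draft = "draft"  # still editable
    ready = "ready"  # published, open for annotation


def can_annotate(status: DatasetStatus) -> bool:
    """Only published datasets are available for annotation."""
    return status is DatasetStatus.ready


def count_by_status(statuses: Iterable[str]) -> Dict[ResponseStatus, int]:
    """Tally raw status strings; an unknown value raises ValueError."""
    return dict(Counter(ResponseStatus(value) for value in statuses))


counts = count_by_status(["submitted", "draft", "submitted", "discarded"])
```

Parsing raw strings through the enum means a typo such as `"submited"` fails immediately instead of silently producing an empty filter result.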

Python SDK response filters

When filtering by response in the Python SDK, the following response status values can be used:

import argilla as rg

# Records that have a submitted response
dataset.records(
    query=rg.Query(filter=rg.Filter(("response.status", "==", "submitted")))
)

# Other accepted values
# "pending"
# "draft"
# "discarded"

Sources: argilla-server/tests/unit/api/handlers/v1/test_datasets.py:31-42

Common API workflows

Creating and configuring a dataset
import argilla as rg

# Initialize the Argilla client
client = rg.Argilla(api_url="http://localhost:6900", api_key="your-api-key")

# Describe the dataset
settings = rg.Settings(
    guidelines="Label the sentiment of each text as positive, negative, or neutral.",
    fields=[
        rg.TextField(name="text", title="Text content")
    ],
    questions=[
        rg.LabelQuestion(
            name="sentiment",
            title="Sentiment",
            labels=["positive", "negative", "neutral"]
        )
    ]
)
dataset = rg.Dataset(name="sentiment-analysis", settings=settings, client=client)

# Push it to the Argilla server
dataset.create()

Logging records and adding suggestions
# Create records that carry suggestions
records = [
    rg.Record(
        fields={"text": "I really love this product!"},
        suggestions=[
            rg.Suggestion(
                question_name="sentiment",
                value="positive",
                score=0.95,
                agent="model-v1"
            )
        ]
    ),
    rg.Record(
        fields={"text": "This did not work very well."},
        suggestions=[
            rg.Suggestion(
                question_name="sentiment",
                value="negative",
                score=0.87,
                agent="model-v1"
            )
        ]
    )
]

# Add the records to the dataset
dataset.records.log(records)
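
Searching and filtering records
# Search by text content
results = dataset.records(query=rg.Query(query="product")).to_list()

# Filter by metadata
filtered = dataset.records(
    query=rg.Query(
        filter=rg.Filter(("metadata.source", "==", "web"))
    )
).to_list()

# Find similar records (query_vector is a list of floats matching the vector's dimensions)
similar = dataset.records(
    query=rg.Query(similar=rg.Similar(name="embeddings", value=query_vector)),
    limit=5
).to_list()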
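
Submitting and retrieving annotations
# Submit an annotation for a record
record = dataset.records(limit=1).to_list()[0]
record.responses.add(
    rg.Response(
        question_name="sentiment",
        value="positive",
        user_id=client.me.id
    )
)
dataset.records.log([record])  # persist the updated record

# Collect every response in the dataset
all_responses = [
    response
    for record in dataset.records(with_responses=True)
    for response in record.responses
]

# Filter records by response status
submitted = dataset.records(
    query=rg.Query(filter=rg.Filter(("response.status", "==", "submitted")))
).to_list()

Sources: argilla-server/src/argilla_server/contexts/datasets.py:480-540, argilla-server/src/argilla_server/bulk/records_bulk.py:46-89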