Sampling - MCP Chinese Documentation

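The Model Context Protocol (MCP) provides a standardized way for servers to request large language model (LLM) sampling ("completions" or "generations") via clients. This flow lets clients keep control over model access, selection, and permissions while enabling servers to leverage AI capabilities, with no server API keys required. Servers can request text, audio, or image-based interactions and can optionally include context from MCP servers in their prompts.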

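User Interaction Model

Sampling in MCP allows servers to implement agentic behaviors by enabling LLM calls to occur nested inside other MCP server features. Implementations are free to expose sampling through any interface pattern that suits their needs; the protocol itself does not mandate any specific user interaction model.

For trust, safety, and security, there SHOULD always be a human in the loop with the ability to deny sampling requests. Applications SHOULD:

Provide a UI that makes it easy and intuitive to review sampling requests
Allow users to view and edit prompts before sending
Present generated responses for review before delivery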

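Tools in Sampling

Servers can request that the client's LLM use tools during sampling by providing a tools array and an optional toolChoice configuration in their sampling requests. This enables servers to implement agentic behaviors in which the LLM can call tools, receive results, and continue the conversation, all within a single sampling request flow. Clients MUST declare support for tool use via the sampling.tools capability in order to receive tool-enabled sampling requests. Servers MUST NOT send tool-enabled sampling requests to clients that have not declared support for tool use via the sampling.tools capability.

Capabilities

Clients that support sampling MUST declare the sampling capability during initialization:

Basic sampling: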

{
  "capabilities": {
    "sampling": {}
  }
}

With tool use support:

{
  "capabilities": {
    "sampling": {
      "tools": {}
    }
  }
}

With context inclusion support (soft-deprecated):

{
  "capabilities": {
    "sampling": {
      "context": {}
    }
  }
}

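The includeContext parameter values "thisServer" and "allServers" are soft-deprecated. Servers SHOULD avoid using these values (for example, by omitting includeContext, since it defaults to "none") and MUST NOT use them unless the client declares the sampling.context capability. These values may be removed in future versions of the specification.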
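The capability rules above can be checked on the server side before a request is sent. The following is a minimal Python sketch, not part of the specification: the function name `check_sampling_request` is hypothetical, and treating any request that carries `tools` or `toolChoice` as "tool-enabled" is an assumption made for illustration.

```python
from typing import Any


def check_sampling_request(client_capabilities: dict[str, Any],
                           params: dict[str, Any]) -> list[str]:
    """Return the reasons a request must not be sent (empty list if it may be sent)."""
    sampling = client_capabilities.get("sampling")
    if sampling is None:
        return ["client did not declare the sampling capability"]

    problems: list[str] = []
    # Assumption: a request is "tool-enabled" if it carries tools or toolChoice.
    if ("tools" in params or "toolChoice" in params) and "tools" not in sampling:
        problems.append("tools require the sampling.tools capability")

    # includeContext defaults to "none"; the soft-deprecated values need
    # the sampling.context capability.
    include = params.get("includeContext", "none")
    if include in ("thisServer", "allServers") and "context" not in sampling:
        problems.append("includeContext requires the sampling.context capability")
    return problems
```

A server might call this with the capabilities object received during initialization and refuse to send the request if the returned list is non-empty.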

Protocol Messages

Creating Messages

To request a language model generation, servers send a sampling/createMessage request:

Request:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "What is the capital of France?"
        }
      }
    ],
    "modelPreferences": {
      "hints": [
        {
          "name": "claude-3-sonnet"
        }
      ],
      "intelligencePriority": 0.8,
      "speedPriority": 0.5
    },
    "systemPrompt": "You are a helpful assistant.",
    "maxTokens": 100
  }
}

Response:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "role": "assistant",
    "content": {
      "type": "text",
      "text": "The capital of France is Paris."
    },
    "model": "claude-3-sonnet-20240307",
    "stopReason": "endTurn"
  }
}
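As an illustration of the request and response shapes above, here is a small Python sketch that builds a sampling/createMessage request and pulls the text out of a response. The helper names `build_create_message_request` and `extract_text` are hypothetical, and the transport that actually delivers the JSON-RPC message is deliberately left out.

```python
from typing import Any, Optional


def build_create_message_request(request_id: int, user_text: str, max_tokens: int,
                                 system_prompt: Optional[str] = None,
                                 model_preferences: Optional[dict[str, Any]] = None
                                 ) -> dict[str, Any]:
    """Build a JSON-RPC sampling/createMessage request with one user text message."""
    params: dict[str, Any] = {
        "messages": [
            {"role": "user", "content": {"type": "text", "text": user_text}}
        ],
        "maxTokens": max_tokens,
    }
    # Optional fields are only included when the caller supplies them.
    if system_prompt is not None:
        params["systemPrompt"] = system_prompt
    if model_preferences is not None:
        params["modelPreferences"] = model_preferences
    return {"jsonrpc": "2.0", "id": request_id,
            "method": "sampling/createMessage", "params": params}


def extract_text(response: dict[str, Any]) -> str:
    """Concatenate the text blocks of a successful response."""
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "sampling failed"))
    content = response["result"]["content"]
    # Content may be a single block or an array of blocks.
    blocks = content if isinstance(content, list) else [content]
    return "".join(b["text"] for b in blocks if b.get("type") == "text")
```

Serializing the returned dictionary with `json.dumps` yields a message of the same shape as the request shown above.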

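Sampling with Tools

[Diagram: complete flow of sampling with tools, including the multi-turn tool loop]

To request LLM generation with tool use capabilities, servers include tools and, optionally, toolChoice in the request:

Request (Server -> Client):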

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "What's the weather like in Paris and London?"
        }
      }
    ],
    "tools": [
      {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "inputSchema": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "City name"
            }
          },
          "required": ["city"]
        }
      }
    ],
    "toolChoice": {
      "mode": "auto"
    },
    "maxTokens": 1000
  }
}

Response (Client -> Server):

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "role": "assistant",
    "content": [
      {
        "type": "tool_use",
        "id": "call_abc123",
        "name": "get_weather",
        "input": {
          "city": "Paris"
        }
      },
      {
        "type": "tool_use",
        "id": "call_def456",
        "name": "get_weather",
        "input": {
          "city": "London"
        }
      }
    ],
    "model": "claude-3-sonnet-20240307",
    "stopReason": "toolUse"
  }
}

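Multi-turn Tool Loop

After receiving tool use requests from the LLM, the server typically:

Executes the requested tool uses.
Sends a new sampling request with the tool results appended.
Receives the LLM's response (which might contain new tool uses).
Repeats as many times as needed (the server might cap the number of iterations and, for example, pass toolChoice: {mode: "none"} on the last iteration to force a final result).

Follow-up request (Server -> Client) with tool results: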

{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "What's the weather like in Paris and London?"
        }
      },
      {
        "role": "assistant",
        "content": [
          {
            "type": "tool_use",
            "id": "call_abc123",
            "name": "get_weather",
            "input": { "city": "Paris" }
          },
          {
            "type": "tool_use",
            "id": "call_def456",
            "name": "get_weather",
            "input": { "city": "London" }
          }
        ]
      },
      {
        "role": "user",
        "content": [
          {
            "type": "tool_result",
            "toolUseId": "call_abc123",
            "content": [
              {
                "type": "text",
                "text": "Weather in Paris: 18°C, partly cloudy"
              }
            ]
          },
          {
            "type": "tool_result",
            "toolUseId": "call_def456",
            "content": [
              {
                "type": "text",
                "text": "Weather in London: 15°C, rainy"
              }
            ]
          }
        ]
      }
    ],
    "tools": [
      {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "inputSchema": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    ],
    "maxTokens": 1000
  }
}

Final response (Client -> Server):

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "role": "assistant",
    "content": {
      "type": "text",
      "text": "Based on the current weather data:\n\n- **Paris**: 18°C and partly cloudy - quite pleasant!\n- **London**: 15°C and rainy - you'll want an umbrella.\n\nParis has slightly warmer and drier conditions today."
    },
    "model": "claude-3-sonnet-20240307",
    "stopReason": "endTurn"
  }
}
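The multi-turn exchange above can be condensed into a loop on the server side. The sketch below is one possible shape for it, under stated assumptions: `send` stands in for whatever function delivers sampling/createMessage params to the client and returns the `result` object, and `execute_tool` stands in for the server's own tool dispatch. Both are hypothetical callables, not part of any SDK.

```python
from typing import Any, Callable


def run_tool_loop(send: Callable[[dict[str, Any]], dict[str, Any]],
                  execute_tool: Callable[[str, dict[str, Any]], str],
                  tools: list[dict[str, Any]],
                  prompt: str,
                  max_iterations: int = 3) -> dict[str, Any]:
    """Drive sampling requests until the model stops asking for tools."""
    messages: list[dict[str, Any]] = [
        {"role": "user", "content": {"type": "text", "text": prompt}}
    ]
    for i in range(max_iterations):
        params: dict[str, Any] = {"messages": messages, "tools": tools,
                                  "maxTokens": 1000}
        if i == max_iterations - 1:
            # Last allowed iteration: forbid tools to force a final answer.
            params["toolChoice"] = {"mode": "none"}
        result = send(params)
        if result.get("stopReason") != "toolUse":
            return result

        content = result["content"]
        blocks = content if isinstance(content, list) else [content]
        # Echo the assistant turn, then answer every tool use with a result.
        messages.append({"role": "assistant", "content": blocks})
        messages.append({"role": "user", "content": [
            {"type": "tool_result", "toolUseId": b["id"],
             "content": [{"type": "text",
                          "text": execute_tool(b["name"], b["input"])}]}
            for b in blocks if b.get("type") == "tool_use"
        ]})
    raise RuntimeError("model kept requesting tools after the iteration limit")
```

The user message appended after each assistant turn contains only tool_result blocks, one per tool use, which matches the follow-up request shown above.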

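Message Content Constraints

Tool Result Messages

When a user message contains tool results (type: "tool_result"), it MUST contain only tool results. Mixing tool results with other content types (text, image, audio) in the same message is not allowed. This constraint ensures compatibility with provider APIs that use dedicated roles for tool results (for example, OpenAI's "tool" role and Gemini's "function" role).

Valid - single tool result: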

{
  "role": "user",
  "content": {
    "type": "tool_result",
    "toolUseId": "call_123",
    "content": [{ "type": "text", "text": "Result data" }]
  }
}

Valid - multiple tool results:

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "toolUseId": "call_123",
      "content": [{ "type": "text", "text": "Result 1" }]
    },
    {
      "type": "tool_result",
      "toolUseId": "call_456",
      "content": [{ "type": "text", "text": "Result 2" }]
    }
  ]
}

Invalid - mixed content:

{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "Here are the results:"
    },
    {
      "type": "tool_result",
      "toolUseId": "call_123",
      "content": [{ "type": "text", "text": "Result data" }]
    }
  ]
}
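The three cases above can be told apart with a short check. This is a sketch only; the function name `is_valid_user_content` is hypothetical, and it inspects nothing beyond the `type` field of each content block.

```python
from typing import Any, Union

Content = Union[dict[str, Any], list[dict[str, Any]]]


def is_valid_user_content(content: Content) -> bool:
    """A user message with any tool_result block must contain only tool_result blocks."""
    # Content may be a single block or an array of blocks.
    blocks = content if isinstance(content, list) else [content]
    types = [b.get("type") for b in blocks]
    if "tool_result" not in types:
        return True  # no tool results, so the constraint does not apply
    return all(t == "tool_result" for t in types)
```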

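Tool Use and Result Balance

When tools are used in sampling, every assistant message that contains ToolUseContent blocks MUST be followed by a user message that consists entirely of ToolResultContent blocks, with each tool use (e.g. id: $id) matched by a corresponding tool result (toolUseId: $id), before any other message. This requirement ensures that:

Tool uses are always resolved before the conversation continues
Provider APIs can process multiple tool uses concurrently and fetch their results in parallel
The conversation maintains a consistent request-response pattern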

Example valid sequence:

User message: "What's the weather like in Paris and London?"
Assistant message: ToolUseContent (id: "call_abc123", name: "get_weather", input: {city: "Paris"}) + ToolUseContent (id: "call_def456", name: "get_weather", input: {city: "London"})
User message: ToolResultContent (toolUseId: "call_abc123", content: "18°C, partly cloudy") + ToolResultContent (toolUseId: "call_def456", content: "15°C, rainy")
Assistant message: text response comparing the weather in both cities

Invalid sequence - missing tool result:

User message: "What's the weather like in Paris and London?"
Assistant message: ToolUseContent (id: "call_abc123", name: "get_weather", input: {city: "Paris"}) + ToolUseContent (id: "call_def456", name: "get_weather", input: {city: "London"})
User message: ToolResultContent (toolUseId: "call_abc123", content: "18°C, partly cloudy") ← missing result for call_def456
Assistant message: text response (invalid - not all tool uses were resolved)
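The balance rule can be verified mechanically over the messages array of a request. The following Python sketch, with the hypothetical name `find_unresolved_tool_uses`, reports every tool use id that is not answered by a matching tool result in the immediately following user message. It checks only id matching and does not repeat the mixed-content check.

```python
from typing import Any


def _blocks(content: Any) -> list[dict[str, Any]]:
    """Normalize single-block or array content to a list of blocks."""
    return content if isinstance(content, list) else [content]


def find_unresolved_tool_uses(messages: list[dict[str, Any]]) -> list[str]:
    """Return ids of tool uses with no matching result in the next user message."""
    unresolved: list[str] = []
    for i, msg in enumerate(messages):
        if msg["role"] != "assistant":
            continue
        use_ids = [b["id"] for b in _blocks(msg["content"])
                   if b.get("type") == "tool_use"]
        if not use_ids:
            continue
        # Results must arrive in the very next message, with the user role.
        result_ids: set[str] = set()
        if i + 1 < len(messages) and messages[i + 1]["role"] == "user":
            result_ids = {b["toolUseId"]
                          for b in _blocks(messages[i + 1]["content"])
                          if b.get("type") == "tool_result"}
        unresolved.extend(u for u in use_ids if u not in result_ids)
    return unresolved
```

Applied to the two sequences above, the valid one produces an empty list and the invalid one reports call_def456.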

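Cross-API Compatibility

The sampling specification is designed to work across multiple LLM provider APIs (Claude, OpenAI, Gemini, etc.). Key design decisions for compatibility:

Message Roles

MCP uses two roles: "user" and "assistant". Tool use requests are sent in CreateMessageResult with the "assistant" role. Tool results are sent back in messages with the "user" role. Messages that contain tool results cannot contain other kinds of content.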

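Tool Choice Modes

CreateMessageRequest.params.toolChoice controls the model's ability to use tools:

{mode: "auto"}: the model decides whether to use tools (default)
{mode: "required"}: the model MUST use at least one tool before completing
{mode: "none"}: the model MUST NOT use any tools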

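Parallel Tool Use

MCP allows models to make multiple tool use requests in parallel (returning an array of ToolUseContent). All major provider APIs support this:

Claude: supports parallel tool use natively
OpenAI: supports parallel tool calls (can be disabled with parallel_tool_calls: false)
Gemini: supports parallel function calls natively

Implementations that wrap providers supporting the disabling of parallel tool use MAY expose this as an extension, but it is not part of the core MCP specification.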

Message Flow

Data Types

Messages

Sampling messages can contain:

Text Content

{
  "type": "text",
  "text": "The message content"
}

Image Content

{
  "type": "image",
  "data": "base64-encoded-image-data",
  "mimeType": "image/jpeg"
}

Audio Content

{
  "type": "audio",
  "data": "base64-encoded-audio-data",
  "mimeType": "audio/wav"
}
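Image and audio blocks carry their payload as base64 text, so raw bytes need to be encoded before they are placed in a message. A minimal Python sketch using the standard library follows; the helper names `text_content` and `binary_content` are hypothetical.

```python
import base64
from typing import Any


def text_content(text: str) -> dict[str, Any]:
    """Build a text content block."""
    return {"type": "text", "text": text}


def binary_content(kind: str, raw: bytes, mime_type: str) -> dict[str, Any]:
    """Build an image or audio content block from raw bytes."""
    if kind not in ("image", "audio"):
        raise ValueError("kind must be 'image' or 'audio'")
    return {
        "type": kind,
        # base64 output is ASCII, so it can be embedded directly in JSON.
        "data": base64.b64encode(raw).decode("ascii"),
        "mimeType": mime_type,
    }
```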

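Model Preferences

Model selection in MCP requires careful abstraction because servers and clients may use different AI providers with distinct model offerings. A server cannot simply request a specific model by name, since the client may not have access to that exact model or may prefer a different provider's equivalent model. To solve this, MCP implements a preference system that combines abstract capability priorities with optional model hints:

Capability Priorities

Servers express their needs through three normalized priority values (0-1):

costPriority: how important is minimizing cost? Higher values prefer cheaper models.
speedPriority: how important is low latency? Higher values prefer faster models.
intelligencePriority: how important are advanced capabilities? Higher values prefer more capable models.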

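Model Hints

While priorities help select models based on characteristics, hints allow servers to suggest specific models or model families:

Hints are treated as substrings that can match model names flexibly
Multiple hints are evaluated in order of preference
Clients MAY map hints to equivalent models from different providers
Hints are advisory; clients make the final model selection

For example: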

{
  "hints": [
    { "name": "claude-3-sonnet" }, // Prefer Sonnet-class models
    { "name": "claude" } // Fall back to any Claude model
  ],
  "costPriority": 0.3, // Cost is less important
  "speedPriority": 0.8, // Speed is very important
  "intelligencePriority": 0.5 // Moderate capability needs
}

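The client processes these preferences to select an appropriate model from its available options. For example, if the client does not have access to Claude models but does have Gemini, it might map the sonnet hint to gemini-1.5-pro based on similar capabilities.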
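The specification leaves the selection policy to the client, so the following Python sketch is only one possible approach. It assumes a hypothetical client-side catalog (`ModelInfo`) in which each model carries its own 0-1 ratings, tries the hints in order as substring matches, and falls back to a weighted score over the three priorities. The rating fields and the weighting formula are illustrative assumptions, not something the protocol defines.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical catalog entry; ratings are 0-1, higher is better."""
    name: str
    cheapness: float
    speed: float
    intelligence: float


def score(model: ModelInfo, prefs: dict[str, Any]) -> float:
    """Weight each rating by the matching priority (missing priorities count as 0)."""
    return (prefs.get("costPriority", 0.0) * model.cheapness
            + prefs.get("speedPriority", 0.0) * model.speed
            + prefs.get("intelligencePriority", 0.0) * model.intelligence)


def select_model(available: list[ModelInfo], prefs: dict[str, Any]) -> ModelInfo:
    """Pick a model: first hint with a substring match wins, else best score overall."""
    for hint in prefs.get("hints", []):
        name = hint.get("name", "")
        matches = [m for m in available if name and name in m.name]
        if matches:
            return max(matches, key=lambda m: score(m, prefs))
    # No hint matched: hints are advisory, so fall back to the priorities alone.
    return max(available, key=lambda m: score(m, prefs))
```

A client that wants to map a hint to another provider's equivalent model, as in the gemini-1.5-pro example above, would need an additional mapping table that this sketch does not include.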

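Error Handling

Clients SHOULD return errors for common failure cases:

User rejected sampling request: -1
Tool result missing in request: -32602 (Invalid params)
Tool results mixed with other content: -32602 (Invalid params)

Example errors: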

{
  "jsonrpc": "2.0",
  "id": 3,
  "error": {
    "code": -1,
    "message": "User rejected sampling request"
  }
}

{
  "jsonrpc": "2.0",
  "id": 4,
  "error": {
    "code": -32602,
    "message": "Tool result missing in request"
  }
}
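On the client side, these error responses might be produced by a handler along the following lines. This is a sketch under stated assumptions: `approve` stands in for the human-in-the-loop review and `sample` for the actual model call, both hypothetical callables. Only the mixed-content and user-rejection cases are covered; the missing-tool-result check is omitted for brevity, and the message text for the mixed-content error is illustrative.

```python
from typing import Any, Callable

USER_REJECTED = -1
INVALID_PARAMS = -32602


def make_error(request_id: Any, code: int, message: str) -> dict[str, Any]:
    """Build a JSON-RPC error response."""
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}


def handle_create_message(request: dict[str, Any],
                          approve: Callable[[dict[str, Any]], bool],
                          sample: Callable[[dict[str, Any]], dict[str, Any]]
                          ) -> dict[str, Any]:
    """Validate, ask the user, then sample; return a JSON-RPC response."""
    request_id, params = request["id"], request["params"]
    # Reject malformed requests before bothering the user.
    for msg in params["messages"]:
        content = msg["content"]
        blocks = content if isinstance(content, list) else [content]
        types = {b.get("type") for b in blocks}
        if "tool_result" in types and types != {"tool_result"}:
            return make_error(request_id, INVALID_PARAMS,
                              "Tool results mixed with other content")
    if not approve(params):
        return make_error(request_id, USER_REJECTED,
                          "User rejected sampling request")
    return {"jsonrpc": "2.0", "id": request_id, "result": sample(params)}
```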

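Security Considerations

Clients SHOULD implement user approval controls
Both parties SHOULD validate message content
Clients SHOULD respect model preference hints
Clients SHOULD implement rate limiting
Both parties MUST handle sensitive data appropriately

When tools are used in sampling, additional security considerations apply:

Servers MUST ensure that, when replying to a stopReason: "toolUse", each ToolUseContent item is answered with a ToolResultContent item that has a matching toolUseId, and that the user message contains only tool results (no other content types)
Both parties SHOULD implement iteration limits for tool loops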
