
Structured Outputs

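Structured Outputs make the model generate data that strictly conforms to a JSON Schema instead of free-form text. The output becomes more controllable and easier to parse. This is especially useful for data extraction, API response formatting, and configuration generation.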


Core Concepts

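| Feature | Description | Typical use |
|---|---|---|
| JSON Schema constraints | Define the output structure with a schema | Data extraction, API responses |
| Strict mode | Forces the model to follow the schema and rejects invalid output | Production, automated pipelines |
| Nested structures | Objects, arrays, and nested objects are supported | Complex data structures |
| Type safety | Field types (string, number, boolean, etc.) are validated automatically | Type-sensitive applications |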

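Quick selection guide

  • Need structured data: specify a JSON Schema with response_format
  • Only need valid JSON: use json_object mode (no schema required)
  • Complex data extraction: combine it with tool calling (Function Calling)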

Why use structured outputs?

Pain points of the traditional approach

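```python
# Traditional approach: let the model generate freely, then parse the text by hand.
# Assumes `client` has been created as shown in Quick Start.
response = client.chat.completions.create(
    model="Ling-2.6-flash",  # "Ling-2.6-flash" is used as an example; switch to "Ling-2.6-1T" as needed
    messages=[{
        "role": "user",
        "content": "Extract the name and age from the following text: Zhang San, 25 years old"
    }]
)
# The output might be any of:
# "Name: Zhang San, age: 25"
# "{'name': 'Zhang San', 'age': 25}"
# "According to the text, the name is Zhang San and the age is 25"
# Parsing it needs complex regexes or a second LLM call, which is unreliable
```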

Advantages of structured outputs

```python
import json

# Structured output: get data that conforms to the schema directly.
# Assumes `client` has been created as shown in Quick Start.
response = client.chat.completions.create(
    model="Ling-2.6-flash",  # "Ling-2.6-flash" is used as an example; switch to "Ling-2.6-1T" as needed
    messages=[{
        "role": "user",
        "content": "Extract the name and age from the following text: Zhang San, 25 years old"
    }],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person_info",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"]
            }
        }
    }
)
# The output is strictly:
# {"name": "Zhang San", "age": 25}
# It can be parsed directly into a Python object
data = json.loads(response.choices[0].message.content)
print(data["name"])  # Zhang San
```

Quick Start

Basic usage: JSON object mode

Use this mode if you only need the output to be valid JSON and don't need to constrain its structure:

```python
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ant-ling.com/v1/",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="Ling-2.6-flash",  # "Ling-2.6-flash" is used as an example; switch to "Ling-2.6-1T" as needed
    messages=[{
        "role": "user",
        "content": "List three renewable energy sources and return them as a JSON array"
    }],
    response_format={"type": "json_object"}
)

data = json.loads(response.choices[0].message.content)
print(data)
# Output: ["Solar", "Wind", "Hydro"] or similar valid JSON
```
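`json_object` mode only promises syntactically valid JSON. The top-level shape can still differ from what the prompt asked for, for example an object that wraps the list instead of a bare array. The following is a minimal sketch of a defensive parse. It assumes you already have the reply text, and the helper name `parse_json_reply` is hypothetical:

```python
import json
from typing import Any


def parse_json_reply(content: str, expected_type: type) -> Any:
    """Parse a json_object-mode reply and check its top-level type.

    Hypothetical helper: raises ValueError if the text is not valid JSON
    or if the top-level value is not of the expected type.
    """
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, expected_type):
        raise ValueError(
            f"expected {expected_type.__name__}, got {type(data).__name__}"
        )
    return data


print(parse_json_reply('["Solar", "Wind", "Hydro"]', list))
try:
    # Valid JSON, but an object where a list was expected
    parse_json_reply('{"sources": ["Solar", "Wind"]}', list)
except ValueError as err:
    print(err)  # expected list, got dict
```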

Advanced usage: JSON Schema strict mode

Use a JSON Schema to control the output structure precisely:

```python
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ant-ling.com/v1/",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="Ling-2.6-flash",  # "Ling-2.6-flash" is used as an example; switch to "Ling-2.6-1T" as needed
    messages=[{
        "role": "user",
        "content": """Analyze the following product review:
"The battery life of this phone is great, but the price is a bit high."
Extract the overall sentiment and the key points."""
    }],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "sentiment_analysis",
            "schema": {
                "type": "object",
                "properties": {
                    "overall_sentiment": {
                        "type": "string",
                        "enum": ["positive", "neutral", "negative"],
                        "description": "Overall sentiment"
                    },
                    "key_points": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "aspect": {"type": "string", "description": "Aspect being reviewed"},
                                "sentiment": {
                                    "type": "string",
                                    "enum": ["positive", "neutral", "negative"],
                                    "description": "Sentiment for this aspect"
                                },
                                "mention": {"type": "string", "description": "Quote from the original text"}
                            },
                            "required": ["aspect", "sentiment", "mention"]
                        }
                    }
                },
                "required": ["overall_sentiment", "key_points"]
            }
        }
    }
)

result = json.loads(response.choices[0].message.content)
print(json.dumps(result, ensure_ascii=False, indent=2))
```

Example output

```json
{
  "overall_sentiment": "neutral",
  "key_points": [
    {
      "aspect": "battery life",
      "sentiment": "positive",
      "mention": "battery life of this phone is great"
    },
    {
      "aspect": "price",
      "sentiment": "negative",
      "mention": "the price is a bit high"
    }
  ]
}
```
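For typed access to a result like the one above, you can convert it into dataclasses. This is a sketch and not part of the API. The class names are hypothetical, and the `Literal` annotations only document intent because Python does not check them at runtime. Missing or unexpected keys do raise a `TypeError` when the dataclasses are constructed.

```python
import json
from dataclasses import dataclass
from typing import List, Literal

# Documentation-only type: not enforced at runtime
Sentiment = Literal["positive", "neutral", "negative"]


@dataclass
class KeyPoint:  # hypothetical class mirroring the key_points items
    aspect: str
    sentiment: Sentiment
    mention: str


@dataclass
class SentimentAnalysis:  # hypothetical class mirroring the top-level object
    overall_sentiment: Sentiment
    key_points: List[KeyPoint]

    @classmethod
    def from_json(cls, text: str) -> "SentimentAnalysis":
        raw = json.loads(text)
        # Nested objects must be converted explicitly; dataclasses don't do it for you
        points = [KeyPoint(**p) for p in raw["key_points"]]
        return cls(overall_sentiment=raw["overall_sentiment"], key_points=points)


reply = """{"overall_sentiment": "neutral", "key_points": [
  {"aspect": "battery life", "sentiment": "positive", "mention": "battery life of this phone is great"},
  {"aspect": "price", "sentiment": "negative", "mention": "the price is a bit high"}]}"""
analysis = SentimentAnalysis.from_json(reply)
print(analysis.key_points[1].aspect)  # price
```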

JSON Schema in detail

Supported types

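| JSON Schema type | Description | Example |
|---|---|---|
| string | String | `"name": {"type": "string"}` |
| integer | Integer | `"age": {"type": "integer"}` |
| number | Number (decimals allowed) | `"price": {"type": "number"}` |
| boolean | Boolean | `"active": {"type": "boolean"}` |
| array | Array | `"tags": {"type": "array", "items": {"type": "string"}}` |
| object | Object | `"address": {"type": "object", "properties": {...}}` |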

Common constraints

```json
{
  "type": "object",
  "properties": {
    "username": {
      "type": "string",
      "minLength": 3,
      "maxLength": 20,
      "description": "Username, 3-20 characters"
    },
    "email": {
      "type": "string",
      "format": "email",
      "description": "Email address"
    },
    "age": {
      "type": "integer",
      "minimum": 0,
      "maximum": 150,
      "description": "Age"
    },
    "role": {
      "type": "string",
      "enum": ["user", "admin", "guest"],
      "description": "User role"
    },
    "tags": {
      "type": "array",
      "items": { "type": "string" },
      "maxItems": 10,
      "description": "List of tags"
    }
  },
  "required": ["username", "email"]
}
```

Nested structures

```json
{
  "type": "object",
  "properties": {
    "company": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "address": {
          "type": "object",
          "properties": {
            "city": { "type": "string" },
            "country": { "type": "string" }
          },
          "required": ["city", "country"]
        }
      },
      "required": ["name"]
    },
    "employees": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "department": { "type": "string" }
        },
        "required": ["name"]
      }
    }
  },
  "required": ["company"]
}
```
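This page doesn't say whether the service enforces keywords such as `minLength`, `maximum`, or `format`, so a client-side check is a cheap safety net. The following is a sketch of a hypothetical, stdlib-only validator. It covers only the keywords used in the examples above (`type`, `enum`, `required`, `properties`, `items`, `minLength`/`maxLength`, `minimum`/`maximum`, `maxItems`), ignores `format`, and is not a full JSON Schema implementation. For production, a complete validator such as the `jsonschema` package is the safer choice.

```python
from typing import Any, List

# Map JSON Schema type names to Python types
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
    "null": type(None),
}


def _type_ok(value: Any, name: str) -> bool:
    # bool is a subclass of int in Python, so exclude it from integer/number
    if name in ("integer", "number") and isinstance(value, bool):
        return False
    return isinstance(value, _TYPES[name])


def validate(value: Any, schema: dict, path: str = "$") -> List[str]:
    """Hypothetical helper: return error messages; an empty list means valid."""
    errors: List[str] = []
    types = schema.get("type")
    if types is not None:
        names = types if isinstance(types, list) else [types]
        if not any(_type_ok(value, n) for n in names):
            return [f"{path}: expected {types}, got {type(value).__name__}"]
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in {schema['enum']}")
    if isinstance(value, str):
        if len(value) < schema.get("minLength", 0):
            errors.append(f"{path}: shorter than minLength")
        if "maxLength" in schema and len(value) > schema["maxLength"]:
            errors.append(f"{path}: longer than maxLength")
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{path}: below minimum")
        if "maximum" in schema and value > schema["maximum"]:
            errors.append(f"{path}: above maximum")
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required field {key!r}")
        # Recurse into nested objects
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub, f"{path}.{key}"))
    if isinstance(value, list):
        if "maxItems" in schema and len(value) > schema["maxItems"]:
            errors.append(f"{path}: more than maxItems items")
        if "items" in schema:
            for i, item in enumerate(value):
                errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


# Trimmed version of the nested example above
schema = {
    "type": "object",
    "properties": {
        "company": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}, "country": {"type": "string"}},
                    "required": ["city", "country"],
                },
            },
            "required": ["name"],
        },
    },
    "required": ["company"],
}
print(validate({"company": {"name": "Acme", "address": {"city": "Hangzhou"}}}, schema))
# ["$.company.address: missing required field 'country'"]
```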

Typical use cases

1. Data extraction and parsing

Extract structured information from unstructured text:

```python
schema = {
    "name": "invoice_extraction",
    "schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "date": {"type": "string", "format": "date"},
            "vendor": {"type": "string"},
            "items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "integer"},
                        "unit_price": {"type": "number"}
                    }
                }
            },
            "total_amount": {"type": "number"}
        },
        "required": ["invoice_number", "date", "total_amount"]
    }
}
```
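A schema can describe the invoice's shape, but it cannot express that the line items add up to `total_amount`. That kind of cross-field check belongs in your own code. The following is a minimal sketch. It assumes totals exclude tax and discounts; if your invoices include them, a mismatch is not necessarily an error. The helper name and the sample invoice are made up for illustration.

```python
from decimal import Decimal


def check_invoice_total(invoice: dict, tolerance: str = "0.01") -> bool:
    """Hypothetical helper: True if sum(quantity * unit_price) matches total_amount."""
    items = invoice.get("items", [])
    # Item fields are not required by the schema, so default missing ones to 0.
    # Decimal(str(x)) avoids binary floating-point error when summing money.
    computed = sum(
        (
            Decimal(str(item.get("quantity", 0))) * Decimal(str(item.get("unit_price", 0)))
            for item in items
        ),
        Decimal("0"),
    )
    return abs(computed - Decimal(str(invoice["total_amount"]))) <= Decimal(tolerance)


# Made-up sample extraction result
invoice = {
    "invoice_number": "INV-001",
    "date": "2024-05-01",
    "total_amount": 35.5,
    "items": [
        {"description": "Pen", "quantity": 3, "unit_price": 1.5},
        {"description": "Notebook", "quantity": 2, "unit_price": 15.5},
    ],
}
print(check_invoice_total(invoice))  # True
```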

2. API response formatting

Make sure the model output matches the format your backend API expects:

```python
schema = {
    "name": "api_response",
    "schema": {
        "type": "object",
        "properties": {
            "status": {
                "type": "string",
                "enum": ["success", "error"]
            },
            "data": {"type": "object"},
            "message": {"type": "string"},
            "timestamp": {"type": "string", "format": "date-time"}
        },
        "required": ["status"]
    }
}
```
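This page doesn't state whether the service enforces `"format": "date-time"`, so it is reasonable to parse the timestamp yourself before trusting it. The sketch below also applies a hypothetical business rule that the schema above does not require: an `error` response should carry a `message`. Both helper names are illustrative.

```python
from datetime import datetime
from typing import List


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 date-time string returned by the model."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so normalise it to an explicit UTC offset for older versions.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def check_api_response(resp: dict) -> List[str]:
    """Hypothetical post-checks for the api_response schema."""
    errors = []
    # Hypothetical rule, not expressed in the schema
    if resp.get("status") == "error" and not resp.get("message"):
        errors.append("error responses should carry a message")
    if "timestamp" in resp:
        try:
            parse_timestamp(resp["timestamp"])
        except ValueError:
            errors.append(f"bad timestamp: {resp['timestamp']!r}")
    return errors


print(check_api_response({"status": "success", "timestamp": "2024-05-01T12:30:00Z"}))  # []
print(check_api_response({"status": "error"}))  # ['error responses should carry a message']
```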

3. Configuration generation

Generate configuration files in a specific format:

```python
schema = {
    "name": "app_config",
    "schema": {
        "type": "object",
        "properties": {
            "app_name": {"type": "string"},
            "version": {"type": "string"},
            "features": {
                "type": "object",
                "properties": {
                    "dark_mode": {"type": "boolean"},
                    "notifications": {"type": "boolean"},
                    "max_items": {"type": "integer", "minimum": 1}
                }
            }
        },
        "required": ["app_name", "version"]
    }
}
```
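In this schema only `app_name` and `version` are required, so the `features` object or any of its fields may be absent. One way to handle this is to load the result into dataclasses with defaults. The class names and default values below are hypothetical placeholders.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Features:
    # Hypothetical defaults for the optional feature flags
    dark_mode: bool = False
    notifications: bool = True
    max_items: int = 50


@dataclass
class AppConfig:
    app_name: str
    version: str
    features: Features = field(default_factory=Features)


def load_config(text: str) -> AppConfig:
    raw = json.loads(text)
    # "features" and its sub-fields are optional, so fall back to the defaults;
    # unknown keys raise TypeError, which flags unexpected model output.
    features = Features(**raw.get("features", {}))
    return AppConfig(app_name=raw["app_name"], version=raw["version"], features=features)


cfg = load_config('{"app_name": "demo", "version": "1.0.0", "features": {"dark_mode": true}}')
print(cfg.features)  # Features(dark_mode=True, notifications=True, max_items=50)
```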

4. Multi-step reasoning results

Break a complex reasoning process into structured steps:

```python
schema = {
    "name": "math_solution",
    "schema": {
        "type": "object",
        "properties": {
            "problem": {"type": "string"},
            "steps": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "step_number": {"type": "integer"},
                        "description": {"type": "string"},
                        "calculation": {"type": "string"}
                    }
                }
            },
            "final_answer": {"type": "string"},
            "verification": {"type": "string"}
        },
        "required": ["problem", "steps", "final_answer"]
    }
}
```
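The schema doesn't make `step_number` required, and it doesn't force the steps to be numbered consecutively. A quick post-check can catch malformed solutions. The following is a sketch with a hypothetical helper and a made-up sample:

```python
from typing import List


def check_solution(solution: dict) -> List[str]:
    """Hypothetical helper: flag non-consecutive step numbers or an empty answer."""
    errors = []
    # step_number is not required by the schema, so it may be missing (None)
    numbers = [step.get("step_number") for step in solution.get("steps", [])]
    if numbers != list(range(1, len(numbers) + 1)):
        errors.append(f"steps are not numbered 1..{len(numbers)}: {numbers}")
    if not (solution.get("final_answer") or "").strip():
        errors.append("final_answer is empty")
    return errors


# Made-up sample result for "12 * 15"
sample = {
    "problem": "12 * 15",
    "steps": [
        {"step_number": 1, "description": "Multiply 10 by 15", "calculation": "10 * 15 = 150"},
        {"step_number": 2, "description": "Multiply 2 by 15", "calculation": "2 * 15 = 30"},
        {"step_number": 3, "description": "Add the partial products", "calculation": "150 + 30 = 180"},
    ],
    "final_answer": "180",
}
print(check_solution(sample))  # []
```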

Comparison with tool calling

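| Feature | Structured outputs | Tool calling (Function Calling) |
|---|---|---|
| Main purpose | Format the model's response | Let the model call external functions |
| Execution | Only formats the output | May trigger actual function execution |
| Interaction pattern | Single response | May involve multiple turns |
| Use cases | Data extraction, formatting | Tasks that need external data or actions |
| Where the schema goes | response_format | tools parameter |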

Choosing between them

  • You only need structured data → use structured outputs
  • You need to call APIs, query databases, etc. → use tool calling
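To make the difference concrete, the sketch below builds both request bodies as plain dicts from the same schema, without making a network call. It assumes the service accepts the OpenAI-compatible `tools` format, and `extract_person` is a hypothetical function name. You could pass either dict as `client.chat.completions.create(**request)`.

```python
import json

PERSON_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
messages = [{"role": "user", "content": "Extract the name and age: Zhang San, 25 years old"}]

# Structured output: the schema constrains the assistant message content itself
structured_request = {
    "model": "Ling-2.6-flash",
    "messages": messages,
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "person_info", "schema": PERSON_SCHEMA},
    },
}

# Tool calling: the schema describes the arguments of a function the model may
# choose to call; your code decides whether to actually run it
tool_request = {
    "model": "Ling-2.6-flash",
    "messages": messages,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "extract_person",  # hypothetical function name
                "description": "Record a person's name and age",
                "parameters": PERSON_SCHEMA,
            },
        }
    ],
}

print(json.dumps(structured_request["response_format"], indent=2))
```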

Best practices

1. Provide clear descriptions

Add a description to each field in the schema to help the model understand your intent:

```json
{
  "properties": {
    "confidence": {
      "type": "number",
      "minimum": 0,
      "maximum": 1,
      "description": "Confidence score: 0 means completely uncertain, 1 means completely certain"
    }
  }
}
```

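2. Use required judiciously

Mark the fields that are truly required, but don't over-constrain:

```json
{
  "properties": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "phone": { "type": "string" }
  },
  "required": ["name"]
}
```

Only name is required here. Because email and phone are optional, the same schema still fits inputs that lack them.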

3. Use enum to restrict values

When a field has a fixed set of options, use enum to improve accuracy:

```json
{
  "sentiment": {
    "type": "string",
    "enum": ["positive", "neutral", "negative"]
  }
}
```

4. Handle large arrays

For arrays that may grow long, set a reasonable limit:

```json
{
  "items": {
    "type": "array",
    "items": { "type": "string" },
    "maxItems": 100,
    "description": "Return at most 100 items"
  }
}
```

5. Combine with a system prompt

Use a system prompt to guide the model further:

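```python
messages = [
    {
        "role": "system",
        "content": "You are a data-extraction assistant. Extract the key information from the text the user provides and return it strictly in the specified JSON Schema format."
    },
    {
        "role": "user",
        "content": "Extract the following resume information: ..."
    }
]
```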
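The system prompt can also restate the schema itself. This is one way to state the output format explicitly in the prompt. The following is a sketch with a hypothetical helper, `build_messages`. This page does not claim that repeating the schema measurably helps when `response_format` is already set.

```python
import json
from typing import Dict, List


def build_messages(schema: dict, user_text: str) -> List[Dict[str, str]]:
    """Hypothetical helper: system prompt that embeds the expected JSON Schema."""
    system = (
        "You are a data-extraction assistant. Extract the key information from the "
        "text the user provides and return it strictly in this JSON Schema format:\n"
        + json.dumps(schema, ensure_ascii=False, indent=2)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]


resume_schema = {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}
messages = build_messages(resume_schema, "Extract the following resume information: ...")
print(messages[0]["content"])
```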

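FAQ

Q1: What if the model doesn't follow the schema?

  • Check that the schema syntax is correct (an online JSON Schema validator can help)
  • Keep the schema from growing too complex; split it into several simpler schemas if necessary
  • State the output format requirements explicitly in the prompt
  • Use a more capable model (such as Ling-2.6-1T/Ling-2.6-flash)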

Q2: How do I handle optional fields?

Don't mark every field as required. For optional fields:

```json
{
  "properties": {
    "required_field": { "type": "string" },
    "optional_field": {
      "type": ["string", "null"],
      "description": "Optional field; null when there is no value"
    }
  },
  "required": ["required_field"]
}
```
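On the Python side, a JSON `null` arrives as `None`. A field that the model omits entirely is simply missing from the dict, and `dict.get` covers both cases. A minimal sketch with a made-up reply:

```python
import json

# Made-up reply where the optional field is explicitly null
reply = '{"required_field": "abc", "optional_field": null}'
data = json.loads(reply)

# .get() returns None both for JSON null and for a key that was omitted entirely
optional = data.get("optional_field")
label = optional if optional is not None else "(not provided)"
print(label)  # (not provided)
```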

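Q3: How complex can a schema be?

We recommend these guidelines:

  • Keep nesting to no more than 3-4 levels
  • Keep the structure of array items consistent
  • Keep the total number of fields under 20
  • Overly complex schemas may degrade model performance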
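You can check these guidelines mechanically before a schema goes into production. The sketch below uses hypothetical helpers. The page does not define how levels are counted, so here each `object` or `array` counts as one level, and every key under any `properties` mapping counts as one field. The thresholds 4 and 20 come from the guidelines above.

```python
from typing import Tuple


def schema_stats(schema: dict) -> Tuple[int, int]:
    """Return (nesting depth, field count) for a JSON Schema dict.

    Depth counts each "object" or "array" level; the field count is the total
    number of keys under every "properties" mapping, at any level.
    """
    child_depth, fields = 0, 0
    for sub in schema.get("properties", {}).values():
        d, f = schema_stats(sub)
        child_depth, fields = max(child_depth, d), fields + 1 + f
    if "items" in schema:
        d, f = schema_stats(schema["items"])
        child_depth, fields = max(child_depth, d), fields + f
    is_container = schema.get("type") in ("object", "array")
    return child_depth + (1 if is_container else 0), fields


def too_complex(schema: dict, max_depth: int = 4, max_fields: int = 20) -> bool:
    depth, fields = schema_stats(schema)
    return depth > max_depth or fields > max_fields


person = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
print(schema_stats(person))  # (1, 2)
print(too_complex(person))   # False
```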

Q4: Does streaming support structured outputs?

Yes. With streaming enabled, the model generates the JSON content incrementally:

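```python
import json

# Assumes `client`, `messages`, and a `schema` dict of the form
# {"name": ..., "schema": {...}} (as in the use-case examples) are already defined
response = client.chat.completions.create(
    model="Ling-2.6-flash",  # "Ling-2.6-flash" is used as an example; switch to "Ling-2.6-1T" as needed
    messages=messages,
    response_format={"type": "json_schema", "json_schema": schema},
    stream=True  # enable streaming
)

# Accumulate all chunks before parsing the JSON
content = ""
for chunk in response:
    # Skip chunks that carry no choices (e.g. a trailing usage chunk) or no content
    if chunk.choices and chunk.choices[0].delta.content:
        content += chunk.choices[0].delta.content

# Parse once the stream has finished
data = json.loads(content)
```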
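`json.loads` cannot parse a half-finished JSON document, so the content has to be accumulated first. You can exercise the same loop without a network call by feeding it stand-in chunk objects. Here `SimpleNamespace` only mimics the attributes the loop reads; it is not the SDK's real chunk type, and `collect_stream` is a hypothetical helper.

```python
import json
from types import SimpleNamespace


def collect_stream(chunks) -> dict:
    """Concatenate streamed delta.content pieces, then parse the full JSON once."""
    parts = []
    for chunk in chunks:
        # Some chunks may carry no choices or no content; skip them
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return json.loads("".join(parts))


def fake_chunk(text):
    # Stand-in for an SDK streaming chunk; models only the attributes read above
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [
    fake_chunk('{"name": "Zh'),
    fake_chunk('ang San", '),
    fake_chunk('"age": 25}'),
    fake_chunk(None),               # e.g. a final chunk with no content
    SimpleNamespace(choices=[]),    # e.g. a usage-only chunk
]
print(collect_stream(fake_stream))  # {'name': 'Zhang San', 'age': 25}
```

With the real SDK, you would pass the `response` object returned by the `stream=True` call instead of `fake_stream`.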

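Q5: Do structured outputs add latency?

Structured outputs require extra validation and constraint processing, which may add a small amount of latency. Recommendations:

  • For latency-sensitive scenarios, weigh whether structured outputs are really necessary
  • Use simpler schemas to reduce processing time
  • Consider caching results for common queries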

Related resources


JSON Schema reference

For more JSON Schema syntax and advanced features, see the official JSON Schema documentation.
