Audio

Authentication

See "Integrating BytePower".

Pronunciation Assessment

Purpose: assesses spoken pronunciation across multiple dimensions, including accuracy, fluency, and completeness.

Method & Path

POST {domain}/bp/ai/audio/pronunciation
POST {domain}/bp/server/user/{user_id}/ai/audio/pronunciation

Request

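Notes:

The request body is in chunked format and must be sent in segments.
The request body uses chunked transfer encoding, but the Transfer-Encoding header must not be added to the request headers.
The first chunk carries the pronunciation parameters.
Subsequent chunks carry the audio stream.
The audio stream must be base64-encoded, without a data:audio/mp3;base64, prefix.
Every audio stream chunk except the last must be base64-encoded without padding.
Only PCM audio at 16 kHz, mono (1 channel), 16-bit is currently supported.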
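
One way to satisfy the no-padding rule is to slice the raw PCM bytes into pieces whose length is a multiple of 3, since base64 only emits `=` padding when the input length is not divisible by 3. The following is a minimal sketch under that assumption; the function name `encode_audio_chunks` and the default chunk size are illustrative and not part of the API.

```python
import base64
from typing import Iterator


def encode_audio_chunks(pcm: bytes, chunk_size: int = 3072) -> Iterator[str]:
    """Yield base64 strings for successive slices of raw PCM audio.

    chunk_size must be a multiple of 3 so that every slice except
    possibly the last one encodes without '=' padding.
    """
    if chunk_size <= 0 or chunk_size % 3 != 0:
        raise ValueError("chunk_size must be a positive multiple of 3")
    for start in range(0, len(pcm), chunk_size):
        piece = pcm[start:start + chunk_size]
        # No "data:audio/...;base64," prefix is added, as the notes require.
        yield base64.b64encode(piece).decode("ascii")
```

Because each non-final slice is aligned to 3 bytes, concatenating the yielded strings gives the same result as base64-encoding the whole buffer at once.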

Parameters of the first chunk:

Parameters	Type	Required	Desc
strategy	string	true	Assessment strategy; currently supports `azure_pronunciation`
data	object	true	Assessment configuration parameters

Parameters of the data object:

Parameters	Type	Required	Default	Desc
language	string	false	en-US	Language
reference_text	string	true	-	Reference text to assess the audio against
grading_system	string	false	HundredMark	Scoring system: FivePoint (0-5) or HundredMark (0-100)
granularity	string	false	Phoneme	Assessment granularity: Phoneme, Word, or FullText
enable_miscue	boolean	false	false	Enable miscue detection (Omission/Insertion)
enable_prosody_assessment	boolean	false	false	Enable prosody assessment (stress, intonation, speaking rate, rhythm)
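
As an illustration of the parameter tables above, the first chunk's JSON object could be assembled with a small helper that applies the documented defaults and rejects values outside the documented sets. This is only a sketch: `build_pronunciation_params` is a hypothetical client-side helper, and the service performs its own validation.

```python
from typing import Any, Dict

# Allowed values taken from the data parameter table.
GRADING_SYSTEMS = {"FivePoint", "HundredMark"}
GRANULARITIES = {"Phoneme", "Word", "FullText"}


def build_pronunciation_params(
    reference_text: str,
    language: str = "en-US",
    grading_system: str = "HundredMark",
    granularity: str = "Phoneme",
    enable_miscue: bool = False,
    enable_prosody_assessment: bool = False,
) -> Dict[str, Any]:
    """Build the parameter object sent as the first chunk."""
    if not reference_text:
        raise ValueError("reference_text is required")
    if grading_system not in GRADING_SYSTEMS:
        raise ValueError(f"unsupported grading_system: {grading_system}")
    if granularity not in GRANULARITIES:
        raise ValueError(f"unsupported granularity: {granularity}")
    return {
        "strategy": "azure_pronunciation",
        "data": {
            "language": language,
            "reference_text": reference_text,
            "grading_system": grading_system,
            "granularity": granularity,
            "enable_miscue": enable_miscue,
            "enable_prosody_assessment": enable_prosody_assessment,
        },
    }
```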

example:

POST /bp/ai/audio/pronunciation HTTP/1.1
Content-Type: text/plain

B7\r\n{"strategy":"azure_pronunciation","data":{"reference_text":"Hello world","grading_system":"HundredMark","granularity":"Phoneme","enable_miscue":true,"enable_prosody_assessment":true}}\r\nx\r\n{x_byte_audio}\r\n0\r\n\r\n
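
The framing in this example follows the usual chunked layout: a hexadecimal byte length, CRLF, the payload, CRLF, and a final zero-length chunk. The sketch below reproduces those bytes in Python. The helper names `frame_chunk` and `build_chunked_body` are hypothetical, and whether your HTTP client applies this framing itself or expects you to supply it depends on the client, which is not specified here.

```python
import json
from typing import Any, Dict, Iterable


def frame_chunk(payload: bytes) -> bytes:
    """Frame one chunk as: hex length, CRLF, payload, CRLF."""
    size = format(len(payload), "X").encode("ascii")
    return size + b"\r\n" + payload + b"\r\n"


def build_chunked_body(params: Dict[str, Any], audio_chunks: Iterable[str]) -> bytes:
    """Return the full body: parameter chunk, audio chunks, terminator."""
    # Compact separators keep the JSON byte-for-byte like the example.
    first = json.dumps(params, separators=(",", ":")).encode("utf-8")
    body = frame_chunk(first)
    for chunk in audio_chunks:
        # Each audio chunk is already a base64 string.
        body += frame_chunk(chunk.encode("ascii"))
    # A zero-length chunk marks the end of the body.
    return body + b"0\r\n\r\n"
```

With the parameters from the example, the serialized JSON is 183 bytes long, which is where the leading `B7` comes from.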

Response

json

{
  "text": "Hello world",
  "raw_data": {
    "Duration": 235600000,
    "Id": "5246cbcca9c94da6b88fe7e3800483ec",
    "NBest": [
      {
        "Confidence": 0.9827747,
        "PronunciationAssessment": {
          "AccuracyScore": 88,
          "CompletenessScore": 92,
          "FluencyScore": 95,
          "PronScore": 88.1,
          "ProsodyScore": 82.7
        }
      }
    ]
  },
  "unit_price": {
    "input_per_price": 0.0277,
    "output_per_price": 0,
    "input_token": 415,
    "output_token": 0
  },
  "strategy": "azure_pronunciation"
}

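Response fields:

Field	Type	Desc
text	string	Text recognized from the audio
raw_data	object	Raw data returned by the Azure speech assessment service
unit_price	object	Billing information, including input and output unit prices and token usage
strategy	string	Assessment strategy used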

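PronunciationAssessment scores:

Score	Desc
AccuracyScore	Accuracy score; indicates how closely the pronunciation matches the reference text
CompletenessScore	Completeness score; indicates how much of the reference text was spoken
FluencyScore	Fluency score; indicates how natural the speech sounds
PronScore	Overall pronunciation score
ProsodyScore	Prosody score; assesses stress, intonation, speaking rate, and rhythm (requires prosody assessment to be enabled)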
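
To show where these scores sit in the response, here is a small sketch that pulls the PronunciationAssessment object out of the first NBest entry. The function name `extract_scores` is hypothetical, and the sketch assumes the response has the shape shown in the example above.

```python
from typing import Any, Dict


def extract_scores(response: Dict[str, Any]) -> Dict[str, float]:
    """Return the scores of the first NBest entry, or {} if none exist."""
    nbest = response.get("raw_data", {}).get("NBest", [])
    if not nbest:
        return {}
    # ProsodyScore is only present when prosody assessment was enabled.
    return dict(nbest[0].get("PronunciationAssessment", {}))
```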

Error

Invalid parameter:

json

{
  "error": {
    "error_type": "invalid_parameter",
    "message": "invalid_parameter: reference_text is required"
  }
}

Third-party request failed:

json

{
  "error": {
    "error_type": "backend unavailable",
    "message": "XXX"
  }
}
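
Both error bodies share the same envelope, so a client can check for it in one place. The following is a minimal sketch; `ApiError` and `raise_for_error` are hypothetical names, and it assumes a successful response never contains a top-level `error` key.

```python
from typing import Any, Dict


class ApiError(Exception):
    """Error returned in the {"error": {...}} envelope."""

    def __init__(self, error_type: str, message: str) -> None:
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type
        self.message = message


def raise_for_error(body: Dict[str, Any]) -> None:
    """Raise ApiError if the parsed JSON body carries an error object."""
    err = body.get("error")
    if err:
        raise ApiError(err.get("error_type", ""), err.get("message", ""))
```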

Text-to-Speech

Converts text to speech.

Method & Path

POST {domain}/bp/ai/audio/tts
POST {domain}/bp/server/user/{user_id}/ai/audio/tts

Request

Content-Type: application/json

Parameters	Type	Required	Desc
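text	string	true	Text to convert
strategies	string[]	true	List of speech synthesis strategies; supports multi-strategy fallback, tried in order until one succeeds
platform_params	object[]	false	Array of platform parameters that override a strategy's default configuration (voice, prompt, etc.)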

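About platform_params

The platform_params objects dynamically override a strategy's default configuration. Common fields include platform (platform identifier), voice (voice name), and prompt (prompt text).

Supported platforms:

azure_audio - Azure speech synthesis service
google_audio - Google Cloud Text-to-Speech
gemini - Google Gemini speech synthesis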
openai - OpenAI TTS
aws_audio - Amazon Polly

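Strategy fallback

The TTS endpoint supports multi-strategy fallback: when one strategy fails, the next one is tried automatically.

Order: strategies are tried in the order of the strategies array
Failure conditions:
- Configuration error (for example, a required parameter is missing)
- The API call fails before any audio data has been sent
- Empty audio is returned
Success: the result is returned as soon as any strategy succeeds
All failed: if every strategy fails, an all_strategies_failed error is returned

⚠️ Note:

Once a strategy has started sending audio_chunk events, no other strategy is retried.
This prevents the client from receiving incomplete or mixed audio data.
For this reason, put the more stable strategies first.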
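
The fallback rules above can be pictured with a short simulation. This is not the service's implementation, only a sketch of the described behavior: each strategy is modeled as a hypothetical callable that returns an iterable of audio chunks, and all names here are illustrative.

```python
from typing import Callable, Iterable, Iterator, List


class AllStrategiesFailed(Exception):
    """Raised when no strategy produced any audio."""


def synthesize_with_fallback(
    strategies: List[Callable[[], Iterable[bytes]]],
) -> Iterator[bytes]:
    """Yield audio chunks from the first strategy that succeeds."""
    for run in strategies:
        sent_any = False
        try:
            for chunk in run():
                if chunk:
                    sent_any = True
                    yield chunk
        except Exception:
            if sent_any:
                # Audio was already streamed: do not retry another strategy.
                raise
            continue  # failed before any audio was sent: try the next one
        if sent_any:
            return  # success
        # The strategy returned empty audio: fall through to the next one.
    raise AllStrategiesFailed("all strategies failed")
```

The `sent_any` flag is what separates a retryable failure from one that has to surface to the caller, which is why ordering the more stable strategies first matters.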

Request example

json

{
  "text": "Hello, how are you today?",
  "strategies": [
    "tts_azure",
    "tts_openai",
    "tts_gemini"
  ],
  "platform_params": [
    {
      "platform": "azure_audio",
      "voice": "en-US-SerenaMultilingualNeural",
      "speed": 1.2
    },
    {
      "platform": "openai",
      "voice": "coral"
    },
    {
      "platform": "gemini",
      "language_code": "en-US",
      "voice": "Orus",
      "need_viseme": false
    }
  ]
}

Response

Response format: streamed as Server-Sent Events (SSE)

Content-Type: text/event-stream

Complete response example

event: start
data: {"content":"start","timestamp":1768468269361}

event: audio_chunk
data: {"content":{"audio":"SUQzBAAAAAAAI1RTU0U..."},"timestamp":1768468272256}

event: audio_chunk
data: {"content":{"audio":"//uQxAAAAAAAAAAAAAA..."},"timestamp":1768468272257}

event: audio
data: {"content":{"audio_url":"https://xxx.mp3","unit_price":{"input_per_price":0,"input_token":42,"output_per_price":0,"output_token":277}},"timestamp":1768468274046}

event: end
data: {"content":"end","timestamp":1768468274046}

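Session control events

Event Name	Desc
start	Marks the start of synthesis; includes a timestamp
end	Marks the end of synthesis; includes a timestamp
error	Error information; if an error occurs, it is returned immediately and the stream is terminated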

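Audio transfer events

Event Name	Desc
audio_chunk	A fragment of audio data (base64-encoded), pushed multiple times; the client receives the fragments in order and concatenates them into the complete audio; supports streaming playback
audio	Complete audio information, including the audio URL, billing information (unit_price), and the viseme data URL (viseme_url, returned only when need_viseme is true)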

unit_price fields of the audio event:

Field	Type	Desc
input_per_price	number	Input unit price
output_per_price	number	Output unit price
input_token	int	Number of input tokens
output_token	int	Number of output tokens
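
A client consuming this stream needs to split it into events, decode each audio_chunk, and keep the final audio URL. The sketch below does that over an iterable of text lines. It is a simplified SSE reader that handles only the `event:` and `data:` fields, it assumes each audio_chunk payload is independently decodable base64, and the function names are hypothetical.

```python
import base64
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs; a blank line terminates an event."""
    event, data = "message", []  # type: str, List[str]
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:  # stream ended without a trailing blank line
        yield event, "\n".join(data)


def collect_audio(lines: Iterable[str]) -> Tuple[bytes, Optional[str]]:
    """Return the concatenated audio bytes and the final audio URL."""
    audio = bytearray()
    audio_url = None
    for event, data in iter_sse_events(lines):
        if event == "error":
            raise RuntimeError(data)
        payload = json.loads(data)
        if event == "audio_chunk":
            audio += base64.b64decode(payload["content"]["audio"])
        elif event == "audio":
            audio_url = payload["content"].get("audio_url")
    return bytes(audio), audio_url
```

For streaming playback, the decoded bytes of each audio_chunk could be handed to a player as they arrive instead of being accumulated.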

Error

Invalid parameter:

json

{
   "error": {
      "error_type": "invalid_parameter",
      "message": "invalid_parameter: text exceeds maximum length limit"
   }
}

All strategies failed:

json

{
   "error": {
      "error_type": "all_strategies_failed",
      "message": "all strategies failed"
   }
}

Backend service unavailable:

event: error
data: {"error":{"error_type":"backend_unavailable","message":"backend unavailable"}}
