Speech Synthesis API - Streaming (TTS-SSE)

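Streaming speech synthesis based on Text-to-Speech (TTS). A single request accepts up to 10,000 characters of text. It is intended for real-time speech generation that needs low latency, where audio is played back while synthesis is still in progress.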

Interface Overview

  • Endpoint: https://api.senseaudio.cn/v1/t2a_v2
  • Method: POST
  • Content-Type: application/json
  • Authentication: Bearer Token

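Request Configuration

Request Headers

| Parameter | Required | Description | Example |
|---|---|---|---|
| Authorization | Yes | Authentication token. Format: `Bearer API_KEY` | `Bearer sk-123456…` |
| Content-Type | Yes | Content type. Always `application/json` | `application/json` |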
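As a rough illustration, the two required headers can be attached with Python's standard-library `urllib.request`, which avoids a third-party dependency. This is a sketch, not an official client: `build_request` is a hypothetical helper and `YOUR_API_KEY` is a placeholder.

```python
import urllib.request

API_URL = "https://api.senseaudio.cn/v1/t2a_v2"


def build_request(api_key: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the two required headers.

    build_request is a hypothetical helper, not part of any SenseAudio SDK.
    """
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # format: Bearer API_KEY
            "Content-Type": "application/json",    # always application/json
        },
        method="POST",
    )


req = build_request("YOUR_API_KEY", b"{}")
print(req.get_method(), req.get_header("Authorization"))
```

Passing the returned object to `urllib.request.urlopen` sends it; because the body is an SSE stream, the response can then be read line by line.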

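Request Body

Core Parameters

| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| model | string | Yes | Model name. Use `SenseAudio-TTS-1.0`; `dictionary` requires `SenseAudio-TTS-1.5` (see below). | `SenseAudio-TTS-1.0` |
| text | string | Yes | Text to synthesize. Supports Chinese and English, up to 10,000 characters. Pauses can be inserted with `<break time=500>`; see the `<break>` pause tag section below. | `你好,<break time=500>世界` |
| stream | boolean | Yes | Streaming output. Must be `true`. | `true` |
| voice_setting | object | Yes | Voice settings. See the table below. | `{"voice_id": "…"}` |
| audio_setting | object | No | Audio format settings. See the table below. | `{"sample_rate": 32000}` |
| dictionary | array | No | Polyphonic character corrections. See the table below. Only for cloned voices, and the model must be `SenseAudio-TTS-1.5`. | `[{"original": "好干净", "replacement": "[hao4]干净"}]` |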
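A minimal sketch of assembling the request body from the parameters above. The helper name `build_payload` is hypothetical. It enforces only the constraints stated in this table (10,000-character limit, `stream` fixed to `true`, `dictionary` only with `SenseAudio-TTS-1.5`) and assumes the length limit is counted in Unicode code points, which the documentation does not specify.

```python
import json

MAX_TEXT_CHARS = 10_000  # documented per-request limit


def build_payload(text, voice_id, model="SenseAudio-TTS-1.0",
                  audio_setting=None, dictionary=None):
    """Assemble a streaming request body (hypothetical helper)."""
    if not text:
        raise ValueError("text is required")
    # Assumption: the limit counts Unicode code points, i.e. len() in Python.
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    # dictionary is documented as requiring SenseAudio-TTS-1.5; the
    # cloned-voice requirement cannot be checked locally.
    if dictionary and model != "SenseAudio-TTS-1.5":
        raise ValueError("dictionary requires model SenseAudio-TTS-1.5")
    payload = {
        "model": model,
        "text": text,
        "stream": True,  # this endpoint requires stream=true
        "voice_setting": {"voice_id": voice_id},
    }
    if audio_setting is not None:
        payload["audio_setting"] = audio_setting
    if dictionary:
        payload["dictionary"] = dictionary
    return payload


body = json.dumps(build_payload("你好,<break time=500>世界", "child_0001_a"),
                  ensure_ascii=False)
print(body)
```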

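<break> Pause Tag

`<break>` inserts a pause into the synthesized speech.

```xml
<break time=500>
```

  • `time` is in milliseconds (ms)
  • `500` means a 500 ms pause
  • The minimum is 100 ms; there is no maximum

Example:

```text
你好<break time=500>欢迎使用我们的服务
```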
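When text is built programmatically, pause tags can be inserted between segments. A small sketch; `join_with_breaks` is a hypothetical helper that enforces the documented 100 ms minimum and reproduces the example above.

```python
def join_with_breaks(segments, pause_ms):
    """Join text segments with a <break> pause tag between each pair."""
    # Documented minimum is 100 ms; no maximum is documented.
    if pause_ms < 100:
        raise ValueError("pause must be at least 100 ms")
    return f"<break time={int(pause_ms)}>".join(segments)


print(join_with_breaks(["你好", "欢迎使用我们的服务"], 500))
```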

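voice_setting (Voice Settings)

| Parameter | Type | Required | Description | Default | Range |
|---|---|---|---|---|---|
| voice_id | string | Yes | Voice ID. See the voice service documentation. | - | - |
| speed | float | No | Speech rate. | 1.0 | [0.5, 2.0] |
| vol | float | No | Volume. | 1.0 | [0, 10] |
| pitch | int | No | Pitch. | 0 | [-12, 12] |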
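The documented ranges can be checked on the client so that out-of-range values fail before a request is sent. This is only a sketch: `make_voice_setting` is a hypothetical helper, and whether the server clamps or rejects out-of-range values is not stated here.

```python
def make_voice_setting(voice_id, speed=1.0, vol=1.0, pitch=0):
    """Build a voice_setting object, rejecting values outside the documented ranges."""
    if not voice_id:
        raise ValueError("voice_id is required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within [0.5, 2.0]")
    if not 0 <= vol <= 10:
        raise ValueError("vol must be within [0, 10]")
    # pitch is documented as an int
    if not isinstance(pitch, int) or not -12 <= pitch <= 12:
        raise ValueError("pitch must be an integer within [-12, 12]")
    return {"voice_id": voice_id, "speed": speed, "vol": vol, "pitch": pitch}


print(make_voice_setting("child_0001_a", speed=1.2))
```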

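audio_setting (Audio Settings)

| Parameter | Type | Required | Description | Default | Options |
|---|---|---|---|---|---|
| format | string | No | Audio encoding format. | "mp3" | mp3, wav, pcm, flac |
| sample_rate | int | No | Sample rate (Hz). | 32000 | 8000, 16000, 22050, 24000, 32000, 44100 |
| bitrate | int | No | Bitrate (MP3 only). | 128000 | 32000, 64000, 128000, 256000 |
| channel | int | No | Number of channels. | 2 | 1 (mono), 2 (stereo) |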
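A sketch of restricting `audio_setting` to the options listed above. `make_audio_setting` is a hypothetical helper; omitting `bitrate` for non-MP3 formats is an assumption drawn from the "MP3 only" note, not a documented requirement.

```python
FORMATS = ("mp3", "wav", "pcm", "flac")
SAMPLE_RATES = (8000, 16000, 22050, 24000, 32000, 44100)
BITRATES = (32000, 64000, 128000, 256000)
CHANNELS = (1, 2)


def make_audio_setting(fmt="mp3", sample_rate=32000, bitrate=128000, channel=2):
    """Build an audio_setting object limited to the documented options."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {SAMPLE_RATES}")
    if channel not in CHANNELS:
        raise ValueError("channel must be 1 (mono) or 2 (stereo)")
    setting = {"format": fmt, "sample_rate": sample_rate, "channel": channel}
    # Assumption: bitrate is only meaningful (and only sent) for MP3 output.
    if fmt == "mp3":
        if bitrate not in BITRATES:
            raise ValueError(f"bitrate must be one of {BITRATES}")
        setting["bitrate"] = bitrate
    return setting


print(make_audio_setting("pcm", 16000, channel=1))
```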

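dictionary (Polyphonic Character Correction)

| Parameter | Type | Required | Description | Default | Example |
|---|---|---|---|---|---|
| original | string | Yes | Original text. | None | 铺床铺地,量米量酒杯 |
| replacement | string | Yes | Corrected text. Each character to correct is written as its pinyin plus tone number in square brackets, e.g. `[hao4]`. | None | 铺床铺[di4],[liang2]米[liang4]酒杯 |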

Response Structure

The response uses the SSE (Server-Sent Events) format, with Content-Type `text/event-stream; charset=utf-8`.

Each data chunk begins with `data: ` followed by a JSON object.
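One way to read this format line by line is sketched below. `parse_sse_line` is a hypothetical helper; it follows the general SSE rule that a single space after `data:` is optional, and it returns `None` for blank separator lines and other SSE fields.

```python
import json


def parse_sse_line(line):
    """Return the JSON object carried by one `data:` line, or None for any other line."""
    if isinstance(line, bytes):
        line = line.decode("utf-8")
    line = line.rstrip("\r\n")
    if not line.startswith("data:"):
        return None  # blank separators, ":" comments, or other SSE fields
    payload = line[len("data:"):]
    if payload.startswith(" "):
        payload = payload[1:]  # SSE drops one leading space after the colon
    return json.loads(payload)


print(parse_sse_line('data: {"data": null}'))
```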

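Response Parameters

| Parameter | Type | Description |
|---|---|---|
| data | object | Synthesis data object. May be null, so check for null before use. |
| data.audio | string | Synthesized audio, hex-encoded, in the output format specified in the request. |
| data.status | int64 | Stream status: 1 = synthesizing, 2 = synthesis finished. |
| extra_info | object | Additional audio information. When streaming, only the last chunk includes it. |
| extra_info.audio_length | int64 | Audio duration (ms) |
| extra_info.audio_sample_rate | int64 | Audio sample rate |
| extra_info.audio_size | int64 | Audio file size (bytes) |
| extra_info.bitrate | int64 | Audio bitrate |
| extra_info.audio_format | string | Format of the generated audio. One of: mp3, pcm, flac, wav |
| extra_info.audio_channel | int | Number of channels in the generated audio. 1: mono, 2: stereo |
| extra_info.word_count | int64 | Word count: the synthesized text counted in grapheme clusters, excluding clusters that consist only of whitespace, punctuation, or control characters |
| extra_info.character_count | int64 | Character count: the synthesized text counted in Unicode code points |
| base_resp | object | Status code and details for this request |
| base_resp.status_code | int64 | Status code. Documented as the HTTP status code; the sample response below returns `0` on success. |
| base_resp.status_message | string | Status details |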
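Following the table, a client can concatenate the decoded `data.audio` pieces, tolerate a null `data`, and keep the `extra_info` from the final chunk. A minimal sketch with the hypothetical helper `collect_audio`; it assumes chunks are already parsed into dicts and does not inspect `base_resp`.

```python
def collect_audio(events):
    """Concatenate hex-decoded audio from parsed chunks; return (audio_bytes, extra_info)."""
    audio = bytearray()
    extra_info = None
    for event in events:
        data = event.get("data") or {}  # data may be null
        hex_audio = data.get("audio")
        if hex_audio:
            audio += bytes.fromhex(hex_audio)
        if event.get("extra_info"):
            extra_info = event["extra_info"]  # only present on the final chunk
        if data.get("status") == 2:  # 2 = synthesis finished
            break
    return bytes(audio), extra_info


chunks = [
    {"data": {"audio": "4944", "status": 1}, "extra_info": None},
    {"data": None},
    {"data": {"audio": "33", "status": 2}, "extra_info": {"audio_format": "mp3"}},
]
print(collect_audio(chunks))
```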

Streaming Response Example

```plaintext
data: {"data":{"audio":"49443304...","status":1},"extra_info":null,"base_resp":{"status_code":0,"status_message":""}}

data: {"data":{"audio":"fffb9864...","status":1},"extra_info":null,"base_resp":{"status_code":0,"status_message":""}}

data: {"data":{"audio":"fffb9864...","status":2},"extra_info":{"audio_length":2306,"audio_sample_rate":32000,"audio_size":36908,"bitrate":128000,"audio_format":"mp3","audio_channel":2,"word_count":24,"character_count":30},"base_resp":{"status_code":0,"status_message":"success"}}
```
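The `extra_info` in the final chunk allows a quick sanity check on the received file. For constant-bitrate audio, the size in bytes is roughly bitrate × duration in seconds ÷ 8. The sketch below applies that to the sample's final chunk; `summarize_final_chunk` is hypothetical, and treating the MP3 as constant-bitrate is an assumption.

```python
import json

# Final chunk copied from the sample above (its hex audio is truncated there).
FINAL_LINE = ('data: {"data":{"audio":"fffb9864...","status":2},'
              '"extra_info":{"audio_length":2306,"audio_sample_rate":32000,'
              '"audio_size":36908,"bitrate":128000,"audio_format":"mp3",'
              '"audio_channel":2,"word_count":24,"character_count":30},'
              '"base_resp":{"status_code":0,"status_message":"success"}}')


def summarize_final_chunk(line):
    """Compare the reported audio size with a constant-bitrate estimate."""
    event = json.loads(line[len("data: "):])
    info = event["extra_info"]
    # bits/s * ms / 1000 gives bits; / 8 gives bytes, hence // 8000
    estimated_bytes = info["bitrate"] * info["audio_length"] // 8000
    return {
        "status": event["data"]["status"],
        "seconds": info["audio_length"] / 1000,
        "reported_bytes": info["audio_size"],
        "estimated_bytes": estimated_bytes,
    }


print(summarize_final_chunk(FINAL_LINE))
```

For the sample this gives 36,896 bytes against the reported 36,908; a small gap like this is plausibly header or framing overhead, which the formula ignores.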

Code Examples

cURL

```bash
# 1. Send the streaming request and save the raw SSE response
curl -X POST https://api.senseaudio.cn/v1/t2a_v2 \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "SenseAudio-TTS-1.0",
    "text": "这是一个流式输出的例子。",
    "stream": true,
    "voice_setting": {"voice_id": "child_0001_a"}
  }' \
  -o response.txt

# 2. Extract the hex data from every "audio" field and decode it into a binary file
#    (grep -P requires GNU grep; xxd must be installed)
grep -oP '(?<="audio":")[^"]+' response.txt | tr -d '\n' | xxd -r -p > output.mp3
```

Python

```python
import json

import requests

API_URL = "https://api.senseaudio.cn/v1/t2a_v2"
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}


# Streaming synthesis (recommended for long text or real-time scenarios)
def tts_stream():
    payload = {
        "model": "SenseAudio-TTS-1.0",
        "text": "这是一个流式输出的例子。",
        "stream": True,
        "voice_setting": {"voice_id": "child_0001_a"},
    }
    with requests.post(API_URL, json=payload, headers=HEADERS, stream=True) as r:
        r.raise_for_status()  # fail fast on HTTP errors
        with open("stream_output.mp3", "wb") as f:
            for line in r.iter_lines():
                line_str = line.decode("utf-8")
                # Only "data: " lines carry JSON; skip blank separators and other fields
                if not line_str.startswith("data: "):
                    continue
                resp = json.loads(line_str[len("data: "):])
                data = resp.get("data") or {}  # "data" may be null
                if data.get("audio"):
                    f.write(bytes.fromhex(data["audio"]))
    print("Streaming synthesis complete")


if __name__ == "__main__":
    tts_stream()
```

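JavaScript

```javascript
const axios = require('axios');
const fs = require('fs');

const API_URL = 'https://api.senseaudio.cn/v1/t2a_v2';
const HEADERS = {
  'Authorization': 'Bearer YOUR_API_KEY',
  'Content-Type': 'application/json'
};

async function ttsStream() {
  try {
    const payload = {
      model: 'SenseAudio-TTS-1.0',
      text: '这是一个流式输出的例子。',
      stream: true,
      voice_setting: { voice_id: 'child_0001_a' }
    };
    const res = await axios.post(API_URL, payload, { headers: HEADERS, responseType: 'stream' });
    const writeStream = fs.createWriteStream('stream_output.mp3');
    let buffer = ''; // partial line carried over from the previous network chunk

    const handleLine = (line) => {
      if (!line.startsWith('data: ')) return;
      try {
        const json = JSON.parse(line.slice(6));
        if (json.data && json.data.audio) { // data may be null
          writeStream.write(Buffer.from(json.data.audio, 'hex'));
        }
      } catch (e) {
        console.error('Failed to parse SSE line:', e.message);
      }
    };

    res.data.setEncoding('utf8'); // decode multi-byte characters safely across chunks
    res.data.on('data', (chunk) => {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // the last element may be an incomplete line
      lines.forEach((line) => handleLine(line.replace(/\r$/, '')));
    });
    res.data.on('end', () => {
      handleLine(buffer.replace(/\r$/, '')); // flush any remaining line
      writeStream.end();
      console.log('Streaming synthesis complete');
    });
    res.data.on('error', (err) => console.error('Stream error:', err.message));
  } catch (err) {
    console.error('Request failed:', err.message);
  }
}

ttsStream();
```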

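Go

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
)

const (
	APIURL = "https://api.senseaudio.cn/v1/t2a_v2"
	APIKey = "YOUR_API_KEY"
)

type TTSRequest struct {
	Model        string       `json:"model"`
	Text         string       `json:"text"`
	Stream       bool         `json:"stream"`
	VoiceSetting VoiceSetting `json:"voice_setting"`
}

type VoiceSetting struct {
	VoiceID string `json:"voice_id"`
}

type SSEResponse struct {
	Data struct {
		Audio  string `json:"audio"`
		Status int    `json:"status"`
	} `json:"data"`
	BaseResp struct {
		StatusCode    int    `json:"status_code"`
		StatusMessage string `json:"status_message"`
	} `json:"base_resp"`
}

func main() {
	payload := TTSRequest{
		Model:        "SenseAudio-TTS-1.0",
		Text:         "这是一个流式输出的例子。",
		Stream:       true,
		VoiceSetting: VoiceSetting{VoiceID: "child_0001_a"},
	}
	jsonData, err := json.Marshal(payload)
	if err != nil {
		fmt.Println("Failed to encode request:", err)
		return
	}
	req, err := http.NewRequest("POST", APIURL, bytes.NewBuffer(jsonData))
	if err != nil {
		fmt.Println("Failed to build request:", err)
		return
	}
	req.Header.Set("Authorization", "Bearer "+APIKey)
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("Request failed:", err)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		fmt.Println("Unexpected HTTP status:", resp.Status)
		return
	}

	file, err := os.Create("stream_output.mp3")
	if err != nil {
		fmt.Println("Failed to create file:", err)
		return
	}
	defer file.Close()

	scanner := bufio.NewScanner(resp.Body)
	// Hex-encoded audio lines can exceed bufio.Scanner's default 64 KB token limit.
	scanner.Buffer(make([]byte, 0, 1024*1024), 16*1024*1024)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue // skip blank separators and other SSE fields
		}
		var result SSEResponse
		if err := json.Unmarshal([]byte(line[6:]), &result); err != nil {
			fmt.Println("Failed to parse SSE line:", err)
			continue
		}
		if result.Data.Audio != "" { // a null "data" leaves Audio empty
			audioData, err := hex.DecodeString(result.Data.Audio)
			if err != nil {
				fmt.Println("Invalid hex audio:", err)
				continue
			}
			if _, err := file.Write(audioData); err != nil {
				fmt.Println("Failed to write file:", err)
				return
			}
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Println("Failed to read stream:", err)
		return
	}
	fmt.Println("Streaming synthesis complete")
}
```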

Java

```java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.json.JSONObject;

public class SenseAudioTTSStream {
    private static final String API_URL = "https://api.senseaudio.cn/v1/t2a_v2";
    private static final String API_KEY = "YOUR_API_KEY";

    public static void main(String[] args) {
        try {
            // Build the request body
            JSONObject voiceSetting = new JSONObject();
            voiceSetting.put("voice_id", "child_0001_a");

            JSONObject payload = new JSONObject();
            payload.put("model", "SenseAudio-TTS-1.0");
            payload.put("text", "这是一个流式输出的例子。");
            payload.put("stream", true);
            payload.put("voice_setting", voiceSetting);

            // Send the request
            URL url = new URL(API_URL);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Bearer " + API_KEY);
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream os = conn.getOutputStream()) {
                byte[] input = payload.toString().getBytes(StandardCharsets.UTF_8);
                os.write(input, 0, input.length);
            }
            if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                System.out.println("Unexpected HTTP status: " + conn.getResponseCode());
                return;
            }

            // Read the SSE stream line by line
            try (BufferedReader br = new BufferedReader(
                         new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
                 FileOutputStream fos = new FileOutputStream("stream_output.mp3")) {
                String line;
                while ((line = br.readLine()) != null) {
                    if (!line.startsWith("data: ")) {
                        continue; // skip blank separators and other SSE fields
                    }
                    JSONObject result = new JSONObject(line.substring(6));
                    // "data" may be JSON null; optJSONObject returns null in that case
                    JSONObject data = result.optJSONObject("data");
                    if (data == null) {
                        continue;
                    }
                    String audioHex = data.optString("audio", "");
                    // Decode the hex string two characters per byte
                    byte[] audioData = new byte[audioHex.length() / 2];
                    for (int i = 0; i < audioData.length; i++) {
                        int index = i * 2;
                        audioData[i] = (byte) Integer.parseInt(audioHex.substring(index, index + 2), 16);
                    }
                    fos.write(audioData);
                }
                System.out.println("Streaming synthesis complete");
            }
        } catch (Exception e) {
            System.out.println("Request failed: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
```

Swift

```swift
import Foundation

struct TTSRequest: Codable {
    let model: String
    let text: String
    let stream: Bool
    let voiceSetting: VoiceSetting

    enum CodingKeys: String, CodingKey {
        case model, text, stream
        case voiceSetting = "voice_setting"
    }
}

struct VoiceSetting: Codable {
    let voiceId: String

    enum CodingKeys: String, CodingKey {
        case voiceId = "voice_id"
    }
}

struct SSEResponse: Codable {
    let data: AudioData?
    let baseResp: BaseResp?

    enum CodingKeys: String, CodingKey {
        case data
        case baseResp = "base_resp"
    }
}

struct AudioData: Codable {
    let audio: String?
    let status: Int?
}

struct BaseResp: Codable {
    let statusCode: Int?
    let statusMessage: String?

    enum CodingKeys: String, CodingKey {
        case statusCode = "status_code"
        case statusMessage = "status_message"
    }
}

func textToSpeechStream() {
    let apiURL = "https://api.senseaudio.cn/v1/t2a_v2"
    let apiKey = "YOUR_API_KEY"
    let request = TTSRequest(
        model: "SenseAudio-TTS-1.0",
        text: "这是一个流式输出的例子。",
        stream: true,
        voiceSetting: VoiceSetting(voiceId: "child_0001_a")
    )
    guard let url = URL(string: apiURL),
          let jsonData = try? JSONEncoder().encode(request) else { return }

    var urlRequest = URLRequest(url: url)
    urlRequest.httpMethod = "POST"
    urlRequest.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    urlRequest.setValue("application/json", forHTTPHeaderField: "Content-Type")
    urlRequest.httpBody = jsonData

    let semaphore = DispatchSemaphore(value: 0)

    // Note: dataTask delivers the whole body at once, so the SSE lines
    // are parsed only after the stream has finished.
    let task = URLSession.shared.dataTask(with: urlRequest) { data, _, error in
        defer { semaphore.signal() }
        guard let data = data, error == nil else {
            print("Request failed: \(error?.localizedDescription ?? "Unknown error")")
            return
        }
        var audioBuffer = Data()
        // Parse the SSE data
        let dataString = String(data: data, encoding: .utf8) ?? ""
        for line in dataString.components(separatedBy: "\n") where line.hasPrefix("data: ") {
            let jsonStr = String(line.dropFirst(6))
            guard let jsonData = jsonStr.data(using: .utf8),
                  let result = try? JSONDecoder().decode(SSEResponse.self, from: jsonData),
                  let audioHex = result.data?.audio else { continue }
            // Convert hex to Data, two characters per byte
            var index = audioHex.startIndex
            while index < audioHex.endIndex {
                guard let nextIndex = audioHex.index(index, offsetBy: 2, limitedBy: audioHex.endIndex) else { break }
                if let byte = UInt8(audioHex[index..<nextIndex], radix: 16) {
                    audioBuffer.append(byte)
                }
                index = nextIndex
            }
        }
        // Save the file
        let fileURL = URL(fileURLWithPath: FileManager.default.currentDirectoryPath)
            .appendingPathComponent("stream_output.mp3")
        try? audioBuffer.write(to: fileURL)
        print("Streaming synthesis complete")
    }
    task.resume()
    semaphore.wait()
}

textToSpeechStream()
```