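Speech Synthesis API - Streaming (TTS-SSE)
Streaming speech synthesis based on Text-to-Speech (TTS). A single request accepts up to 10000 characters of text. This endpoint suits real-time speech generation that needs low latency, where audio is played while synthesis is still in progress.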
- Endpoint: https://api.senseaudio.cn/v1/t2a_v2
- Method: POST
- Content-Type: application/json
- Authentication: Bearer Token
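Request Headers
| Parameter | Required | Description | Example |
|---|---|---|---|
| Authorization | Yes | Authentication token, in the format `Bearer API_KEY`. | Bearer sk-123456… |
| Content-Type | Yes | Content type. Always `application/json`. | application/json |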
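The following is a minimal sketch that attaches the two headers to a POST request using only Python's standard library. `build_request` is a hypothetical helper name. The request is only constructed here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.senseaudio.cn/v1/t2a_v2"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying both required headers.

    build_request is a hypothetical helper, not part of any SenseAudio SDK.
    """
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # format: Bearer API_KEY
            "Content-Type": "application/json",
        },
    )


req = build_request("YOUR_API_KEY", {"model": "SenseAudio-TTS-1.0"})
print(req.get_method(), req.get_header("Authorization"))  # POST Bearer YOUR_API_KEY
```

`urllib.request` normalizes header names with `str.capitalize()`. As a result, the content type is read back with `req.get_header("Content-type")`.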
Request Body
Core Parameters
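| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| model | string | Yes | Model name. Use `SenseAudio-TTS-1.0`; the `dictionary` parameter requires `SenseAudio-TTS-1.5`. | SenseAudio-TTS-1.0 |
| text | string | Yes | Text to synthesize. Chinese and English are supported, up to 10000 characters. Pauses can be inserted with `<break time=500>`; see the pause tag section below. | `你好,<break time=500>世界` |
| stream | boolean | Yes | Streaming output. Must be `true`. | true |
| voice_setting | object | Yes | Voice settings. See the voice_setting table below. | `{"voice_id": "…"}` |
| audio_setting | object | No | Audio format settings. See the audio_setting table below. | `{"sample_rate": 32000}` |
| dictionary | array | No | Polyphonic character overrides. See the dictionary table below. Only available for cloned voices, and `model` must be `SenseAudio-TTS-1.5`. | `[{"original": "好干净", "replacement": "[hao4]干净"}]` |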
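The following sketch assembles a request body from the core parameters and enforces the constraints stated in the table. `build_payload` is a hypothetical helper. It assumes the 10000-character limit is measured in Unicode code points (`len()`), and the service may count differently. The code cannot check the "cloned voices only" condition for `dictionary`.

```python
from typing import Optional

MAX_TEXT_CHARS = 10000  # documented per-request limit


def build_payload(text: str, voice_id: str, model: str = "SenseAudio-TTS-1.0",
                  audio_setting: Optional[dict] = None,
                  dictionary: Optional[list] = None) -> dict:
    """Assemble a streaming request body (hypothetical helper).

    Assumption: the 10000-character limit is counted with len(), i.e. in
    Unicode code points.
    """
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if dictionary and model != "SenseAudio-TTS-1.5":
        raise ValueError("dictionary requires model SenseAudio-TTS-1.5")
    payload = {
        "model": model,
        "text": text,
        "stream": True,  # this endpoint is always streaming
        "voice_setting": {"voice_id": voice_id},
    }
    if audio_setting:
        payload["audio_setting"] = audio_setting
    if dictionary:
        payload["dictionary"] = dictionary
    return payload


payload = build_payload("这是一个流式输出的例子。", "child_0001_a")
```

Optional fields are left out when unset, so the server applies its documented defaults.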
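`<break>` Pause Tag
`<break>` inserts a pause into the synthesized speech.
- `time` is given in milliseconds (ms).
- `time=500` produces a 500 ms pause.
- The minimum is 100 ms; there is no maximum.
Example:
`你好<break time=500>欢迎使用我们的服务`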
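When text is assembled programmatically, the pause tags can be inserted between segments. The following sketch uses a hypothetical helper, `join_with_breaks`. It enforces only the documented 100 ms minimum.

```python
MIN_BREAK_MS = 100  # documented minimum pause; no documented maximum


def join_with_breaks(segments: list, pause_ms: int) -> str:
    """Join text segments with <break time=N> tags (N in milliseconds)."""
    if pause_ms < MIN_BREAK_MS:
        raise ValueError(f"pause must be at least {MIN_BREAK_MS} ms")
    return f"<break time={pause_ms}>".join(segments)


print(join_with_breaks(["你好", "欢迎使用我们的服务"], 500))
# 你好<break time=500>欢迎使用我们的服务
```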
voice_setting (Voice Settings)
| Parameter | Type | Required | Description | Default | Range |
|---|---|---|---|---|---|
| voice_id | string | Yes | Voice ID. See the voice service documentation. | - | - |
| speed | float | No | Speaking rate. | 1.0 | [0.5, 2.0] |
| vol | float | No | Volume. | 1.0 | [0, 10] |
| pitch | int | No | Pitch adjustment. | 0 | [-12, 12] |
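The following sketch checks a `voice_setting` object against the ranges above before it is sent. `make_voice_setting` is a hypothetical helper, and treating both ends of each range as inclusive is an assumption.

```python
# Ranges copied from the voice_setting table; inclusive bounds are assumed.
VOICE_RANGES = {"speed": (0.5, 2.0), "vol": (0.0, 10.0), "pitch": (-12, 12)}


def make_voice_setting(voice_id: str, speed: float = 1.0, vol: float = 1.0,
                       pitch: int = 0) -> dict:
    """Return a voice_setting object after checking the documented ranges."""
    if not isinstance(pitch, int):
        raise TypeError("pitch must be an integer")
    values = {"speed": speed, "vol": vol, "pitch": pitch}
    for name, value in values.items():
        low, high = VOICE_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return {"voice_id": voice_id, **values}


print(make_voice_setting("child_0001_a", speed=1.2, pitch=-3))
# {'voice_id': 'child_0001_a', 'speed': 1.2, 'vol': 1.0, 'pitch': -3}
```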
audio_setting (Audio Settings)
| Parameter | Type | Required | Description | Default | Options |
|---|---|---|---|---|---|
| format | string | No | Audio encoding format. | "mp3" | mp3, wav, pcm, flac |
| sample_rate | int | No | Sample rate (Hz). | 32000 | 8000, 16000, 22050, 24000, 32000, 44100 |
| bitrate | int | No | Bitrate in bps (MP3 only). | 128000 | 32000, 64000, 128000, 256000 |
| channel | int | No | Number of channels. | 2 | 1 (mono), 2 (stereo) |
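The following sketch restricts `audio_setting` to the documented options. `make_audio_setting` is a hypothetical helper. Omitting `bitrate` for non-MP3 formats is a design choice based on the "MP3 only" note, not a documented requirement.

```python
# Allowed values copied from the audio_setting table.
FORMATS = {"mp3", "wav", "pcm", "flac"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100}
BITRATES = {32000, 64000, 128000, 256000}
CHANNELS = {1, 2}


def make_audio_setting(fmt: str = "mp3", sample_rate: int = 32000,
                       bitrate: int = 128000, channel: int = 2) -> dict:
    """Return an audio_setting object limited to the documented options."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if channel not in CHANNELS:
        raise ValueError(f"unsupported channel count: {channel}")
    setting = {"format": fmt, "sample_rate": sample_rate, "channel": channel}
    if fmt == "mp3":
        # bitrate only applies to MP3, so it is omitted for other formats
        if bitrate not in BITRATES:
            raise ValueError(f"unsupported bitrate: {bitrate}")
        setting["bitrate"] = bitrate
    return setting


print(make_audio_setting("pcm", 16000, channel=1))
# {'format': 'pcm', 'sample_rate': 16000, 'channel': 1}
```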
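dictionary (Polyphonic Character Correction)
| Parameter | Type | Required | Description | Default | Example |
|---|---|---|---|---|---|
| original | string | Yes | Original text containing the polyphonic characters to correct. | - | 铺床铺地,量米量酒杯 |
| replacement | string | Yes | The same text with each polyphonic character to correct replaced by its pinyin and tone number in square brackets, such as `[di4]`. | - | 铺床铺[di4],[liang2]米[liang4]酒杯 |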
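The `[pinyin+tone]` notation here is inferred from the examples. The following sketch builds one `dictionary` entry from character positions. `annotate_pinyin` is a hypothetical helper, and the entries only take effect with cloned voices on `SenseAudio-TTS-1.5`.

```python
def annotate_pinyin(original: str, readings: dict) -> dict:
    """Build one dictionary entry (hypothetical helper).

    readings maps a character index in `original` to its pinyin with tone
    number; that character is replaced by "[pinyin]" in the replacement.
    """
    chars = list(original)
    for index, pinyin in readings.items():
        chars[index] = f"[{pinyin}]"
    return {"original": original, "replacement": "".join(chars)}


entry = annotate_pinyin("铺床铺地,量米量酒杯", {3: "di4", 5: "liang2", 7: "liang4"})
print(entry["replacement"])  # 铺床铺[di4],[liang2]米[liang4]酒杯
```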
Response
The response uses SSE (Server-Sent Events) with `Content-Type: text/event-stream; charset=utf-8`.
Each data chunk is a line that starts with `data: ` followed by a JSON object.
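The following sketch extracts the JSON object from one SSE line. `parse_sse_line` is a hypothetical helper. It also skips blank separator lines and other SSE lines (such as `:` comments), which this page does not mention but the SSE format permits.

```python
import json
from typing import Optional


def parse_sse_line(line: str) -> Optional[dict]:
    """Return the JSON object carried by a `data:` line, or None otherwise."""
    if not line.startswith("data:"):
        return None  # blank separator, comment (":..."), or another SSE field
    payload = line[len("data:"):].strip()
    if not payload:
        return None
    return json.loads(payload)


event = parse_sse_line('data: {"data":{"audio":"ff","status":1}}')
print(event["data"]["status"])  # 1
```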
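Response Parameters
| Parameter | Type | Description |
|---|---|---|
| data | object | Synthesized data object. May be null, so check it before use. |
| data.audio | string | Synthesized audio, hex-encoded, in the output format specified in the request. |
| data.status | int64 | Stream status: 1 = synthesis in progress, 2 = synthesis finished. |
| extra_info | object | Additional audio information. In streaming mode, only the last chunk carries it. |
| extra_info.audio_length | int64 | Audio duration (ms). |
| extra_info.audio_sample_rate | int64 | Audio sample rate (Hz). |
| extra_info.audio_size | int64 | Audio file size (bytes). |
| extra_info.bitrate | int64 | Audio bitrate. |
| extra_info.audio_format | string | Format of the generated audio. One of: mp3, pcm, flac, wav. |
| extra_info.audio_channel | int | Number of channels in the generated audio. 1: mono, 2: stereo. |
| extra_info.word_count | int64 | Word count: the synthesized text counted in grapheme clusters, excluding clusters that consist only of whitespace, punctuation, or control characters. |
| extra_info.character_count | int64 | Character count: the synthesized text counted in Unicode code points. |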
| base_resp | object | Status code and details for this request. |
| base_resp.status_code | int64 | Status code. The successful responses in the example below carry `0`. |
| base_resp.status_message | string | Status details. |
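The following sketch handles one parsed chunk according to the table above. It checks `base_resp`, tolerates a null `data`, and decodes the hex audio. `audio_bytes_from_event` is a hypothetical helper. Treating any non-zero `status_code` as an error is an assumption drawn from the examples, where success is `0`.

```python
def audio_bytes_from_event(event: dict) -> bytes:
    """Decode the audio in one parsed chunk; raise on an error status.

    Assumption: base_resp.status_code != 0 means the request failed.
    """
    base = event.get("base_resp") or {}
    code = base.get("status_code", 0)
    if code != 0:
        raise RuntimeError(f"TTS error {code}: {base.get('status_message', '')}")
    data = event.get("data")  # may be null, so check before indexing
    if not data or not data.get("audio"):
        return b""
    return bytes.fromhex(data["audio"])


chunk = {"data": {"audio": "fffb", "status": 1},
         "base_resp": {"status_code": 0, "status_message": ""}}
print(audio_bytes_from_event(chunk))  # b'\xff\xfb'
```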
Streaming Response Example
```
data: {"data":{"audio":"49443304...","status":1},"extra_info":null,"base_resp":{"status_code":0,"status_message":""}}
data: {"data":{"audio":"fffb9864...","status":1},"extra_info":null,"base_resp":{"status_code":0,"status_message":""}}
data: {"data":{"audio":"fffb9864...","status":2},"extra_info":{"audio_length":2306,"audio_sample_rate":32000,"audio_size":36908,"bitrate":128000,"audio_format":"mp3","audio_channel":2,"word_count":24,"character_count":30},"base_resp":{"status_code":0,"status_message":"success"}}
```
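The hex in the example above is truncated, so the following sketch runs on a short illustrative stream instead. It concatenates the audio of each `data:` line, stops at `status` 2, and returns the `extra_info` sent with that final chunk. `collect_stream` is a hypothetical helper. Like the client samples below, it appends the audio from every chunk, including the last.

```python
import json


def collect_stream(lines):
    """Concatenate hex audio until status 2; return (audio, extra_info)."""
    audio = bytearray()
    extra_info = None
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE lines
        event = json.loads(line[len("data:"):].strip())
        data = event.get("data") or {}  # data may be null
        if data.get("audio"):
            audio += bytes.fromhex(data["audio"])
        if data.get("status") == 2:
            extra_info = event.get("extra_info")  # only on the final chunk
            break
    return bytes(audio), extra_info


sample = [
    'data: {"data":{"audio":"4944","status":1},"extra_info":null,"base_resp":{"status_code":0,"status_message":""}}',
    '',
    'data: {"data":{"audio":"3304","status":2},"extra_info":{"audio_format":"mp3"},"base_resp":{"status_code":0,"status_message":"success"}}',
]
audio, info = collect_stream(sample)
print(audio, info)  # b'ID3\x04' {'audio_format': 'mp3'}
```

The illustrative bytes `49 44 33` spell `ID3`, which matches the start of the first chunk in the example above, the usual MP3 tag header.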
cURL
```bash
# 1. Send the streaming request and save the raw SSE response
curl -X POST https://api.senseaudio.cn/v1/t2a_v2 \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "SenseAudio-TTS-1.0",
    "text": "这是一个流式输出的例子。",
    "stream": true,
    "voice_setting": {"voice_id": "child_0001_a"}
  }' -o response.txt

# 2. Extract the hex data from every "audio" field and decode it into a binary file
#    (grep -P requires GNU grep; xxd is distributed with Vim)
grep -oP '(?<="audio":")[^"]+' response.txt | tr -d '\n' | xxd -r -p > output.mp3
```
Python
```python
import json

import requests

API_URL = "https://api.senseaudio.cn/v1/t2a_v2"
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}


# Streaming synthesis (recommended for long text or real-time use)
def tts_stream():
    payload = {
        "model": "SenseAudio-TTS-1.0",
        "text": "这是一个流式输出的例子。",
        "stream": True,
        "voice_setting": {"voice_id": "child_0001_a"},
    }
    with requests.post(API_URL, json=payload, headers=HEADERS, stream=True) as r:
        r.raise_for_status()
        with open("stream_output.mp3", "wb") as f:
            for line in r.iter_lines():
                if not line:
                    continue  # skip blank SSE separator lines
                line_str = line.decode("utf-8")
                if not line_str.startswith("data: "):
                    continue  # only data lines carry JSON
                # Strip the "data: " prefix and parse the JSON chunk
                resp = json.loads(line_str[len("data: "):])
                data = resp.get("data")  # may be null
                if data and data.get("audio"):
                    f.write(bytes.fromhex(data["audio"]))
    print("Streaming synthesis complete")


if __name__ == "__main__":
    tts_stream()
```
JavaScript
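```javascript
const axios = require('axios');
const fs = require('fs');

const API_URL = 'https://api.senseaudio.cn/v1/t2a_v2';
const HEADERS = {
  'Authorization': 'Bearer YOUR_API_KEY',
  'Content-Type': 'application/json'
};

async function ttsStream() {
  try {
    const payload = {
      model: 'SenseAudio-TTS-1.0',
      text: '这是一个流式输出的例子。',
      stream: true,
      voice_setting: { voice_id: 'child_0001_a' }
    };
    const res = await axios.post(API_URL, payload, {
      headers: HEADERS,
      responseType: 'stream'
    });
    const writeStream = fs.createWriteStream('stream_output.mp3');

    // Parse one complete SSE line and write its decoded audio
    const handleLine = (line) => {
      if (!line.startsWith('data: ')) return;
      try {
        const json = JSON.parse(line.slice(6));
        // data may be null, so check it before reading audio
        if (json.data && json.data.audio) {
          writeStream.write(Buffer.from(json.data.audio, 'hex'));
        }
      } catch (e) {
        console.error('Failed to parse SSE line:', e.message);
      }
    };

    // A network chunk can end in the middle of a line, so keep the
    // unfinished tail in `pending` until the rest of it arrives.
    let pending = '';
    res.data.setEncoding('utf8');
    res.data.on('data', (chunk) => {
      pending += chunk;
      const lines = pending.split('\n');
      pending = lines.pop();
      for (const line of lines) handleLine(line.trim());
    });
    res.data.on('end', () => {
      if (pending) handleLine(pending.trim());
      writeStream.end(() => console.log('Streaming synthesis complete'));
    });
    res.data.on('error', (err) => console.error('Stream error:', err.message));
  } catch (err) {
    console.error('Request failed:', err.message);
  }
}

ttsStream();
```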
Go
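```go
package main

import (
	"bufio"
	"bytes"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
)

const (
	APIURL = "https://api.senseaudio.cn/v1/t2a_v2"
	APIKey = "YOUR_API_KEY"
)

type TTSRequest struct {
	Model        string       `json:"model"`
	Text         string       `json:"text"`
	Stream       bool         `json:"stream"`
	VoiceSetting VoiceSetting `json:"voice_setting"`
}

type VoiceSetting struct {
	VoiceID string `json:"voice_id"`
}

type SSEResponse struct {
	// data may be null, so it is decoded into a pointer
	Data *struct {
		Audio  string `json:"audio"`
		Status int    `json:"status"`
	} `json:"data"`
	BaseResp struct {
		StatusCode    int    `json:"status_code"`
		StatusMessage string `json:"status_message"`
	} `json:"base_resp"`
}

func main() {
	payload := TTSRequest{
		Model:  "SenseAudio-TTS-1.0",
		Text:   "这是一个流式输出的例子。",
		Stream: true,
		VoiceSetting: VoiceSetting{
			VoiceID: "child_0001_a",
		},
	}
	jsonData, err := json.Marshal(payload)
	if err != nil {
		fmt.Println("Failed to encode request:", err)
		return
	}
	req, err := http.NewRequest(http.MethodPost, APIURL, bytes.NewBuffer(jsonData))
	if err != nil {
		fmt.Println("Failed to build request:", err)
		return
	}
	req.Header.Set("Authorization", "Bearer "+APIKey)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("Request failed:", err)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		fmt.Println("Unexpected HTTP status:", resp.Status)
		return
	}

	file, err := os.Create("stream_output.mp3")
	if err != nil {
		fmt.Println("Failed to create file:", err)
		return
	}
	defer file.Close()

	scanner := bufio.NewScanner(resp.Body)
	// Hex-encoded audio produces long lines; raise the 64 KiB default token limit
	scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue
		}
		var result SSEResponse
		if err := json.Unmarshal([]byte(line[len("data: "):]), &result); err != nil {
			fmt.Println("Failed to parse chunk:", err)
			return
		}
		if result.Data == nil || result.Data.Audio == "" {
			continue
		}
		audioData, err := hex.DecodeString(result.Data.Audio)
		if err != nil {
			fmt.Println("Invalid hex audio:", err)
			return
		}
		if _, err := file.Write(audioData); err != nil {
			fmt.Println("Failed to write file:", err)
			return
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Println("Failed to read stream:", err)
		return
	}
	fmt.Println("Streaming synthesis complete")
}
```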
Java
```java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.json.JSONObject;

public class SenseAudioTTSStream {
    private static final String API_URL = "https://api.senseaudio.cn/v1/t2a_v2";
    private static final String API_KEY = "YOUR_API_KEY";

    public static void main(String[] args) {
        try {
            // Build the request body
            JSONObject voiceSetting = new JSONObject();
            voiceSetting.put("voice_id", "child_0001_a");
            JSONObject payload = new JSONObject();
            payload.put("model", "SenseAudio-TTS-1.0");
            payload.put("text", "这是一个流式输出的例子。");
            payload.put("stream", true);
            payload.put("voice_setting", voiceSetting);

            // Send the request
            URL url = new URL(API_URL);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Bearer " + API_KEY);
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream os = conn.getOutputStream()) {
                os.write(payload.toString().getBytes(StandardCharsets.UTF_8));
            }
            int code = conn.getResponseCode();
            if (code != HttpURLConnection.HTTP_OK) {
                System.out.println("Unexpected HTTP status: " + code);
                return;
            }

            // Read the SSE response line by line
            try (BufferedReader br = new BufferedReader(
                         new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
                 FileOutputStream fos = new FileOutputStream("stream_output.mp3")) {
                String line;
                while ((line = br.readLine()) != null) {
                    if (!line.startsWith("data: ")) {
                        continue;
                    }
                    JSONObject result = new JSONObject(line.substring(6));
                    // "data" may be null; optJSONObject returns null in that case
                    JSONObject data = result.optJSONObject("data");
                    String audioHex = data == null ? "" : data.optString("audio", "");
                    if (!audioHex.isEmpty()) {
                        fos.write(hexToBytes(audioHex));
                    }
                }
            }
            System.out.println("Streaming synthesis complete");
        } catch (Exception e) {
            System.out.println("Request failed: " + e.getMessage());
            e.printStackTrace();
        }
    }

    // Decode a hex string (two characters per byte) into raw bytes
    private static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```
Swift
```swift
import Foundation

struct TTSRequest: Codable {
    let model: String
    let text: String
    let stream: Bool
    let voiceSetting: VoiceSetting
    enum CodingKeys: String, CodingKey {
        case model, text, stream
        case voiceSetting = "voice_setting"
    }
}

struct VoiceSetting: Codable {
    let voiceId: String
    enum CodingKeys: String, CodingKey {
        case voiceId = "voice_id"
    }
}

struct SSEResponse: Codable {
    let data: AudioData?  // may be null
    let baseResp: BaseResp?
    enum CodingKeys: String, CodingKey {
        case data
        case baseResp = "base_resp"
    }
}

struct AudioData: Codable {
    let audio: String?
    let status: Int?
}

struct BaseResp: Codable {
    let statusCode: Int?
    let statusMessage: String?
    enum CodingKeys: String, CodingKey {
        case statusCode = "status_code"
        case statusMessage = "status_message"
    }
}

func textToSpeechStream() {
    let apiURL = "https://api.senseaudio.cn/v1/t2a_v2"
    let apiKey = "YOUR_API_KEY"
    let request = TTSRequest(
        model: "SenseAudio-TTS-1.0",
        text: "这是一个流式输出的例子。",
        stream: true,
        voiceSetting: VoiceSetting(voiceId: "child_0001_a")
    )
    guard let url = URL(string: apiURL),
          let jsonData = try? JSONEncoder().encode(request) else {
        return
    }
    var urlRequest = URLRequest(url: url)
    urlRequest.httpMethod = "POST"
    urlRequest.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    urlRequest.setValue("application/json", forHTTPHeaderField: "Content-Type")
    urlRequest.httpBody = jsonData

    let semaphore = DispatchSemaphore(value: 0)
    // Note: dataTask delivers the whole SSE body at once, after the stream ends.
    // For incremental processing, URLSession's bytes(for:) API (macOS 12 / iOS 15+) reads data as it arrives.
    let task = URLSession.shared.dataTask(with: urlRequest) { data, response, error in
        defer { semaphore.signal() }
        guard let data = data, error == nil else {
            print("Request failed: \(error?.localizedDescription ?? "Unknown error")")
            return
        }
        var audioBuffer = Data()
        // Parse the SSE body line by line
        if let dataString = String(data: data, encoding: .utf8) {
            for line in dataString.components(separatedBy: .newlines) where line.hasPrefix("data: ") {
                let jsonStr = String(line.dropFirst(6))
                guard let chunkData = jsonStr.data(using: .utf8),
                      let result = try? JSONDecoder().decode(SSEResponse.self, from: chunkData),
                      let audioHex = result.data?.audio else {
                    continue
                }
                // Decode hex into bytes, two characters per byte
                var index = audioHex.startIndex
                while let next = audioHex.index(index, offsetBy: 2, limitedBy: audioHex.endIndex) {
                    if let byte = UInt8(audioHex[index..<next], radix: 16) {
                        audioBuffer.append(byte)
                    }
                    index = next
                }
            }
        }
        // Save the file
        let fileURL = URL(fileURLWithPath: FileManager.default.currentDirectoryPath)
            .appendingPathComponent("stream_output.mp3")
        do {
            try audioBuffer.write(to: fileURL)
            print("Streaming synthesis complete")
        } catch {
            print("Failed to write file: \(error.localizedDescription)")
        }
    }
    task.resume()
    semaphore.wait()
}

textToSpeechStream()
```