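Speech Synthesis API (TTS)
Synchronous speech synthesis based on text-to-speech (TTS). A single request accepts up to 10,000 characters of text. Typical uses include short-sentence generation, voice conversations, and online social apps.
- Endpoint: https://api.senseaudio.cn/v1/t2a_v2
- Method: POST
- Content-Type: application/json
- Authentication: Bearer Token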
Request Headers
| Parameter | Required | Description | Example |
|---|---|---|---|
| Authorization | Yes | Authentication token. Format: Bearer API_KEY | Bearer sk-123456… |
| Content-Type | Yes | Content type. Fixed to application/json | application/json |
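Request Parameters (Request Body)
Core Parameters
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| model | string | Yes | Model name. Fixed value. | SenseAudio-TTS-1.0 |
| text | string | Yes | Text to synthesize. Supports Chinese and English, up to 10,000 characters. For <break time=500>, see the pause tag section below. | 你好,<break time=500>世界 |
| stream | boolean | Yes | Streaming output. Listed as fixed to true; the synchronous examples in this document send false. | true |
| voice_setting | object | Yes | Voice settings. See the table below. | { "voice_id": "…" } |
| audio_setting | object | No | Audio format settings. See the table below. | { "sample_rate": 32000 } |
| dictionary | array | No | List of polyphonic-character pronunciation overrides. See the table below. Only for cloned voices; the model must be SenseAudio-TTS-1.5. | [{"original": "好干净", "replacement": "[hao4]干净"}] |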
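As a minimal sketch of how the core parameters fit together, the helper below assembles a request body and rejects text beyond the 10,000-character limit. The name `build_payload` is hypothetical. Counting characters with `len()` (Unicode code points) is an assumption, since the table does not say how the service counts. `stream` defaults to `False` here only to match the synchronous examples in this document.

```python
MAX_TEXT_CHARS = 10000  # limit stated in the parameter table


def build_payload(text, voice_id, model="SenseAudio-TTS-1.0", stream=False):
    """Assemble a request body from the core parameters (hypothetical helper)."""
    if not text:
        raise ValueError("text must not be empty")
    # Assumption: the limit is counted in Unicode code points, as len() does.
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    return {
        "model": model,
        "text": text,
        "stream": stream,
        "voice_setting": {"voice_id": voice_id},
    }


payload = build_payload("你好,<break time=500>世界", "child_0001_a")
print(payload["model"], len(payload["text"]))
```

Checking the length on the client avoids a round trip for text that the service would reject anyway.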
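<break> Pause Tag
<break> inserts a pause into the synthesized speech.
- time is given in milliseconds (ms)
- 500 means a 500 ms pause
- The minimum value is 100 ms; there is no upper limit
Example: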
你好<break time=500>欢迎使用我们的服务
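The pause rules above can be sketched as a small helper that builds the tag and enforces the 100 ms minimum before the text is sent. The function names `break_tag` and `join_with_pauses` are hypothetical and not part of the API.

```python
def break_tag(ms):
    """Return a <break> tag for a pause of `ms` milliseconds."""
    if ms < 100:
        raise ValueError("pause must be at least 100 ms")
    return f"<break time={ms}>"


def join_with_pauses(segments, ms=500):
    """Join text segments, inserting the same pause between each pair."""
    return break_tag(ms).join(segments)


text = join_with_pauses(["你好", "欢迎使用我们的服务"], 500)
print(text)
```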
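voice_setting (Voice Settings)
| Parameter | Type | Required | Description | Default | Range |
|---|---|---|---|---|---|
| voice_id | string | Yes | ID of a voice included in your plan or of a cloned voice. See the API voice service documentation. | - | - |
| speed | float | No | Speaking speed. | 1.0 | [0.5, 2.0] |
| vol | float | No | Volume. | 1.0 | [0, 10] |
| pitch | int | No | Pitch. | 0 | [-12, 12] |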
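To illustrate the ranges in the table, here is a hedged sketch of a client-side check that builds a `voice_setting` object. The helper `make_voice_setting` is hypothetical; the API itself may validate differently.

```python
def make_voice_setting(voice_id, speed=1.0, vol=1.0, pitch=0):
    """Build a voice_setting object, checking the documented ranges."""
    if not voice_id:
        raise ValueError("voice_id is required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within [0.5, 2.0]")
    if not 0 <= vol <= 10:
        raise ValueError("vol must be within [0, 10]")
    # pitch is an int in the table; bool is excluded because it subclasses int
    if isinstance(pitch, bool) or not isinstance(pitch, int) or not -12 <= pitch <= 12:
        raise ValueError("pitch must be an integer within [-12, 12]")
    return {"voice_id": voice_id, "speed": speed, "vol": vol, "pitch": pitch}


print(make_voice_setting("child_0001_a", speed=1.2, pitch=-2))
```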
audio_setting (Audio Settings)
| Parameter | Type | Required | Description | Default | Options |
|---|---|---|---|---|---|
| format | string | No | Audio encoding format. | mp3 | mp3, wav, pcm, flac |
| sample_rate | int | No | Audio sample rate (Hz). | 32000 | 8000, 16000, 22050, 24000, 32000, 44100 |
| bitrate | int | No | Bitrate (MP3 only). | 128000 | 32000, 64000, 128000, 256000 |
| channel | int | No | Number of channels. | 2 | 1 (mono), 2 (stereo) |
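The options in the table lend themselves to a simple lookup-based check. The sketch below is one possible way to build an `audio_setting` object. The helper name `make_audio_setting` is hypothetical, and omitting `bitrate` for non-MP3 formats is an assumption drawn from the "MP3 only" note, not documented behavior.

```python
FORMATS = {"mp3", "wav", "pcm", "flac"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100}
BITRATES = {32000, 64000, 128000, 256000}
CHANNELS = {1, 2}


def make_audio_setting(audio_format="mp3", sample_rate=32000, bitrate=128000, channel=2):
    """Build an audio_setting object from the documented options."""
    if audio_format not in FORMATS:
        raise ValueError(f"unsupported format: {audio_format}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if channel not in CHANNELS:
        raise ValueError(f"unsupported channel count: {channel}")
    setting = {"format": audio_format, "sample_rate": sample_rate, "channel": channel}
    # Assumption: bitrate is only sent for MP3, since the table marks it MP3-only.
    if audio_format == "mp3":
        if bitrate not in BITRATES:
            raise ValueError(f"unsupported bitrate: {bitrate}")
        setting["bitrate"] = bitrate
    return setting


print(make_audio_setting("wav", sample_rate=16000, channel=1))
```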
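dictionary (Polyphonic-Character Correction)
| Parameter | Type | Required | Description | Default | Example |
|---|---|---|---|---|---|
| original | string | Yes | Original text. | None | 铺床铺地,量米量酒杯 |
| replacement | string | Yes | Text with pronunciation annotations for the polyphonic characters. | None | 铺床铺[di4],[liang2]米[liang4]酒杯 |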
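A minimal sketch of a request body that uses `dictionary`, based on the example values in the table. The helper `make_dictionary` and the placeholder `YOUR_CLONED_VOICE_ID` are hypothetical. The regular expression assumes annotations look like `[hao4]` (lowercase pinyin plus a tone digit 1 to 5), which is inferred from the examples rather than stated by the API.

```python
import re

# Assumption: annotations look like [hao4], i.e. pinyin letters plus a tone digit.
PINYIN_TAG = re.compile(r"\[[a-z]+[1-5]\]")


def make_dictionary(pairs):
    """Turn (original, replacement) pairs into dictionary entries."""
    entries = []
    for original, replacement in pairs:
        if not PINYIN_TAG.search(replacement):
            raise ValueError(f"no pinyin annotation in {replacement!r}")
        entries.append({"original": original, "replacement": replacement})
    return entries


payload = {
    "model": "SenseAudio-TTS-1.5",  # the parameter table requires 1.5 for dictionary
    "text": "铺床铺地,量米量酒杯",
    "stream": False,
    "voice_setting": {"voice_id": "YOUR_CLONED_VOICE_ID"},  # placeholder, cloned voice
    "dictionary": make_dictionary(
        [("铺床铺地,量米量酒杯", "铺床铺[di4],[liang2]米[liang4]酒杯")]
    ),
}
print(payload["dictionary"])
```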
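Response Parameters
| Parameter | Type | Description |
|---|---|---|
| data | object | The synthesized data object. May be null, so check for null before use |
| data.audio | string | The synthesized audio, hex-encoded, in the output format specified in the request |
| data.status | int64 | Current state of the audio stream: 1 means synthesis is in progress, 2 means synthesis has finished |
| extra_info | object | Additional information about the audio. In streaming mode, only the last chunk includes it |
| extra_info.audio_length | int64 | Audio duration (milliseconds) |
| extra_info.audio_sample_rate | int64 | Audio sample rate |
| extra_info.audio_size | int64 | Audio file size (bytes) |
| extra_info.bitrate | int64 | Audio bitrate |
| extra_info.audio_format | string | Format of the generated audio file. One of: mp3, pcm, flac, wav |
| extra_info.audio_channel | int | Number of channels in the generated audio. 1: mono, 2: stereo |
| extra_info.word_count | int64 | Word count: the synthesized text counted by grapheme cluster, excluding clusters that consist only of whitespace, punctuation, or control characters |
| extra_info.character_count | int64 | Character count: the synthesized text counted by Unicode code point |
| base_resp | object | Status code and details for this request |
| base_resp.status_code | int64 | Status code (HTTP status code) |
| base_resp.status_message | string | Status details |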
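The fields above suggest how a client might assemble audio from several chunks. The sketch below works on response objects that have already been parsed into dicts; how chunks are framed on the wire is not described here, so that part is left out. It assumes each chunk carries only new audio (no repeated data in the final chunk), which should be verified against the service. The name `collect_audio` is hypothetical.

```python
def collect_audio(chunks):
    """Concatenate hex-encoded audio from parsed response chunks.

    Returns (audio_bytes, extra_info, finished).
    """
    audio = bytearray()
    extra_info = None
    finished = False
    for chunk in chunks:
        data = chunk.get("data")  # may be None, per the parameter table
        if data:
            if data.get("audio"):
                audio += bytes.fromhex(data["audio"])
            if data.get("status") == 2:  # 2 means synthesis has finished
                finished = True
        if chunk.get("extra_info"):  # only present on the last chunk
            extra_info = chunk["extra_info"]
    return bytes(audio), extra_info, finished


sample_chunks = [
    {"data": {"audio": "4944", "status": 1}},
    {"data": None},
    {"data": {"audio": "33", "status": 2}, "extra_info": {"audio_length": 3500}},
]
audio, info, done = collect_audio(sample_chunks)
print(len(audio), info, done)
```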
Response Example
{
  "data": {
    "audio": "hex-encoded audio data...",
    "status": 2
  },
  "extra_info": {
    "audio_length": 3500,
    "audio_sample_rate": 32000,
    "audio_size": 56000,
    "bitrate": 128000,
    "audio_format": "mp3",
    "audio_channel": 1,
    "word_count": 24,
    "character_count": 30
  },
  "base_resp": {
    "status_code": 0,
    "status_message": "success"
  }
}
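A response shaped like the one above can be decoded with a few defensive checks. This is a sketch, not a reference client: `TTSError` and `decode_response` are hypothetical names, and treating `status_code` 0 as success is an assumption based on the example response.

```python
class TTSError(Exception):
    """Raised when a response carries no usable audio."""


def decode_response(result):
    """Return the audio bytes from a parsed non-streaming response."""
    base = result.get("base_resp") or {}
    # Assumption: 0 means success, as in the example response.
    if base.get("status_code", 0) != 0:
        raise TTSError(base.get("status_message", "unknown error"))
    data = result.get("data")  # may be null, so check before indexing
    if not data or not data.get("audio"):
        raise TTSError(base.get("status_message", "no audio in response"))
    return bytes.fromhex(data["audio"])


sample = {
    "data": {"audio": "494433", "status": 2},
    "extra_info": {"audio_length": 3500, "audio_format": "mp3"},
    "base_resp": {"status_code": 0, "status_message": "success"},
}
audio_bytes = decode_response(sample)
print(len(audio_bytes), sample["extra_info"]["audio_length"])
```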
cURL
# 1. Send the request and save the response
curl -X POST https://api.senseaudio.cn/v1/t2a_v2 \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "SenseAudio-TTS-1.0",
    "text": "道可道,非常道。名可名,非常名。无名天地之始,有名万物之母。",
    "stream": false,
    "voice_setting": {
      "voice_id": "child_0001_a"
    }
  }' -o response.json
# 2. Extract the hex audio data and decode it into a binary file
jq -r '.data.audio' response.json | xxd -r -p > output.mp3
# 3. Inspect the audio metadata
jq '.extra_info' response.json
Python
import requests

API_URL = "https://api.senseaudio.cn/v1/t2a_v2"
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}


# Non-streaming synthesis
def tts_non_stream():
    payload = {
        "model": "SenseAudio-TTS-1.0",
        "text": "道可道,非常道。名可名,非常名。",
        "stream": False,
        "voice_setting": {
            "voice_id": "child_0001_a"
        },
    }
    resp = requests.post(API_URL, json=payload, headers=HEADERS)
    if resp.status_code != 200:
        print(f"Request failed, status code: {resp.status_code}")
        return
    result = resp.json()
    if result.get("data") and result["data"].get("audio"):
        # Decode the hex-encoded audio data into bytes
        audio_hex = result["data"]["audio"]
        audio_bytes = bytes.fromhex(audio_hex)
        with open("output.mp3", "wb") as f:
            f.write(audio_bytes)
        print("Synthesis succeeded")
        print(f"Audio duration: {result['extra_info']['audio_length']}ms")
    else:
        print(f"Synthesis failed: {result['base_resp']['status_message']}")


if __name__ == "__main__":
    tts_non_stream()
JavaScript
const axios = require('axios');
const fs = require('fs');

const API_URL = 'https://api.senseaudio.cn/v1/t2a_v2';
const HEADERS = {
  'Authorization': 'Bearer YOUR_API_KEY',
  'Content-Type': 'application/json'
};

// Non-streaming synthesis
async function tts() {
  const payload = {
    model: 'SenseAudio-TTS-1.0',
    text: '道可道,非常道。名可名,非常名。',
    stream: false,
    voice_setting: {
      voice_id: 'female_jiaomei'
    }
  };
  const res = await axios.post(API_URL, payload, { headers: HEADERS });
  const result = res.data;
  if (result.data && result.data.audio) {
    // Decode the hex-encoded audio data into a binary buffer
    const audioBuffer = Buffer.from(result.data.audio, 'hex');
    fs.writeFileSync('output.mp3', audioBuffer);
    console.log('Synthesis succeeded');
    console.log(`Audio duration: ${result.extra_info.audio_length}ms`);
  } else {
    console.log(`Synthesis failed: ${result.base_resp.status_message}`);
  }
}

// axios rejects on network errors and non-2xx responses
tts().catch((err) => console.error(`Request failed: ${err.message}`));
Go
package main

import (
	"bytes"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

const (
	APIURL = "https://api.senseaudio.cn/v1/t2a_v2"
	APIKey = "YOUR_API_KEY"
)

type TTSRequest struct {
	Model        string       `json:"model"`
	Text         string       `json:"text"`
	Stream       bool         `json:"stream"`
	VoiceSetting VoiceSetting `json:"voice_setting"`
}

type VoiceSetting struct {
	VoiceID string `json:"voice_id"`
}

type TTSResponse struct {
	Data struct {
		Audio  string `json:"audio"`
		Status int64  `json:"status"`
	} `json:"data"`
	ExtraInfo struct {
		AudioLength int64 `json:"audio_length"`
	} `json:"extra_info"`
	BaseResp struct {
		StatusCode    int64  `json:"status_code"`
		StatusMessage string `json:"status_message"`
	} `json:"base_resp"`
}

func main() {
	payload := TTSRequest{
		Model:  "SenseAudio-TTS-1.0",
		Text:   "道可道,非常道。名可名,非常名。",
		Stream: false,
		VoiceSetting: VoiceSetting{
			VoiceID: "female_jiaomei",
		},
	}
	jsonData, err := json.Marshal(payload)
	if err != nil {
		fmt.Println("Failed to encode request:", err)
		return
	}
	req, err := http.NewRequest("POST", APIURL, bytes.NewBuffer(jsonData))
	if err != nil {
		fmt.Println("Failed to build request:", err)
		return
	}
	req.Header.Set("Authorization", "Bearer "+APIKey)
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("Request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("Failed to read response:", err)
		return
	}
	var result TTSResponse
	if err := json.Unmarshal(body, &result); err != nil {
		fmt.Println("Failed to parse response:", err)
		return
	}

	if result.Data.Audio != "" {
		// Decode the hex-encoded audio data into bytes
		audioBytes, err := hex.DecodeString(result.Data.Audio)
		if err != nil {
			fmt.Println("Failed to decode audio:", err)
			return
		}
		if err := os.WriteFile("output.mp3", audioBytes, 0644); err != nil {
			fmt.Println("Failed to write file:", err)
			return
		}
		fmt.Println("Synthesis succeeded")
		fmt.Printf("Audio duration: %dms\n", result.ExtraInfo.AudioLength)
	} else {
		fmt.Printf("Synthesis failed: %s\n", result.BaseResp.StatusMessage)
	}
}
Java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import org.json.JSONObject;

public class SenseAudioTTS {
    private static final String API_URL = "https://api.senseaudio.cn/v1/t2a_v2";
    private static final String API_KEY = "YOUR_API_KEY";

    public static void main(String[] args) {
        try {
            JSONObject voiceSetting = new JSONObject();
            voiceSetting.put("voice_id", "female_jiaomei");
            JSONObject payload = new JSONObject();
            payload.put("model", "SenseAudio-TTS-1.0");
            payload.put("text", "道可道,非常道。名可名,非常名。");
            payload.put("stream", false);
            payload.put("voice_setting", voiceSetting);

            URL url = new URL(API_URL);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Bearer " + API_KEY);
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream os = conn.getOutputStream()) {
                byte[] input = payload.toString().getBytes("utf-8");
                os.write(input, 0, input.length);
            }

            if (conn.getResponseCode() == 200) {
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(conn.getInputStream(), "utf-8"));
                StringBuilder response = new StringBuilder();
                String line;
                while ((line = reader.readLine()) != null) {
                    response.append(line);
                }
                reader.close();

                JSONObject result = new JSONObject(response.toString());
                JSONObject data = result.optJSONObject("data");
                if (data != null && data.has("audio")) {
                    // Decode the hex-encoded audio data into bytes
                    String audioHex = data.getString("audio");
                    byte[] audioBytes = hexStringToByteArray(audioHex);
                    try (FileOutputStream fos = new FileOutputStream("output.mp3")) {
                        fos.write(audioBytes);
                    }
                    System.out.println("Synthesis succeeded");
                    System.out.println("Audio duration: " +
                            result.getJSONObject("extra_info").getLong("audio_length") + "ms");
                } else {
                    System.out.println("Synthesis failed: " +
                            result.getJSONObject("base_resp").getString("status_message"));
                }
            } else {
                System.out.println("Request failed, status code: " + conn.getResponseCode());
            }
        } catch (Exception e) {
            System.out.println("Request error: " + e.getMessage());
        }
    }

    // Convert a hex string to a byte array
    private static byte[] hexStringToByteArray(String hex) {
        int len = hex.length();
        byte[] data = new byte[len / 2];
        for (int i = 0; i + 1 < len; i += 2) {
            data[i / 2] = (byte) ((Character.digit(hex.charAt(i), 16) << 4)
                    + Character.digit(hex.charAt(i + 1), 16));
        }
        return data;
    }
}
Swift
import Foundation

struct TTSRequest: Codable {
    let model: String
    let text: String
    let stream: Bool
    let voiceSetting: VoiceSetting

    enum CodingKeys: String, CodingKey {
        case model, text, stream
        case voiceSetting = "voice_setting"
    }
}

struct VoiceSetting: Codable {
    let voiceId: String

    enum CodingKeys: String, CodingKey {
        case voiceId = "voice_id"
    }
}

struct TTSResponse: Codable {
    let data: AudioData?
    let extraInfo: ExtraInfo?
    let baseResp: BaseResp

    enum CodingKeys: String, CodingKey {
        case data
        case extraInfo = "extra_info"
        case baseResp = "base_resp"
    }
}

struct AudioData: Codable {
    let audio: String
    let status: Int64
}

struct ExtraInfo: Codable {
    let audioLength: Int64

    enum CodingKeys: String, CodingKey {
        case audioLength = "audio_length"
    }
}

struct BaseResp: Codable {
    let statusCode: Int64
    let statusMessage: String

    enum CodingKeys: String, CodingKey {
        case statusCode = "status_code"
        case statusMessage = "status_message"
    }
}

func textToSpeech() {
    let apiURL = "https://api.senseaudio.cn/v1/t2a_v2"
    let apiKey = "YOUR_API_KEY"
    let request = TTSRequest(
        model: "SenseAudio-TTS-1.0",
        text: "道可道,非常道。名可名,非常名。",
        stream: false,
        voiceSetting: VoiceSetting(voiceId: "female_jiaomei")
    )
    guard let url = URL(string: apiURL),
          let jsonData = try? JSONEncoder().encode(request) else {
        return
    }
    var urlRequest = URLRequest(url: url)
    urlRequest.httpMethod = "POST"
    urlRequest.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    urlRequest.setValue("application/json", forHTTPHeaderField: "Content-Type")
    urlRequest.httpBody = jsonData

    // Signalled when the request completes, so a command-line script does not exit early
    let semaphore = DispatchSemaphore(value: 0)
    let task = URLSession.shared.dataTask(with: urlRequest) { data, response, error in
        defer { semaphore.signal() }
        guard let data = data, error == nil else {
            print("Request failed: \(error?.localizedDescription ?? "Unknown error")")
            return
        }
        do {
            let result = try JSONDecoder().decode(TTSResponse.self, from: data)
            if let audioData = result.data {
                // Decode the hex-encoded audio data into bytes
                if let audioBytes = Data(hexString: audioData.audio) {
                    let fileURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
                        .appendingPathComponent("output.mp3")
                    try? audioBytes.write(to: fileURL)
                    print("Synthesis succeeded")
                    if let extraInfo = result.extraInfo {
                        print("Audio duration: \(extraInfo.audioLength)ms")
                    }
                }
            } else {
                print("Synthesis failed: \(result.baseResp.statusMessage)")
            }
        } catch {
            print("Failed to parse response: \(error)")
        }
    }
    task.resume()
    semaphore.wait()
}

// Data extension: build Data from a hex string
extension Data {
    init?(hexString: String) {
        // A valid hex string has an even number of characters
        guard hexString.count % 2 == 0 else { return nil }
        let len = hexString.count / 2
        var data = Data(capacity: len)
        var index = hexString.startIndex
        for _ in 0..<len {
            let nextIndex = hexString.index(index, offsetBy: 2)
            if let byte = UInt8(hexString[index..<nextIndex], radix: 16) {
                data.append(byte)
            } else {
                return nil
            }
            index = nextIndex
        }
        self = data
    }
}

textToSpeech()