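Audio Models

This document catalogs all speech-related APIs in three groups: ASR (speech recognition), TTS (speech synthesis), and combined capabilities.

I. Speech Recognition (ASR)

1.1 Asynchronous Speech Transcription (Create Task)

Request

Basic information

Request URL: https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/transcriptions

Method: POST

Available models: Fun-ASR / Fun-ASR-2025-08-25 / Fun-ASR-Nano-2512 / paraformer-realtime / Qwen3-ASR-Flash-2025-09-08

Two request modes are supported.

Mode 1: Audio file URL (JSON)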

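Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value `application/json`
Authorization	String	Yes	"Bearer " + Apikey

Body parameters (legacy, to be deprecated soon; please migrate to the new parameters)

Name	Type	Required	Description
audio_url	string	Yes	URL of the audio file. Supported formats: wav, mp3, m4a, flv, mp4, wma, 3gp, amr, aac, ogg-opus, flac. Limits: the audio must not be longer than 5 hours and the file must not exceed 1 GB. Retention: recognition results are kept on the server for 24 hours.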

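Body parameters (new Fun-ASR parameters)

Name	Type	Required	Description
file_urls	array(string)	Yes	List of audio URLs. Limits: (1) count: at most 100 URLs per request; (2) formats: `aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv`; (3) sample rate: any; (4) size and duration: each file must not exceed 32 MB or 12 hours; (5) languages: Chinese, English, Japanese.
speech_recognition_param.model	string	Yes	Model name. Supports `Fun-ASR`.
speech_recognition_param.channel_id	array(int)	No	Indexes of the audio tracks to recognize. Defaults to `[0]`. If several tracks are given, all of them are recognized and each track is billed separately.
speech_recognition_param.special_word_filter	string	No	JSON string that configures sensitive-word handling; different words can use different strategies. If omitted, the default handling is used. JSON fields: `filter_with_signed` (Object, optional): replaces each matched sensitive word with `*` characters of the same length; `filter_with_signed.word_list` is a string array of the words to replace. `filter_with_empty` (Object, optional): deletes matched sensitive words from the result; `filter_with_empty.word_list` is a string array of the words to delete. `system_reserved_filter` (bool, optional, default `true`): whether to also apply the system default sensitive-word policy, which replaces matched words with `*` characters of the same length.
speech_recognition_param.diarization_enabled	bool	No	Whether to enable speaker diarization. Defaults to `false` (off).
speech_recognition_param.speaker_count	int	No	Hint for the number of speakers, in the range `[2, 100]`. Takes effect only when `speech_recognition_param.diarization_enabled` is `true`. By default the number of speakers is predicted automatically; a hint only assists the prediction and does not guarantee that exactly that many speakers are returned.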

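Mode 2: Audio file upload (multipart/form-data)

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value `multipart/form-data`
Authorization	String	Yes	"Bearer " + Apikey

form-data parameters (legacy, to be deprecated soon; please migrate to the new parameters)

Name	Type	Required	Description
audio_file	File	Yes	Local audio file

form-data parameters (new Fun-ASR parameters)

Name	Type	Required	Description
audio_file	File	Yes	Local audio file
speech_recognition_param.model	string	Yes	Model name. Supports `Fun-ASR`.
speech_recognition_param.channel_id	array(int)	No	Indexes of the audio tracks to recognize. Defaults to `[0]`. If several tracks are given, all of them are recognized and each track is billed separately.
speech_recognition_param.special_word_filter	string	No	JSON string that configures sensitive-word handling; different words can use different strategies. If omitted, the default handling is used. JSON fields: `filter_with_signed` (Object, optional): replaces each matched sensitive word with `*` characters of the same length; `filter_with_signed.word_list` is a string array of the words to replace. `filter_with_empty` (Object, optional): deletes matched sensitive words from the result; `filter_with_empty.word_list` is a string array of the words to delete. `system_reserved_filter` (bool, optional, default `true`): whether to also apply the system default sensitive-word policy, which replaces matched words with `*` characters of the same length.
speech_recognition_param.diarization_enabled	bool	No	Whether to enable speaker diarization. Defaults to `false` (off).
speech_recognition_param.speaker_count	int	No	Hint for the number of speakers, in the range `[2, 100]`. Takes effect only when `speech_recognition_param.diarization_enabled` is `true`. By default the number of speakers is predicted automatically; a hint only assists the prediction and does not guarantee that exactly that many speakers are returned.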

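Response

Response headers

Name	Value	Description
Content-Type	application/json

Response body

Name	Type	Description
task_id	string	Task ID (the response examples in this document show this field as `taskId`)
created	int	Timestamp

Request examples

Audio file URL mode (legacy parameters)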

curl --location --request POST 'https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/transcriptions' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "audio_url":"YOUR_AUDIO_URL"
}'

Audio file URL mode (new Fun-ASR parameters)

curl --location --request POST 'https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/transcriptions' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "file_urls": ["YOUR_AUDIO_URL"],
    "speech_recognition_param": {
        "model": "Fun-ASR"
    }
}'
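
The new Fun-ASR parameters nest their options under `speech_recognition_param`, and `special_word_filter` is itself a JSON string. The following is a minimal sketch of assembling such a body under that assumption. The helper name `build_transcription_body` and the sample word are hypothetical, the lower bound of one URL is an assumption, and the other checks simply mirror the limits listed in the parameter table.

```python
import json


def build_transcription_body(file_urls, model="Fun-ASR", replace_words=None,
                             delete_words=None, speaker_count=None):
    """Assemble the JSON body for the URL mode (hypothetical helper)."""
    if not 1 <= len(file_urls) <= 100:
        raise ValueError("file_urls must contain 1 to 100 URLs")
    param = {"model": model}

    # special_word_filter is documented as a string, so the filter object
    # is serialized to JSON before being placed in the body.
    word_filter = {}
    if replace_words:
        word_filter["filter_with_signed"] = {"word_list": list(replace_words)}
    if delete_words:
        word_filter["filter_with_empty"] = {"word_list": list(delete_words)}
    if word_filter:
        param["special_word_filter"] = json.dumps(word_filter, ensure_ascii=False)

    # speaker_count only takes effect when diarization is enabled.
    if speaker_count is not None:
        if not 2 <= speaker_count <= 100:
            raise ValueError("speaker_count must be within [2, 100]")
        param["diarization_enabled"] = True
        param["speaker_count"] = speaker_count

    return {"file_urls": list(file_urls), "speech_recognition_param": param}


body = build_transcription_body(
    ["YOUR_AUDIO_URL"], replace_words=["foo"], speaker_count=2
)
print(json.dumps(body, ensure_ascii=False, indent=4))
```

The printed dictionary can be sent as the `--data-raw` payload of the curl example or as the `json=` argument of an HTTP client.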

Audio file upload mode (legacy parameters)

curl --location --request POST 'https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/transcriptions' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: multipart/form-data" \
-F "audio_file=@/path/to/your_audio_file.wav;type=audio/wav"

Audio file upload mode (new Fun-ASR parameters)

import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder
import json

API_KEY = "YOUR_API_KEY"
AUDIO_FILE_NAME = ""
AUDIO_FILE_PATH = ""
AUDIO_FILE_FORMAT = ""  # e.g. audio/wav

param_data = {
    "speech_recognition_param": {
        "model": "Fun-ASR"
    }
}

# MultipartEncoder gives precise control over the multipart/form-data body
multipart_data = MultipartEncoder(
    fields={
        "audio_file": (AUDIO_FILE_NAME, open(AUDIO_FILE_PATH, "rb"), AUDIO_FILE_FORMAT),
        "data": (None, json.dumps(param_data), "application/json")
    }
)

url = "https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/transcriptions"

headers = {
    "Authorization": f"Bearer {API_KEY}"
}

headers["Content-Type"] = multipart_data.content_type

response = requests.post(url, headers=headers, data=multipart_data)
print(response.text)

Response example

{
    "taskId": "10047816884",
    "created": 1724652804
}

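1.2 Query Transcription Task Status/Result

Request

Basic information

Request URL: https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/transcriptions/{taskId}

Method: GET

Path parameters

Name	Type	Required	Description
task_id	String	Yes	The task_id of the speech-to-text task; it replaces `{taskId}` in the request URL

Response

Response headers

Name	Value	Description
Content-Type	application/json

Response body (legacy format)

Name	Type	Description
task_id	string	Task ID
status	string	Task status: waiting (queued), doing (running), success (succeeded), failed (failed). Example value: waiting
result	string	Transcription result
errorMsg	string	Error code
audio_duration	float	Total length of the input audio, in seconds.

Response body (new Fun-ASR format)

Name	Type	Description
task_id	string	Task ID
status	string	Task status: waiting (queued), doing (running), success (succeeded), failed (failed). Example value: waiting
errorMsg	string	Error code
audio_duration	float	Total length of the input audio, in seconds.
results	array(object)	Transcription results; applies to the `Fun-ASR` model.
results[*].file_url	string	Audio URL that the result belongs to.
results[*].subtask_status	string	Recognition status of this audio URL: `PENDING`, `RUNNING`, `SUCCEEDED` or `FAILED`.
results[*].code	string	Result code of the transcription.
results[*].message	string	Result message of the transcription.
results[*].transcripts	array(object)	Text, audio length and related results.
results[*].transcripts[*].text	string	Recognized text.
results[*].transcripts[*].content_duration_in_milliseconds	float	Length of the audio covered by the recognized text, in milliseconds.
results[*].transcripts[*].channel_id	int	Index of the audio track that the text belongs to.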

Request example

curl --location --request GET 'https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/transcriptions/{taskId}' \
--header "Authorization: Bearer $API_KEY"

Response example

{
    "taskId":"a2b1ae7a-d092-4e98-b5e8-2e58e2xxxxx",
    "status":"SUCCEEDED",
    "results":[
        {
            "transcripts":[
                {
                    "text":"欢迎大家来体验达摩院推出的语音识别模型。",
                    "content_duration_in_milliseconds":4400.0,
                    "channel_id":0
                }
            ],
            "file_url":"https://isv-data.oss-cn-hangzhou.aliyuncs.com/xx/xx/xx/test_audio/test.wav"
        }
    ],
    "audio_duration":4.4
}

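Because transcription is asynchronous, a client typically polls this endpoint until the task reaches a terminal status and then reads `results`. The following is a minimal sketch under stated assumptions: `fetch_task` is a hypothetical callable that performs the GET request and returns the parsed JSON, and the terminal status names combine the values from the table (`success`, `failed`) with the upper-case `SUCCEEDED` style seen in the response example, since the source shows both.

```python
import time

# Terminal statuses: table values plus the upper-case style from the example.
DONE = {"success", "succeeded"}
FAILED = {"failed"}


def wait_for_task(fetch_task, task_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_task(task_id) until the task succeeds, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_id)
        status = str(task.get("status", "")).lower()
        if status in DONE:
            return task
        if status in FAILED:
            raise RuntimeError(f"task {task_id} failed: {task.get('errorMsg')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(interval)


def collect_text(task):
    """Return the text of every transcript in a new-format (Fun-ASR) response."""
    return [
        transcript["text"]
        for result in task.get("results", [])
        for transcript in result.get("transcripts", [])
    ]


# Demo with a fake fetcher that reports "doing" once and then a finished task.
_responses = iter([
    {"status": "doing"},
    {"status": "SUCCEEDED",
     "results": [{"transcripts": [{"text": "hello", "channel_id": 0}]}]},
])
finished = wait_for_task(lambda _id: next(_responses), "demo", sleep=lambda s: None)
texts = collect_text(finished)
print(texts)
```

Injecting `fetch_task` and `sleep` keeps the loop independent of any HTTP library and makes it easy to exercise without network access.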
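1.3 Non-streaming Synchronous Speech Recognition

Request

Basic information

Request URL: https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/non-stream

Method: POST

Available models: Fun-ASR / Fun-ASR-2025-08-25 / Fun-ASR-Nano-2512 / paraformer-realtime / Qwen3-ASR-Flash-2025-09-08

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value application/json
Authorization	String	Yes	"Bearer " + Apikey

Body parameters

Name	Type	Required	Description
audio_url	String	Yes	URL of the audio to recognize
speech_recognition_param	Object	No	Recognition parameters
speech_recognition_param.model	String	No	Recognition model. Only the internal default model is supported and it cannot be selected; multi-model support will be added later.
speech_recognition_param.sample_rate	Integer	No	Sample rate of the audio to recognize. Any sample rate is supported.
speech_recognition_param.format	String	No	Format of the audio to recognize. Supported format: wav.

Response

Response headers

Name	Value	Description
Content-Type	application/json

Response body

Name	Type	Description
result	String	Model inference result
audio_duration	Float	Total audio duration, in seconds

Request example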

curl --location --request POST 'https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/non-stream' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "audio_url": "{YOUR_AUDIO_URL}"
}'

Response example

{
    "result":"你好，算能。",
    "audio_duration":2.471
}
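
For callers who prefer Python over curl, the same request can be prepared with the standard library. This is a minimal sketch that only builds the request object; the helper name `build_recognition_request` is hypothetical, and the optional fields are included only when given, following the parameter table above.

```python
import json
import urllib.request

URL = "https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/non-stream"


def build_recognition_request(api_key, audio_url, sample_rate=None, audio_format=None):
    """Build (but do not send) the POST request for synchronous recognition."""
    body = {"audio_url": audio_url}
    param = {}
    if sample_rate is not None:
        param["sample_rate"] = sample_rate
    if audio_format is not None:
        param["format"] = audio_format
    if param:
        body["speech_recognition_param"] = param
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # note the space after "Bearer"
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_recognition_request("YOUR_API_KEY", "YOUR_AUDIO_URL", audio_format="wav")
print(request.get_method(), request.full_url)
# To send it: json.load(urllib.request.urlopen(request)) returns the parsed body.
```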

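1.4 Streaming Speech Recognition (WebSocket)

Connection request

Basic information

Request URL: wss://www.sophnet.com/api/open-apis/projects/easyllms/stream-speech

Method: WebSocket

Request parameters

Name	Type	Required	Description
apikey	String	Yes	Apikey
format	String	Yes	Input audio format: pcm, wav, mp3, opus, speex, aac or amr
sample_rate	int	Yes	Audio sample rate. Any rate is accepted, but 16 kHz gives better results
heartbeat	bool	Yes	Whether to enable heartbeat. If false, the connection times out and closes after 60 s even when silent audio is being sent, so audio containing speech must arrive within every 60 s. If true, silent audio keeps the connection alive, but audio must still be sent within every 60 s.

Connection response

Response body

Name	Type	Description
status	string	'ok' when the connection succeeds; on failure the connection is closed directly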

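Recognition result response

Response body

Name	Type	Description
text	string	Sentence-level recognition result. While is_sentence_end is false it holds the partial streaming output; when it is true it holds the final result for the sentence, and the next message starts a new sentence
begin_time	int	Start time of the sentence, in milliseconds
end_time	int	End time of the sentence, in milliseconds (null until the sentence ends, as in the examples)
words	array(string)	Word-level prediction results
is_sentence_end	bool	Whether the sentence has ended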

Connection request example

// apikey, format and sampleRate are placeholders to be defined by the caller.
// log is assumed to be a logging helper; console.log is used here.
const log = console.log;

const url = `wss://www.sophnet.com/api/open-apis/projects/easyllms/stream-speech`
        + `?apikey=${apikey}`
        + `&format=${format}`
        + `&sample_rate=${sampleRate}`
        + `&heartbeat=true`;

const ws = new WebSocket(url);
ws.binaryType = 'arraybuffer';

ws.onopen = () => {
    log('WebSocket connected: ' + url);
};

ws.onmessage = (evt) => {
    if (typeof evt.data === 'string') {
        log('<- ASR_RESULT: ' + evt.data);
    } else {
        log('<- binary message (' + evt.data.byteLength + ' bytes)');
    }
};

ws.onerror = (err) => {
    log('WebSocket error: ' + err);
};

ws.onclose = (evt) => {
    log(`WebSocket closed: [${evt.code}] ${evt.reason}`);
};

Audio data send example

ws.send(byteData);

Connection response example

{"status": "ok"}
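
The connection parameters travel in the query string, so values such as the API key should be percent-encoded. The following is a minimal sketch of building the connection URL from Python; the helper name `build_stream_speech_url` is hypothetical, and sending `heartbeat` as lowercase `true`/`false` is an assumption taken from the JavaScript example.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

STREAM_SPEECH_URL = "wss://www.sophnet.com/api/open-apis/projects/easyllms/stream-speech"


def build_stream_speech_url(apikey, audio_format, sample_rate, heartbeat):
    """Return the WebSocket URL with all four required query parameters."""
    query = urlencode(
        {
            "apikey": apikey,
            "format": audio_format,
            "sample_rate": sample_rate,
            # Booleans are sent as lowercase true/false, as in the JS example.
            "heartbeat": "true" if heartbeat else "false",
        },
        quote_via=quote,  # encode spaces as %20 rather than "+"
    )
    return f"{STREAM_SPEECH_URL}?{query}"


url = build_stream_speech_url("YOUR_API_KEY", "pcm", 16000, heartbeat=True)
print(url)
# Round-trip check: the query string parses back into the original values.
print(parse_qs(urlsplit(url).query))
```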

Recognition result response examples

{"text":"这是深度神经网络的语音","begin_time":660,"end_time":null,"words": ["Word(beginTime=660, endTime=1148, text=这是, punctuation=, fixed=false)", "Word(beginTime=1148, endTime=1636, text=深度, punctuation=, fixed=false)", "Word(beginTime=1636, endTime=2124, text=神经, punctuation=, fixed=false)", "Word(beginTime=2124, endTime=2612, text=网络的, punctuation=, fixed=false)", "Word(beginTime=2612, endTime=3100, text=语音, punctuation=, fixed=false)"], "is_sentence_end": false}

{"text":"这是深度神经网络的语音识别","begin_time":660,"end_time":null,"words": ["Word(beginTime=660, endTime=1148, text=这是, punctuation=, fixed=false)", "Word(beginTime=1148, endTime=1636, text=深度, punctuation=, fixed=false)", "Word(beginTime=1636, endTime=2124, text=神经, punctuation=, fixed=false)", "Word(beginTime=2124, endTime=2612, text=网络的, punctuation=, fixed=false)", "Word(beginTime=2612, endTime=3100, text=语音, punctuation=, fixed=false)", "Word(beginTime=3100, endTime=3500, text=识别, punctuation=, fixed=false)"], "is_sentence_end": false}

{"text":"这是深度神经网络的语音识别模型。","begin_time":660,"end_time":5540,"words": ["Word(beginTime=660, endTime=1148, text=这是, punctuation=, fixed=false)", "Word(beginTime=1148, endTime=1636, text=深度, punctuation=, fixed=false)", "Word(beginTime=1636, endTime=2124, text=神经, punctuation=, fixed=false)", "Word(beginTime=2124, endTime=2612, text=网络的, punctuation=, fixed=false)", "Word(beginTime=2612, endTime=3100, text=语音, punctuation=, fixed=false)", "Word(beginTime=3100, endTime=3588, text=识别, punctuation=, fixed=false)", "Word(beginTime=3588, endTime=4076, text=模型, punctuation=, fixed=false)"], "is_sentence_end": true}

{"text":"请","begin_time":6001,"end_time":null,"words": ["Word(beginTime=6001, endTime=6502, text=请, punctuation=, fixed=false)"], "is_sentence_end": false}
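
The result messages repeat a sentence with growing text until `is_sentence_end` becomes true. The following is a minimal sketch of folding such a stream into final sentences and of parsing the `Word(...)` strings. The helper names and the regular expression are assumptions based only on the sample messages shown here.

```python
import json
import re

# Matches the Word(...) strings from the sample messages (format assumed).
WORD_RE = re.compile(r"Word\(beginTime=(\d+), endTime=(\d+), text=(.*?), punctuation=")


def parse_word(raw):
    """Return (begin_ms, end_ms, text) for one Word(...) string, or None."""
    match = WORD_RE.match(raw)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


def final_sentences(raw_messages):
    """Keep only messages that close a sentence; partial updates are dropped."""
    sentences = []
    for raw in raw_messages:
        message = json.loads(raw)
        if message.get("is_sentence_end"):
            sentences.append(
                (message["begin_time"], message["end_time"], message["text"])
            )
    return sentences


messages = [
    '{"text":"这是深度","begin_time":660,"end_time":null,"words":[],"is_sentence_end":false}',
    '{"text":"这是深度神经网络。","begin_time":660,"end_time":5540,"words":'
    '["Word(beginTime=660, endTime=1148, text=这是, punctuation=, fixed=false)"],'
    '"is_sentence_end":true}',
]
sentences = final_sentences(messages)
first_word = parse_word(json.loads(messages[1])["words"][0])
print(sentences, first_word)
```

A UI would usually render partial messages in place and commit the text only when a closing message arrives.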

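II. Speech Synthesis (TTS)

2.1 Streaming/Non-streaming Text-to-Speech

Request

Basic information

Request URL (streaming): https://www.sophnet.com/api/open-apis/projects/easyllms/voice/synthesize-audio-stream

Request URL (non-streaming): https://www.sophnet.com/api/open-apis/projects/easyllms/voice/synthesize-audio

Method: POST

Body parameters

Name	Type	Required	Description
text	array(string)	Yes	List of strings to convert to speech
synthesis_param	object	Yes	Speech synthesis parameters
synthesis_param.model	string	Yes	Model to use. Default "cosyvoice-v2"; "cosyvoice-v1" and "cosyvoice-v2" are supported.
synthesis_param.voice	string	No	Voice to use. Default "longjiqi"
synthesis_param.format	string	No	Audio encoding format and sample rate, written as "container_sample-rate_channels_bit-rate". For example, `MP3_16000HZ_MONO_128KBPS` means mp3 audio at a 16 kHz sample rate. If format is omitted, the system chooses the recommended format for the selected voice. Examples for each container: `WAV_16000HZ_MONO_16BIT`, `MP3_16000HZ_MONO_128KBPS`, `PCM_16000HZ_MONO_16BIT`; other containers are not supported yet.
synthesis_param.volume	number	No	Volume. Default 50, range [0, 100]
synthesis_param.speechRate	number	No	Speech rate. Default 1, range [0.5, 2]
synthesis_param.pitchRate	number	No	Pitch. Default 1, range [0.5, 2]

The following preset voices are supported: longyingxiao / longjiqi / longhouge / longjixin / longanyue / longshange / longanmin / longdaiyu / longgaoseng / longanli / longanlang / longanwen / longanyun / longyumi_v2 / longxiaochun_v2 / longxiaoxia_v2, among others.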

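Response

Streaming endpoint

Response headers

Name	Value	Description
Content-Type	text/event-stream

Response body

Name	Type	Value
data	string	Event payload; in the examples in this document it is a JSON object whose `audioFrame` field carries the Base64-encoded audio data

Non-streaming endpoint

Response headers

Name	Value	Description
Content-Type	audio/mpeg, audio/wav, audio/L16, application/octet-stream	Binary audio stream; the type returned depends on the requested format

Request examples

curl request example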

curl --location --request POST 'https://www.sophnet.com/api/open-apis/projects/easyllms/voice/synthesize-audio-stream' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "text": ["这是要合成的文本", "第二段文本"],
    "synthesis_param": {
        "model": "cosyvoice-v2",
        "voice": "longjiqi",
        "format": "WAV_16000HZ_MONO_16BIT",
        "volume": 80,
        "speechRate": 1.2,
        "pitchRate": 1.0
    }
}'
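
The parameter table gives ranges for `volume`, `speechRate` and `pitchRate` and describes `format` as a composite string. The following is a minimal sketch of checking these values on the client before sending. The helper names are hypothetical, and the format builder covers only the three documented mono patterns.

```python
def build_format(container, sample_rate_hz, tail):
    """Compose e.g. MP3_16000HZ_MONO_128KBPS from its parts (mono only)."""
    if container not in {"WAV", "MP3", "PCM"}:
        raise ValueError("only WAV, MP3 and PCM are documented as supported")
    return f"{container}_{sample_rate_hz}HZ_MONO_{tail}"


def build_synthesis_body(texts, model="cosyvoice-v2", voice="longjiqi",
                         audio_format=None, volume=50, speech_rate=1.0,
                         pitch_rate=1.0):
    """Validate the documented ranges and return the request body as a dict."""
    if not texts:
        raise ValueError("text must contain at least one string")
    if not 0 <= volume <= 100:
        raise ValueError("volume must be within [0, 100]")
    for name, value in (("speechRate", speech_rate), ("pitchRate", pitch_rate)):
        if not 0.5 <= value <= 2:
            raise ValueError(f"{name} must be within [0.5, 2]")
    param = {
        "model": model,
        "voice": voice,
        "volume": volume,
        "speechRate": speech_rate,
        "pitchRate": pitch_rate,
    }
    # When format is omitted the service picks the voice's recommended format.
    if audio_format is not None:
        param["format"] = audio_format
    return {"text": list(texts), "synthesis_param": param}


body = build_synthesis_body(
    ["这是要合成的文本"],
    audio_format=build_format("WAV", 16000, "16BIT"),
    volume=80,
    speech_rate=1.2,
)
print(body)
```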

Python requests - streaming endpoint (writes an audio file)

import requests
import json
import base64

API_KEY = "YOUR_API_KEY"

url = "https://www.sophnet.com/api/open-apis/projects/easyllms/voice/synthesize-audio-stream"

headers = {
   'Content-Type': 'application/json',
   'Authorization': 'Bearer ' + API_KEY,
}

payload = json.dumps({
   "text": [
       "测试",
   ],
   "synthesis_param": {
       "model": "cosyvoice-v2",
       "voice": "longjiqi",
       "format": "MP3_16000HZ_MONO_128KBPS",
       "volume": 80,
       "speechRate": 1.2,
       "pitchRate": 1
   }
})

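# stream=True lets the body be consumed incrementally as events arrive.
response = requests.request("POST", url, headers=headers, data=payload, stream=True)
# Open the output file once instead of reopening it for every line.
with open("output.mp3", "wb") as f:
    for chunk in response.iter_lines(decode_unicode=True):
        # Only lines that start with "data:" carry a JSON payload.
        if chunk and chunk.startswith("data:"):
            if (frame := json.loads(chunk[5:])["audioFrame"]):
                f.write(base64.b64decode(frame))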

Python requests - non-streaming endpoint (writes an audio file)

import requests
import json

API_KEY = "YOUR_API_KEY"

url = "https://www.sophnet.com/api/open-apis/projects/easyllms/voice/synthesize-audio"

headers = {
   'Content-Type': 'application/json',
   'Authorization': 'Bearer ' + API_KEY,
}

payload = json.dumps({
   "text": [
       "测试",
   ],
   "synthesis_param": {
       "model": "cosyvoice-v2",
       "voice": "longjiqi",
       "format": "MP3_16000HZ_MONO_128KBPS",
       "volume": 80,
       "speechRate": 1.2,
       "pitchRate": 1
   }
})

response = requests.request("POST", url, headers=headers, data=payload)
with open("output.mp3","wb") as f:
    f.write(response.content)

Response example

{"status": "accepting", "usage": null, "audioFrame": "{BASE64 encoded data}"}
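
The following is a minimal sketch of turning the streaming response into audio bytes without any HTTP dependency. It assumes each event line has the form `data:<JSON>` and that the JSON carries Base64 audio in `audioFrame`, as in the example above and the Python streaming example. The function name `decode_audio_events` and the sample bytes are hypothetical.

```python
import base64
import json


def decode_audio_events(lines):
    """Yield decoded audio bytes from an iterable of event-stream text lines."""
    for line in lines:
        if not line or not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        event = json.loads(line[len("data:"):])
        frame = event.get("audioFrame")
        if frame:  # audioFrame may be null for status-only events
            yield base64.b64decode(frame)


sample = base64.b64encode(b"\x00\x01fake-audio").decode("ascii")
lines = [
    'data:{"status": "accepting", "usage": null, "audioFrame": "%s"}' % sample,
    "",
    'data:{"status": "accepting", "usage": null, "audioFrame": null}',
]
audio = b"".join(decode_audio_events(lines))
print(len(audio))
```

Because the function accepts any iterable of strings, it can be fed directly from an HTTP client's line iterator.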

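2.2 Voice Cloning

2.2.1 Create a Voice (File Upload)

Request

Basic information

Description: creates a voice under a given model of a service from an uploaded audio file, so that it can be used with the streaming/non-streaming text-to-speech endpoints.

Request URL: https://www.sophnet.com/api/open-apis/projects/easyllms/voice/upload

Method: POST

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value multipart/form-data
Authorization	String	Yes	"Bearer " + Apikey

Form parameters

Name	Type	Required	Description
audio_file	File	Yes	Audio file used to create the voice. 10 to 20 s is recommended; it must not be longer than 60 s or larger than 10 MB. Supported formats: wav, mp3, m4a.
tts_speaker_voice_generate_req	object	Yes	Voice creation parameters.
tts_speaker_voice_generate_req.model	string	Yes	Model under which the voice is created. For supported models, see Streaming/Non-streaming Text-to-Speech.
tts_speaker_voice_generate_req.name	string	No	Voice name; it only needs to distinguish one voice from another.
tts_speaker_voice_generate_req.des	string	No	Description of the voice.
tts_speaker_voice_generate_req.prompt_text	string	Yes	Transcript of the speech in the audio file used to create the voice.

Limit: an organization can create at most 100 voices.

Response

Response headers

Name	Value	Description
Content-Type	application/json

Response body

Name	Type	Description
status	int	Return status; 0 means success.
message	string	Return message.
result	null	Always null.
timestamp	int	Timestamp.

Request example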

import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder
import json

API_KEY = "YOUR_API_KEY"
AUDIO_FILE_NAME = ""
AUDIO_FILE_PATH = ""
AUDIO_FILE_FORMAT = ""  # e.g. audio/wav
PROMPT_TEXT = "YOUR_AUDIO_CONTENT"

voice_data = {
    "model": "cosyvoice-v2",
    "name": "voice1",
    "prompt_text": PROMPT_TEXT
}

# MultipartEncoder gives precise control over the multipart/form-data body
multipart_data = MultipartEncoder(
    fields={
        "audio_file": (AUDIO_FILE_NAME, open(AUDIO_FILE_PATH, "rb"), AUDIO_FILE_FORMAT),
        "tts_speaker_voice_generate_req": (None, json.dumps(voice_data), "application/json")
    }
)

url = "https://www.sophnet.com/api/open-apis/projects/easyllms/voice/upload"

headers = {
    "Authorization": f"Bearer {API_KEY}"
}

headers["Content-Type"] = multipart_data.content_type

response = requests.post(url, headers=headers, data=multipart_data)
print(response.text)

Response example

{
    "status": 0,
    "message": "请求成功",
    "result": null,
    "timestamp": 1765420402078
}

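2.2.2 Create a Voice (URL Upload)

Request

Basic information

Description: creates a voice under a given model of a service from an audio URL, so that it can be used with the streaming/non-streaming text-to-speech endpoints.

Request URL: https://www.sophnet.com/api/open-apis/projects/easyllms/voice/upload

Method: POST

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value application/json
Authorization	String	Yes	"Bearer " + Apikey

Body parameters

Name	Type	Required	Description
audio_url	string	Yes	URL of the audio used to create the voice. 10 to 20 s is recommended; it must not be longer than 60 s or larger than 10 MB. Supported formats: wav, mp3, m4a.
model	string	Yes	Model under which the voice is created. For supported models, see Streaming/Non-streaming Text-to-Speech.
name	string	No	Voice name; it only needs to distinguish one voice from another.
des	string	No	Description of the voice.
prompt_text	string	Yes	Transcript of the speech in the audio file used to create the voice.

Limit: an organization can create at most 100 voices.

Response

Response headers

Name	Value	Description
Content-Type	application/json

Response body

Name	Type	Description
status	int	Return status; 0 means success.
message	string	Return message.
result	null	Always null.
timestamp	int	Timestamp.

Request example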

import requests
import json

API_KEY = "YOUR_API_KEY"
PROMPT_TEXT = "通义语音实验室依托大规模预训练语言模型，深度融合文本理解和语音生成的新一代生成式语音合成大模型CosyVoice。支持文本至语音的实时流式合成。"

voice_data = {
    "model": "cosyvoice-v2",
    "name": "voice1",
    "prompt_text": PROMPT_TEXT,
    "audio_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/cosyvoice/cosyvoice-zeroshot-sample.wav"
}

url = "https://www.sophnet.com/api/open-apis/projects/easyllms/voice/upload"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

response = requests.post(url, headers=headers, data=json.dumps(voice_data))
print(response.json())

Response example

{
    "status": 0,
    "message": "请求成功",
    "result": null,
    "timestamp": 1765420402078
}

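2.2.3 List Voices

Request

Basic information

Description: lists all available voices under a given model of a service, so that they can be used with the streaming/non-streaming text-to-speech endpoints.

Request URL: https://www.sophnet.com/api/open-apis/projects/easyllms/voice

Method: GET

Header parameters

Name	Type	Required	Description
Authorization	String	Yes	"Bearer " + Apikey

Response

Response headers

Name	Value	Description
Content-Type	application/json

Response body

Name	Type	Description
status	int	Return status; 0 means success.
message	string	Return message.
result	list	List of voices.
result[x].model	string	Model that the voice belongs to.
result[x].des	string	Voice description.
result[x].name	string	Voice name.
result[x].voice_id	string	Unique identifier of the voice, used in TTS requests.
result[x].bucket_name	string	Always null.
result[x].file_name	string	Always null.
result[x].last_used_at	string	Time the voice was last used.
timestamp	int	Timestamp.

Request example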

import requests

API_KEY = "YOUR_API_KEY"

url = "https://www.sophnet.com/api/open-apis/projects/easyllms/voice"

headers = {
   'Authorization': f'Bearer {API_KEY}'
}

response = requests.request("GET", url, headers=headers)

print(response.text)

Response example

{
    "status": 0,
    "message": "请求成功",
    "result": [
        {
            "model": "xx",
            "des": null,
            "name": "xx",
            "voice_id": "xx",
            "bucket_name": null,
            "file_name": null,
            "last_used_at": "xx"
        }
    ],
    "timestamp": 1765423576409
}
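
Since TTS requests identify a custom voice by `voice_id` rather than by name, a client usually needs to map one to the other. The following is a minimal sketch of that lookup over a parsed list response; the function name `find_voice_ids` and the sample values are hypothetical.

```python
def find_voice_ids(list_response, name, model=None):
    """Return the voice_id of every voice matching name (and model, if given)."""
    if list_response.get("status") != 0:
        raise RuntimeError(f"voice query failed: {list_response.get('message')}")
    return [
        voice["voice_id"]
        for voice in list_response.get("result") or []
        if voice.get("name") == name and (model is None or voice.get("model") == model)
    ]


sample_response = {
    "status": 0,
    "message": "请求成功",
    "result": [
        {"model": "cosyvoice-v2", "name": "voice1", "voice_id": "id-1"},
        {"model": "cosyvoice-v1", "name": "voice1", "voice_id": "id-2"},
        {"model": "cosyvoice-v2", "name": "voice2", "voice_id": "id-3"},
    ],
    "timestamp": 1765423576409,
}
ids = find_voice_ids(sample_response, "voice1", model="cosyvoice-v2")
print(ids)
```

A list is returned because the source does not state that names are unique within a model.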

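2.2.4 Delete a Voice

Request

Basic information

Description: deletes an available voice under a given model of a service. After the call, the voice can no longer be used with the streaming/non-streaming text-to-speech endpoints.

Request URL: https://www.sophnet.com/api/open-apis/projects/easyllms/voice/remove

Method: DELETE

Header parameters

Name	Type	Required	Description
Authorization	String	Yes	"Bearer " + Apikey
Content-Type	String	Yes	application/json

Body parameters

Name	Type	Required	Description
model	string	Yes	Name of the model that the voice belongs to.
voice_id	string	Yes	Unique identifier of the voice to delete.

Response

Response headers

Name	Value	Description
Content-Type	application/json

Response body

Name	Type	Description
status	int	Return status; 0 means success.
message	string	Return message.
result	null	Always null.
timestamp	int	Timestamp.

Request example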

import requests
import json

API_KEY = "YOUR_API_KEY"
VOICE_ID = "YOUR_VOICE"
MODEL = "YOUR_MODEL_NAME"

url = "https://www.sophnet.com/api/open-apis/projects/easyllms/voice/remove"

payload = json.dumps({
   "model": MODEL,
   "voice_id": VOICE_ID
})
headers = {
   'Content-Type': 'application/json',
   'Authorization': f'Bearer {API_KEY}'
}

response = requests.request("DELETE", url, headers=headers, data=payload)

print(response.text)

Response example

{
    "status": 0,
    "message": "请求成功",
    "result": null,
    "timestamp": 1765423576409
}

III. Combined Capabilities

3.1 Chat Completions + Speech Synthesis

Request

Basic information

Request URL: https://www.sophnet.com/api/open-apis/v1/chat/completions-with-voice-output

Method: POST

Streaming only.

Available models (TTS part): cosyvoice-v1 / cosyvoice-v2

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value application/json
Authorization	String	Yes	"Bearer " + Apikey

Body parameters

Name	Type	Required	Description
chat_completion_req	dict	Yes	Chat completions parameters
speech_synthesis_req	dict	Yes	Subset of the speech synthesis parameters

chat_completion_req parameters: see the Chat completions API (API.md); streaming only.

speech_synthesis_req parameters (streaming only):

Name	Type	Required	Description
synthesis_param	dict	No	Speech synthesis parameters
synthesis_param.model	string	No	Model to use. Default "cosyvoice-v2"
synthesis_param.voice	string	No	Voice to use. Default "longjiqi"; supports longyingxiao/longjiqi/longhouge/longjixin/longanyue/longshange/longanmin/longdaiyu/longgaoseng/longanli/longanlang/longanwen/longanyun/longyumi_v2/longxiaochun_v2/longxiaoxia_v2, among others
synthesis_param.format	string	No	Audio encoding format and sample rate, written as "container_sample-rate_channels_bit-rate". For example, `MP3_16000HZ_MONO_128KBPS` means mp3 audio at a 16 kHz sample rate. If format is omitted, the system chooses the recommended format for the selected voice.
synthesis_param.volume	number	No	Volume. Default 50, range [0, 100]
synthesis_param.speechRate	number	No	Speech rate. Default 1, range [0.5, 2]
synthesis_param.pitchRate	number	No	Pitch. Default 1, range [0.5, 2]
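
Response

The stream carries two kinds of messages, Chat completions response messages and speech synthesis response messages; LLM text and TTS audio are returned interleaved.

Chat completions response: see the Chat completions API (API.md).

Speech synthesis response:

Name	Type	Value
audioFrame	dict	Audio data result
audioFrame.array	String	Base64-encoded audio data
status	String	Current service status

Request example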

curl --location --request POST 'https://www.sophnet.com/api/open-apis/v1/chat/completions-with-voice-output' \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $API_KEY" \
--data-raw '{
    "chat_completion_req": {
        "messages": [
            {
                "role": "user",
                "content": "你好"
            }
        ],
        "model": "Qwen2.5-32B-Instruct",
        "stream": true
    },
    "speech_synthesis_req": {}
}'

Response example

{"choices": [{"delta": {"content": "", "role": "assistant"},"index": 0}],"created": 1749037853,"id": "chatcmpl-xxx","model": "Qwen2.5-32B-Instruct","object": "chat.completion.chunk"}

{"choices":[{"delta":{"content":"你好"},"index":0}],"created":1749037853,"id":"chatcmpl-xxx","model":"Qwen2.5-32B-Instruct","object":"chat.completion.chunk"}

{"choices":[{"delta":{"content":"！"},"index":0}],"created":1749037853,"id":"chatcmpl-xxx","model":"Qwen2.5-32B-Instruct","object":"chat.completion.chunk"}

{"choices":[{"delta":{"content":"有什么"},"index":0}],"created":1749037853,"id":"chatcmpl-xxx","model":"Qwen2.5-32B-Instruct","object":"chat.completion.chunk"}

{"status":"accepting","usage":null,"audioFrame":"SUQzBAAA..."}

{"choices":[{"delta":{"content":"可以帮助你的吗？"},"index":0}],"created":1749037853,"id":"chatcmpl-xxx","model":"Qwen2.5-32B-Instruct","object":"chat.completion.chunk"}

{"status":"accepting","usage":null,"audioFrame":null}

{"status":"accepting","usage":null,"audioFrame":"//PCxO1..."}

{"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0}],"created":1749037853,"id":"chatcmpl-xxx","model":"Qwen2.5-32B-Instruct","object":"chat.completion.chunk"}

{"status":"accepting","usage":null,"audioFrame":"//PAxPJh..."}

{"status":"finish","usage":{"characters":26},"audioFrame":null}

...
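
Because text chunks and audio frames arrive interleaved, a client has to route each message by its shape. The following is a minimal sketch that assumes the message layouts of the example above: chat chunks carry `choices`, audio messages carry `audioFrame`. The response table describes `audioFrame` as a dict with an `array` field while the examples show a plain string, so the sketch accepts both. The function name `split_stream` and the sample payloads are hypothetical.

```python
import base64
import json


def split_stream(raw_messages):
    """Separate interleaved messages into the full text and the audio bytes."""
    text_parts = []
    audio = bytearray()
    for raw in raw_messages:
        message = json.loads(raw)
        if "choices" in message:
            # Chat completions chunk: collect the incremental content.
            for choice in message["choices"]:
                text_parts.append(choice.get("delta", {}).get("content") or "")
        elif "audioFrame" in message:
            frame = message["audioFrame"]
            if isinstance(frame, dict):  # table layout: audioFrame.array
                frame = frame.get("array")
            if frame:  # null frames only report status
                audio += base64.b64decode(frame)
    return "".join(text_parts), bytes(audio)


stream = [
    '{"choices":[{"delta":{"content":"你好"},"index":0}]}',
    '{"status":"accepting","usage":null,"audioFrame":"YWJj"}',
    '{"choices":[{"delta":{"content":"！"},"index":0}]}',
    '{"status":"accepting","usage":null,"audioFrame":null}',
    '{"status":"finish","usage":{"characters":3},"audioFrame":{"array":"ZGVm"}}',
]
text, audio_bytes = split_stream(stream)
print(text, audio_bytes)
```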

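3.2 Voice Conversation (WebSocket, Full Duplex)

Uplink events

Connection request event

Basic information

Request URL: wss://www.sophnet.com/api/open-apis/projects/easyllms/chat/speech-chat

Method: WebSocket

Request parameters

Name	Type	Required	Description
token	String	Yes	The "Bearer " prefix followed by the Apikey
model	String	Yes	Model name of the Chat completions service

Conversation configuration update event (chat.update)

Event type: chat.update
Event description: sent after the connection response has been received, this event optionally updates the conversation configuration. It may be sent multiple times before audio chunks start streaming; after that the configuration can no longer be updated. If it is never sent, the default parameters are used. Streaming only.
Parameter notes: the fields create_transcription_task_req, chat_completion_with_voice_output_req.chat_completion_req and chat_completion_with_voice_output_req.speech_synthesis_req must be present. If a field is given as an empty object, all of its sub-fields use the defaults; if only some sub-fields are set, the remaining ones use the defaults.
Default parameters: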

{
    "create_transcription_task_req": {
        "heartbeat": false,
        "speech_recognition_param": {
            "sample_rate": 16000,
            "format": "wav"
        }
    },
    "chat_completion_with_voice_output_req": {
        "chat_completion_req": {
            "messages": [],
            "model": "${model参数}",
            "stream": true
        },
        "speech_synthesis_req": {
            "stream": true,
            "synthesis_param": {
                "model": "cosyvoice-v2",
                "voice": "longjiqi",
                "format": "MP3_22050HZ_MONO_256KBPS"
            }
        }
    },
    "asr_mode": "online"
}

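Event message structure:

Parameter	Type	Required	Description
event_type	String	Yes	Event type, always chat.update
message	String	Yes	JSON string of configuration parameters, in the form {"create_transcription_task_req": {see the Request parameters of streaming speech recognition}, "chat_completion_with_voice_output_req": {see the Body parameters of Chat Completions + speech synthesis}, "asr_mode": ASR mode}
message.asr_mode	String	No	Three modes are supported: "online", "refresh" and "dynamic"; the default is "online". "online" keeps ASR always on and is recommended for continuous audio streams. "refresh" clears the ASR audio cache each time LLM+TTS runs and keeps listening; it is recommended when the input audio arrives in truncated segments. "dynamic" turns ASR off when LLM+TTS inference starts and turns it back on automatically the next time audio bytes are sent.

Streaming upload of audio chunks

Event description: once this event has occurred the conversation configuration can no longer be updated. Timeouts are evaluated according to the heartbeat setting, and the connection is closed on timeout.
Event message structure: binary audio data chunks, for example 100 ms or 200 ms each; adjust to the actual situation.

Audio commit event (input_audio_buffer.complete)

Event type: input_audio_buffer.complete
Event description: stops ASR, passes the ASR result to the LLM as input, and switches to LLM+TTS inference.
Event message structure:

Parameter	Type	Required	Description
event_type	String	Yes	Event type, always input_audio_buffer.complete

Audio cache clear event (input_audio_buffer.clear)

Event type: input_audio_buffer.clear
Event description: clears the audio bytes cached by the ASR service (one more ASR result may be sent) and discards the results recognized so far. Sending this event before any audio bytes have been sent results in an error.
Event message structure:

Parameter	Type	Required	Description
event_type	String	Yes	Event type, always input_audio_buffer.clear

Context clear event (conversation.clear)

Event type: conversation.clear
Event description: clears the previous LLM context; context set through the conversation configuration update event is not cleared.
Event message structure:

Parameter	Type	Required	Description
event_type	String	Yes	Event type, always conversation.clear

Interrupt event (conversation.chat.cancel)

Event type: conversation.chat.cancel
Event description: interrupts LLM+TTS inference and switches back to ASR.
Event message structure:

Parameter	Type	Required	Description
event_type	String	Yes	Event type, always conversation.chat.cancel

Heartbeat event (ping)

Event type: ping
Event description: this message must be sent periodically; if more than 60 s pass without one, the connection is closed.
Event message structure:

Parameter	Type	Required	Description
event_type	String	Yes	Event type, always ping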

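Downlink events

Connection request response (chat.created)

Event type: chat.created
Event description: other events may be sent only after this connection response has been received.
Response body

Parameter	Type	Description
event_type	String	Event type, always chat.created
status	int	Event status; 0 means success, non-zero means failure
message	String	Reason for the failure

Conversation configuration update response (chat.updated)

Event type: chat.updated
Response body

Parameter	Type	Description
event_type	String	Event type, always chat.updated
status	int	Event status; 0 means success, non-zero means failure
message	String	Reason for the failure

Incremental recognition result response

Event description: streams back the speech recognition results.
Response body: see the response of Streaming Speech Recognition (WebSocket)

Audio commit response (input_audio_buffer.completed)

Event type: input_audio_buffer.completed
Response body

Parameter	Type	Description
event_type	String	Event type, always input_audio_buffer.completed
status	int	Event status; 0 means success, non-zero means failure
message	String	Reason for the failure

Audio cache clear response (input_audio_buffer.cleared)

Event type: input_audio_buffer.cleared
Response body

Parameter	Type	Description
event_type	String	Event type, always input_audio_buffer.cleared
status	int	Event status; 0 means success, non-zero means failure
message	String	Reason for the failure

Context clear response (conversation.cleared)

Event type: conversation.cleared
Response body

Parameter	Type	Description
event_type	String	Event type, always conversation.cleared
status	int	Event status; 0 means success, non-zero means failure
message	String	Reason for the failure

Interrupt response (conversation.chat.canceled)

Event type: conversation.chat.canceled
Response body

Parameter	Type	Description
event_type	String	Event type, always conversation.chat.canceled
status	int	Event status; 0 means success, non-zero means failure
message	String	Reason for the failure

Incremental LLM+TTS inference result response

Event description: streams back the LLM and TTS inference results.
Response body: see the response of Chat Completions + Speech Synthesis

Conversation turn completed response (conversation.chat.completed)

Event type: conversation.chat.completed
Event description: returned when LLM+TTS inference finishes normally, fails, or is interrupted; ASR is then turned on again.
Response body

Parameter	Type	Description
event_type	String	Event type, always conversation.chat.completed
status	int	Always 0
message	String	Always empty

Heartbeat response (pong)

Event type: pong
Response body

Parameter	Type	Description
event_type	String	Event type, always pong
status	int	Event status; 0 means success, non-zero means failure
message	String	Reason for the failure

Error response (error)

Event type: error
Event description: other errors, such as request parameter errors or runtime errors.
Response body

Parameter	Type	Description
event_type	String	Event type, always error
status	int	Non-zero
message	String	Reason for the error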

Request/response examples

Connection example

Connection request

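// model and apikey are placeholders to be defined by the caller.
// log is assumed to be a logging helper; console.log is used here.
const log = console.log;

// encodeURIComponent escapes the space in "Bearer <apikey>".
const url = `wss://www.sophnet.com/api/open-apis/projects/easyllms/chat/speech-chat`
              + `?model=${encodeURIComponent(model)}`
              + `&token=${encodeURIComponent('Bearer ' + apikey)}`;

const ws = new WebSocket(url);

ws.onopen = () => {
    log('WebSocket connected: ' + url);
};

ws.onmessage = (evt) => {
    log('<- RESULT: ' + evt.data);
};

ws.onerror = (err) => {
    log('WebSocket error: ' + err);
};

ws.onclose = (evt) => {
    log(`WebSocket closed: [${evt.code}] ${evt.reason}`);
};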

Connection response

{"status":0,"message":"","event_type":"chat.created"}

Conversation configuration update example

Conversation configuration update request

ws.send('{"event_type":"chat.update","message":"{\\"create_transcription_task_req\\":{\\"service_uuid\\":\\"\\"},\\"chat_completion_with_voice_output_req\\":{\\"chat_completion_req\\":{\\"messages\\":[{\\"role\\":\\"system\\",\\"content\\":\\"你是人工智能助手。\\"}],\\"model\\":\\"\\",\\"stream\\":true},\\"speech_synthesis_req\\":{\\"service_id\\":\\"\\"}}}"}');

Conversation configuration update response

{"status":0,"message":"","event_type":"chat.updated"}
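
The `message` field of `chat.update` is a JSON string nested inside the JSON event, which is easy to get wrong when escaping by hand. The following is a minimal sketch of building uplink events with two serialization passes. The helper names `build_event` and `build_chat_update` are hypothetical, and only fields named in the default parameters above are set.

```python
import json

ASR_MODES = {"online", "refresh", "dynamic"}


def build_event(event_type, message=None):
    """Serialize an uplink event; message, if given, is embedded as a JSON string."""
    event = {"event_type": event_type}
    if message is not None:
        event["message"] = json.dumps(message, ensure_ascii=False)
    return json.dumps(event, ensure_ascii=False)


def build_chat_update(system_prompt=None, asr_mode="online", heartbeat=False):
    """Build a chat.update event with the three mandatory request objects."""
    if asr_mode not in ASR_MODES:
        raise ValueError(f"asr_mode must be one of {sorted(ASR_MODES)}")
    messages = [{"role": "system", "content": system_prompt}] if system_prompt else []
    config = {
        # The three request objects are mandatory; empty ones fall back to defaults.
        "create_transcription_task_req": {"heartbeat": heartbeat},
        "chat_completion_with_voice_output_req": {
            "chat_completion_req": {"messages": messages, "stream": True},
            "speech_synthesis_req": {},
        },
        "asr_mode": asr_mode,
    }
    return build_event("chat.update", config)


update = build_chat_update("你是人工智能助手。", asr_mode="refresh")
commit = build_event("input_audio_buffer.complete")
print(update)
print(commit)
```

The same `build_event` helper covers the events that carry only `event_type`, such as `ping` or `conversation.chat.cancel`.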

Streaming audio chunk upload example

ws.send(byteData);

ASR, LLM and TTS result response examples

See the response examples in Streaming Speech Recognition (WebSocket) and Chat Completions + Speech Synthesis respectively.
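
On a single connection the client receives control events, ASR results, LLM chunks and TTS frames, so incoming text messages need to be classified before handling. The following is a minimal sketch under one assumption: ASR and LLM/TTS messages keep the shapes shown in sections 1.4 and 3.1 and carry no `event_type`, which the source implies by reference but does not state outright. The function name `classify_downlink` and the returned tuples are hypothetical.

```python
import json


def classify_downlink(raw):
    """Return a (kind, detail, flag) tuple describing one downlink text message."""
    message = json.loads(raw)
    if "event_type" in message:
        # Control events: flag is True when status is 0 (success).
        return ("event", message["event_type"], message.get("status") == 0)
    if "is_sentence_end" in message:
        # ASR result: flag is True when the sentence is final.
        return ("asr", message.get("text", ""), bool(message["is_sentence_end"]))
    if "choices" in message:
        # LLM chunk: detail is the concatenated incremental text.
        content = "".join(
            choice.get("delta", {}).get("content") or ""
            for choice in message["choices"]
        )
        return ("llm", content, False)
    if "audioFrame" in message:
        # TTS frame: flag is True when the message actually carries audio.
        return ("tts", message.get("status"), message["audioFrame"] is not None)
    return ("unknown", None, False)


samples = [
    '{"status":0,"message":"","event_type":"chat.created"}',
    '{"text":"请","begin_time":6001,"end_time":null,"words":[],"is_sentence_end":false}',
    '{"choices":[{"delta":{"content":"你好"},"index":0}]}',
    '{"status":"finish","usage":{"characters":26},"audioFrame":null}',
]
kinds = [classify_downlink(raw) for raw in samples]
print(kinds)
```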