Standard models

Chat completions

Request description

Basic information

Request URL: https://www.sophnet.com/api/open-apis/v1/chat/completions

Request method: POST

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value: application/json
Authorization	String	Yes	"Bearer " + Apikey

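Body parameters

Name	Type	Required	Description
messages	array(message)	Yes	Chat context messages. Multimodal content is supported for Qwen VL series models. Text-only example: messages=[{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Knock knock."},{"role": "assistant", "content": "Who's there?"},{"role": "user", "content": "Orange."}] Multimodal example: {"messages":[{"role":"user","content":[{"type":"text","text":"describe the image in 100 words or less"},{"type":"image_url","image_url":{"url":"xxx","detail":"high"}}]}]}
model	string	Yes	Model name.
tools	array	No	List of tools. Only the function type is supported.
stream	bool	No	Whether to return data as a stream. Default: false.
max_tokens	integer	No	Maximum length of the model reply, in tokens.
temperature	number	No	Higher values make the output more random; lower values make it more focused. Default: 1.0. Range: [0, 2.0].
top_p	number	No	Controls the diversity of the output text; larger values produce more diverse text. Default: 1.0.
stop	array(string)	No	Up to 4 strings at which token generation stops.
presence_penalty	number	No	Penalizes tokens that have already been generated, which reduces repetition. Default: 0. Range: [-2.0, 2.0].
frequency_penalty	number	No	Penalizes new tokens according to their frequency in the text so far, which lowers the chance of repeating the same line verbatim. Default: 0. Range: [-2.0, 2.0].
logprobs	boolean	No	Default: false. Whether to return the log probabilities of the output tokens.
top_logprobs	integer	No	Default: 0. Range: [0, 20]. Number of most likely tokens to return at each output token position, each with its log probability. Can be set only when logprobs is true.
response_format	object	No	Object that specifies the format the model must output. Default: { "type": "text" }. Setting it to { "type": "json_object" } enables JSON mode, which guarantees that the generated message is valid JSON. Important: in JSON mode you must also instruct the model to produce JSON through a system or user message.
chat_template_kwargs	object	No	Chat template arguments object.
chat_template_kwargs.enable_thinking	bool	No	Whether to enable thinking mode.
thinking	object	No	Thinking configuration object, provided for compatibility with the thinking modes of different models.
thinking.budget_tokens	integer	No	Thinking token budget, which limits the tokens consumed by the thinking process.
enable_thinking	bool	No	Whether to enable thinking mode. This is a compatibility field with the same effect as chat_template_kwargs.enable_thinking.
max_completion_tokens	integer	No	Maximum number of completion tokens in the model reply; similar to max_tokens but more precise. Only one of max_tokens and max_completion_tokens can be used at a time.
tool_choice	object/string	No	Tool selection strategy. Can be "auto" (choose automatically), "none" (do not use tools), "required" (a tool must be used), or a specific tool: {"type": "function", "function": {"name": "tool_name"}}
parallel_tool_calls	boolean	No	Whether multiple tools may be called in parallel. Default: true.
stop_token_ids	array(integer)	No	List of token IDs at which generation stops.
reasoning	object	No	Reasoning configuration object that controls the model's reasoning behavior. Contains an enabled field (boolean).
reasoning_effort	string	No	Reasoning effort. Allowed values: "low", "medium", "high".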
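
The limits listed in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: `build_chat_payload` is a hypothetical helper name, and rejecting a request that sets both `max_tokens` and `max_completion_tokens` is an assumption about how that rule should be enforced.

```python
def build_chat_payload(model, messages, **options):
    """Build a Chat completions body, checking the documented parameter limits.

    Hypothetical helper for illustration; the server remains the final authority.
    """
    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2.0:
        raise ValueError("temperature must be within [0, 2.0]")

    for name in ("presence_penalty", "frequency_penalty"):
        value = options.get(name)
        if value is not None and not -2.0 <= value <= 2.0:
            raise ValueError(f"{name} must be within [-2.0, 2.0]")

    stop = options.get("stop")
    if stop is not None and len(stop) > 4:
        raise ValueError("stop accepts at most 4 strings")

    top_logprobs = options.get("top_logprobs")
    if top_logprobs is not None:
        if not options.get("logprobs"):
            raise ValueError("top_logprobs can be set only when logprobs is true")
        if not 0 <= top_logprobs <= 20:
            raise ValueError("top_logprobs must be within [0, 20]")

    # Assumption: treat setting both token limits as a client-side error.
    if "max_tokens" in options and "max_completion_tokens" in options:
        raise ValueError("use only one of max_tokens and max_completion_tokens")

    payload = {"model": model, "messages": list(messages)}
    payload.update({key: value for key, value in options.items() if value is not None})
    return payload
```

The returned dictionary can be serialized with `json.dumps` and sent as the POST body.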

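Response description

Response header parameters

Name	Value	Description
Content-Type	Streaming: text/event-stream; non-streaming: application/json

Response body parameters

a. Non-streaming

Name	Type	Description
object	string	Response object type. chat.completion: chat completion response
created	int	Timestamp
model	string	Model. Example: Qwen/Qwen2.5-72B-Instruct
choices	array
choices[0].index	int	Index
choices[0].finish_reason	string	Finish reason. stop: normal completion; length: truncated at the token limit; tool_calls: the model returned tool calls
choices[0].message	object	Model reply
choices[0].message.tool_calls	array	List of tool calls
choices[0].message.tool_calls[0].function	object	Function call information
choices[0].refs	array	Reference list. Present when a custom model is called and the model output contains document references. In non-streaming calls, all reference sources used in the answer are returned in the final result.
choices[0].refs[0].index	int	Order in which the reference source appears
choices[0].refs[0].title	string	Title of the referenced data
choices[0].refs[0].content	string	Content of the referenced data
choices[0].refs[0].type	string	Type of the referenced data: file, qa, or web, meaning file knowledge, Q&A Table, and web search respectively
choices[0].refs[0].url	string	URL of the referenced data. URLs for file and qa data require an API key to access; URLs for web data do not.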

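b. Streaming

Name	Type	Description
object	string	Response object type. chat.completion.chunk: chat completion chunk
created	int	Timestamp
model	string	Model. Example: Qwen/Qwen2.5-72B-Instruct
choices	array
choices[0].index	int	Index
choices[0].finish_reason	string	Finish reason. stop: normal completion; length: truncated at the token limit
choices[0].delta	object	Model reply (incremental)
choices[0].refs	array	Reference list. Present when a custom model is called and the model output contains document references. During a streaming response, reference source information is emitted in real time at each position where a reference occurs, and all reference sources used in the answer are emitted once finish_reason is non-empty.
choices[0].refs[0].index	int	Order in which the reference source appears
choices[0].refs[0].title	string	Title of the referenced data
choices[0].refs[0].content	string	Content of the referenced data
choices[0].refs[0].type	string	Type of the referenced data: file, qa, or web, meaning file knowledge, Q&A Table, and web search respectively
choices[0].refs[0].url	string	URL of the referenced data. URLs for file and qa data require an API key to access; URLs for web data do not.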

Request examples

The examples follow. Replace the sample parameter values with your actual values.

Text-only request example

curl --location -g --request POST 'https://www.sophnet.com/api/open-apis/v1/chat/completions' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "messages": [
          {
            "role": "system",
            "content": "你是SophNet的智能助手"
        },
        {
            "role": "user",
            "content": "你可以帮我做什么"
        }
    ],
    "model":"Qwen2.5-72B-Instruct"
}'

Python SDK

# Compatible with the OpenAI Python SDK. Install it from a terminal: pip install openai
from openai import OpenAI

# Initialize the client
client = OpenAI(
    api_key="API_KEY",
    base_url="https://www.sophnet.com/api/open-apis/v1"
)
# Call the API
response = client.chat.completions.create(
    model="Qwen2.5-72B-Instruct",
    messages=[
        {"role": "system", "content": "你是SophNet智能助手"},
        {"role": "user", "content": "你可以帮我做些什么"},
    ]
)
# Print the result
print(response.choices[0].message.content)

Function Call request example

HTTP API

curl --location -g --request POST 'https://www.sophnet.com/api/open-apis/v1/chat/completions' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "messages": [
        {
            "role": "user",
            "content": "今日上海天气如何？"
        }
    ],
    "model":"DeepSeek-v3",
    "tools": [
        {
            "type": "function",
            "function":
            {
                "name": "get_weather",
                "description": "Get current temperature for provided coordinates in celsius.",
                "parameters":
                {
                    "type": "object",
                    "properties":
                    {
                        "latitude": {"type": "number"},
                        "longitude": {"type": "number"}
                    },
                    "required": ["latitude", "longitude"],
                    "additionalProperties": false
                },
                "strict": true
            }
        }
    ]
}'

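# After a successful request, read the function call information from choices[0].message.tool_calls[0].function in the response.
# function.name is the function name, and function.arguments contains the function arguments.
# Suppose the function has been called and returned 20, and that choices[0].message.tool_calls[0].id = "call_f0j0i4meawn7kqx335d4fsj1".
# The second request follows. In its messages list, the first item is the same as before, the second is the assistant message carrying choices[0].message.tool_calls, and the third is built as shown below.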

curl --location -g --request POST 'https://www.sophnet.com/api/open-apis/v1/chat/completions' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "messages": [
        {
            "role": "user",
            "content": "今日上海天气如何？"
        },
        {
            "content": "",
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "call_f0j0i4meawn7kqx335d4fsj1",
                    "type": "function",
                    "function":
                    {
                        "name": "get_weather",
                        "arguments": "{\"latitude\":31.2304,\"longitude\":121.4737}"
                    }
                }
            ]
        },
        {
            "role": "tool",
            "tool_call_id": "call_f0j0i4meawn7kqx335d4fsj1",
            "content": "20"
        }
    ],
    "model":"DeepSeek-v3",
    "tools": [
        {
            "type": "function",
            "function":
            {
                "name": "get_weather",
                "description": "Get current temperature for provided coordinates in celsius.",
                "parameters":
                {
                    "type": "object",
                    "properties":
                    {
                        "latitude": {"type": "number"},
                        "longitude": {"type": "number"}
                    },
                    "required": ["latitude", "longitude"],
                    "additionalProperties": false
                },
                "strict": true
            }
        }
    ]
}'
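
The message list of the second request can be assembled programmatically from the first response. This is a minimal sketch under stated assumptions: `build_followup_messages` and the `tool_results` mapping (tool call id to result) are hypothetical names, and the tool itself is assumed to have been executed by your own code.

```python
def build_followup_messages(history, assistant_message, tool_results):
    """Return the messages list for the request that follows a tool call.

    history:           messages sent in the first request
    assistant_message: choices[0].message from the first response
    tool_results:      dict mapping each tool call id to the tool's return value
    """
    messages = list(history)
    # Echo the assistant turn, including its tool_calls, back to the model.
    messages.append({
        "role": "assistant",
        "content": assistant_message.get("content") or "",
        "tool_calls": assistant_message["tool_calls"],
    })
    # Add one "tool" message per call, linked through tool_call_id.
    for call in assistant_message["tool_calls"]:
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": str(tool_results[call["id"]]),
        })
    return messages
```

With the values from the example above, the result has three entries: the user question, the assistant message with `tool_calls`, and a `tool` message whose content is `"20"`.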

Multimodal request example (Qwen VL models support multimodal request parameters)

HTTP API

curl --location -g --request POST 'https://www.sophnet.com/api/open-apis/v1/chat/completions' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "messages": [{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "describe the image in 100 words or less"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "xxx"
                }
            }
        ]
    }],
    "model":"Qwen2.5-VL-72B-Instruct",
    "stream":false
}'

Python SDK

# Compatible with the OpenAI Python SDK. Install it from a terminal: pip install openai
from openai import OpenAI

# Initialize the client
client = OpenAI(
    api_key="API_KEY",
    base_url="https://www.sophnet.com/api/open-apis/v1"
)
# Call the API
response = client.chat.completions.create(
    model="Qwen2.5-VL-72B-Instruct",
    messages=[
        {"role": "system", "content": "你是SophNet智能助手"},
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "描述一下这张图片"
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "xxx"}
                }]
        },
    ]
)
# Print the result
print(response.choices[0].message.content)

Response examples

Streaming (event-stream)

data:{"object":"chat.completion.chunk","created":1724651635,"model":"Qwen/Qwen2.5-72B-Instruct","choices":[{"index":0,"delta":{"content":"我可以","role":"assistant"},"finish_reason":null}]}

data:{"object":"chat.completion.chunk","created":1724651635,"model":"Qwen/Qwen2.5-72B-Instruct","choices":[{"index":0,"delta":{"content":"提供","role":null},"finish_reason":null}]}

data:{"object":"chat.completion.chunk","created":1724651635,"model":"Qwen/Qwen2.5-72B-Instruct","choices":[{"index":0,"delta":{"content":"智能问答","role":null},"finish_reason":null}]}

data:{"object":"chat.completion.chunk","created":1724651635,"model":"Qwen/Qwen2.5-72B-Instruct","choices":[{"index":0,"delta":{"content":"和帮助。","role":null},"finish_reason":null}]}

data:{"object":"chat.completion.chunk","created":1724651635,"model":"Qwen/Qwen2.5-72B-Instruct","choices":[{"index":0,"delta":{"content":null,"role":null},"finish_reason":"stop"}]}
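
The chunks above can be reassembled into the full reply by concatenating each `delta.content`. The sketch below assumes the stream has already been split into text lines; `collect_stream_text` is a hypothetical helper, and skipping a `[DONE]` sentinel is an assumption borrowed from OpenAI-style streams that this document does not confirm.

```python
import json


def collect_stream_text(lines):
    """Join delta.content from `data:` lines and return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # ignore blank separators and any non-data lines
        body = line[len("data:"):].strip()
        if not body or body == "[DONE]":  # assumed sentinel, see lead-in
            continue
        chunk = json.loads(body)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason
```

Applied to the five sample chunks, the function returns the concatenated reply together with the finish reason `stop`.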

Non-streaming (JSON)

{
    "object": "chat.completion",
    "created": 1724652804,
    "model": "Qwen/Qwen2.5-72B-Instruct",
    "choices": [
        {
            "index": 0,
            "message": {
                "content": "作为SophNet智能助手，我可以帮助你完成多种任务。如果你有具体的需求或问题，请告诉我！",
                "role": "assistant"
            },
            "finish_reason": "stop"
        }
    ]
}

Function Call (JSON): first response

{
    "object":"chat.completion",
    "created":1744967746,
    "model":"DeepSeek-v3",
    "choices":[
        {
            "index":0,
            "message":
            {
                "content":"",
                "role":"assistant",
                "tool_calls":[
                    {
                        "id":"call_f0j0i4meawn7kqx335d4fsj1",
                        "type":"function",
                        "function":
                            {
                                "name":"get_weather",
                                "arguments":"{\"latitude\":31.2304,\"longitude\":121.4737}"
                            }
                    }
                ]
            },
            "finish_reason":"tool_calls"
        }
    ]
}

Second response

{
    "object":"chat.completion",
    "created":1744967193,
    "model":"DeepSeek-v3",
    "choices":[
        {
            "index":0,
            "message":
                {
                    "content":"今日上海的天气温度为20°C。",
                    "role":"assistant"
                },
            "finish_reason":"stop"
        }
    ]
}

Anthropic Messages

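Request description

Basic information

Request URL:

Standard URL: https://www.sophnet.com/api/open-apis/anthropic/v1/messages
Resource URL: https://www.sophnet.com/api/open-apis/{logic_resource_uuid}/anthropic/v1/messages

Note: the resource URL targets a specific logical resource by its UUID.

Request method: POST

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value: application/json
Authorization	String	Yes	"Bearer " + Apikey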

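Body parameters

Name	Type	Required	Description
model	string	Yes	Model name, for example DeepSeek-V3.1-Fast.
messages	array(message)	Yes	Message list. Example: [{"role": "user", "content": "Hello"}] Multimodal input is supported: [{"role": "user", "content": [{"type": "text", "text": "Describe the image"}, {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": "..."}}]}]
max_tokens	integer	Yes	Maximum number of tokens in the model reply.
system	string	No	System prompt that sets the assistant's role and behavior.
temperature	number	No	Sampling temperature. Range: [0, 1]. Higher values make the output more random; lower values make it more focused and deterministic. Default: 1.0.
top_p	number	No	Nucleus sampling parameter. Range: [0, 1]. The default depends on the model.
top_k	integer	No	Sample only from the top K options for each subsequent token. The default depends on the model.
stop_sequences	array(string)	No	Custom stop sequences; up to 4 are supported.
stream	boolean	No	Whether to return data as a stream. Default: false.
tools	array	No	List of tools, used for function calling.
tool_choice	object	No	Tool selection strategy. Can be set to auto, any, or a specific tool.
metadata	object	No	Metadata, including information such as user_id.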

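Response description

Response header parameters

Name	Value	Description
Content-Type	Streaming: text/event-stream; non-streaming: application/json

Response body parameters

a. Non-streaming

Name	Type	Description
id	string	Unique identifier of the message
type	string	Object type; always "message"
role	string	Role; always "assistant"
content	array	Array of content blocks
content[0].type	string	Content type: "text" or "tool_use"
content[0].text	string	Text content (when type is text)
content[0].id	string	Tool use ID (when type is tool_use)
content[0].name	string	Tool name (when type is tool_use)
content[0].input	object	Tool input parameters (when type is tool_use)
model	string	Name of the model used. Example: DeepSeek-V3.1-Fast
stop_reason	string	Stop reason. Possible values: end_turn, max_tokens, stop_sequence, tool_use
stop_sequence	string	The sequence that triggered the stop (if applicable)
usage	object	Token usage
usage.input_tokens	integer	Number of input tokens
usage.output_tokens	integer	Number of output tokens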

b. Streaming

Name	Type	Description
type	string	Event type. Possible values: message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop, ping
message	object	Message object (in message_start events)
index	integer	Content block index (in content_block events)
content_block	object	Content block object (in content_block_start events)
delta	object	Incremental data (in delta events)
delta.type	string	Delta type, such as "text_delta" or "input_json_delta"
delta.text	string	Incremental text content
usage	object	Token usage (in message_delta events)

Request examples

The examples follow. Replace the sample parameter values with your actual values.

Text-only request example

HTTP API

curl --location -g --request POST 'https://www.sophnet.com/api/open-apis/anthropic/v1/messages' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "model": "DeepSeek-V3.1-Fast",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": "你好，请介绍一下你自己"
        }
    ]
}'

Python SDK

# Compatible with the Anthropic Python SDK. Install it from a terminal: pip install anthropic
from anthropic import Anthropic

# Initialize the client
client = Anthropic(
    api_key="API_KEY",
    base_url="https://www.sophnet.com/api/open-apis/anthropic"
)

# Call the API
message = client.messages.create(
    model="DeepSeek-V3.1-Fast",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "你好，请介绍一下你自己"}
    ]
)

# Print the result
print(message.content[0].text)

Multimodal request example

HTTP API

curl --location -g --request POST 'https://www.sophnet.com/api/open-apis/anthropic/v1/messages' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "model": "DeepSeek-V3.1",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "请描述这张图片"
                },
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ]
}'

Python SDK

from anthropic import Anthropic

client = Anthropic(
    api_key="API_KEY",
    base_url="https://www.sophnet.com/api/open-apis/anthropic"
)

message = client.messages.create(
    model="DeepSeek-V3.1-Fast",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "请描述这张图片"
                },
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ]
)

print(message.content[0].text)

Tool Use (function calling) request example

HTTP API

curl --location -g --request POST 'https://www.sophnet.com/api/open-apis/anthropic/v1/messages' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "model": "DeepSeek-V3.1-Fast",
    "max_tokens": 1024,
    "tools": [
        {
            "name": "get_weather",
            "description": "获取指定城市的天气信息",
            "input_schema": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "城市名称"
                    }
                },
                "required": ["city"]
            }
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "上海今天天气怎么样？"
        }
    ]
}'

Response examples

Non-streaming (JSON)

{
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [
        {
            "type": "text",
            "text": "你好！很高兴认识你！我是DeepSeek，由深度求索公司创造的AI助手。"
        }
    ],
    "model": "DeepSeek-V3.1-Fast",
    "stop_reason": "end_turn",
    "stop_sequence": null,
    "usage": {
        "input_tokens": 12,
        "output_tokens": 58
    }
}

Streaming (event-stream)

event: message_start
data: {"type":"message_start","message":{"id":"msg_01XFDUDYJgAACzvnptvVoYEL","type":"message","role":"assistant","content":[],"model":"DeepSeek-V3.1-Fast","stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":12,"output_tokens":1}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"你好"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"！"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"我是"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"DeepSeek"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":58}}

event: message_stop
data: {"type":"message_stop"}
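
The event sequence above can be folded back into plain text by accumulating `text_delta` payloads per content block. The sketch below is illustrative only: `collect_message_stream` is a hypothetical helper, it reads the event type from the JSON body rather than from the `event:` line, and it ignores `input_json_delta` fragments.

```python
import json


def collect_message_stream(lines):
    """Accumulate text_delta events and return (text, stop_reason)."""
    text_by_index = {}
    stop_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        event = json.loads(line[len("data:"):])
        kind = event.get("type")
        if kind == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                index = event["index"]
                text_by_index[index] = text_by_index.get(index, "") + delta["text"]
        elif kind == "message_delta":
            stop_reason = event["delta"].get("stop_reason")
    text = "".join(text_by_index[i] for i in sorted(text_by_index))
    return text, stop_reason
```

Keying the text by block index keeps the blocks in order even if a response interleaves several of them.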

Tool Use response example

{
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [
        {
            "type": "text",
            "text": "好的，让我查询一下上海的天气信息。"
        },
        {
            "type": "tool_use",
            "id": "toolu_01A09q90qw90lq917835lq9",
            "name": "get_weather",
            "input": {
                "city": "上海"
            }
        }
    ],
    "model": "DeepSeek-V3.1-Fast",
    "stop_reason": "tool_use",
    "stop_sequence": null,
    "usage": {
        "input_tokens": 345,
        "output_tokens": 89
    }
}
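
A response like the one above mixes `text` and `tool_use` blocks in one `content` array, so a client usually separates them before acting. A minimal sketch, with `split_content_blocks` as a hypothetical helper name:

```python
def split_content_blocks(message):
    """Return (joined_text, tool_calls) from a non-streaming Messages response."""
    texts = []
    tool_calls = []
    for block in message.get("content", []):
        if block.get("type") == "text":
            texts.append(block["text"])
        elif block.get("type") == "tool_use":
            # Keep only the fields documented for tool_use blocks.
            tool_calls.append({
                "id": block["id"],
                "name": block["name"],
                "input": block["input"],
            })
    return "".join(texts), tool_calls
```

When `stop_reason` is `tool_use`, the returned `tool_calls` list tells the caller which local function to run and with which arguments.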

Speech to text

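Create a task

Request description

Basic information

Request URL: https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/transcriptions

Request method: POST

Two request modes are supported: audio file link mode and audio file upload mode.
Audio file link mode: header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value: `application/json`
Authorization	String	Yes	"Bearer " + Apikey

Audio file link mode: body parameters (the asynchronous ASR API has received a major upgrade; the parameters below will be deprecated soon, so switch to the new parameters as soon as possible)

Name	Type	Required	Description
audio_url	string	Yes	URL of the audio file. Supported audio formats: wav, mp3, m4a, flv, mp4, wma, 3gp, amr, aac, ogg-opus, flac. Limits: the audio at the URL must not be longer than 5 hours, and the file must not exceed 1GB. Result retention: recognition results are kept on the server for 24 hours.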

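Audio file link mode: body parameters (new parameters introduced by the asynchronous ASR API upgrade)

Name	Type	Required	Description
file_urls	array(string)	Yes	List of audio URLs. Audio files are subject to the following limits: - Count: at most 100 URLs per request. - Format: `aac、amr、avi、flac、flv、m4a、mkv、mov、mp3、mp4、mpeg、ogg、opus、wav、webm、wma、wmv` are supported. - Sample rate: any. - Size and duration: each file must not exceed 32MB and must be shorter than 12 hours. - Language: Chinese, English, and Japanese are supported.
speech_recognition_param.model	string	Yes	Model name. `Fun-ASR` is supported.
speech_recognition_param.channel_id	array(int)	No	Indexes of the audio tracks to recognize. Default: `[0]`. If several tracks are given, all of them are recognized, and each track is billed separately.
speech_recognition_param.special_word_filter	string	No	Sensitive words to process, given as JSON; different words can be assigned different handling strategies. If this parameter is omitted, the default is used. JSON fields: `filter_with_signed`: - Type: Object - Required: No - Description: replaces sensitive words in the recognition result with `*` of the same length - filter_with_signed.word_list: string array containing all sensitive words to replace. `filter_with_empty`: - Type: Object - Required: No - Description: deletes sensitive words from the recognition result - filter_with_empty.word_list: string array containing all sensitive words to delete. `system_reserved_filter`: - Type: bool - Required: No - Description: whether to also enable the system default sensitive-word handling; matched words are replaced with `*` of the same length. - Default: true.
speech_recognition_param.diarization_enabled	bool	No	Whether to enable speaker diarization. Default: `false` (off).
speech_recognition_param.speaker_count	int	No	Reference value for the number of speakers. Takes effect only when `speech_recognition_param.diarization_enabled` is `true`. Range: `[2, 100]`. By default the number of speakers is predicted automatically; a reference value only assists the prediction and does not guarantee that exactly that many speakers are returned.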
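
The new link-mode parameters can be assembled as follows. This is a minimal sketch, assuming the nesting shown in the request example of this section (`speech_recognition_param` as an object next to `file_urls`); `build_transcription_body` and its keyword names are hypothetical, and `special_word_filter` is serialized to a JSON string because the table types it as `string`.

```python
import json


def build_transcription_body(file_urls, model="Fun-ASR", replace_words=None,
                             delete_words=None, diarization=False, speaker_count=None):
    """Build the JSON body for creating a transcription task in link mode."""
    if not 1 <= len(file_urls) <= 100:
        raise ValueError("a request accepts between 1 and 100 URLs")

    param = {"model": model}

    word_filter = {}
    if replace_words:
        word_filter["filter_with_signed"] = {"word_list": list(replace_words)}
    if delete_words:
        word_filter["filter_with_empty"] = {"word_list": list(delete_words)}
    if word_filter:
        # The parameter is documented as a string holding JSON.
        param["special_word_filter"] = json.dumps(word_filter, ensure_ascii=False)

    if diarization:
        param["diarization_enabled"] = True
        if speaker_count is not None:
            if not 2 <= speaker_count <= 100:
                raise ValueError("speaker_count must be within [2, 100]")
            param["speaker_count"] = speaker_count
    elif speaker_count is not None:
        raise ValueError("speaker_count only takes effect with diarization enabled")

    return {"file_urls": list(file_urls), "speech_recognition_param": param}
```

The result can be passed to an HTTP client as the JSON body of the POST request.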

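Audio file upload mode: header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value: `multipart/form-data`
Authorization	String	Yes	"Bearer " + Apikey

Audio file upload mode: form-data parameters (the asynchronous ASR API has received a major upgrade; the parameters below will be deprecated soon, so switch to the new parameters as soon as possible)

Name	Type	Required	Description
audio_file	File	Yes	Local audio file

Audio file upload mode: form-data parameters (new parameters introduced by the asynchronous ASR API upgrade)

Name	Type	Required	Description
audio_file	File	Yes	Local audio file
speech_recognition_param.model	string	Yes	Model name. `Fun-ASR` is supported.
speech_recognition_param.channel_id	array(int)	No	Indexes of the audio tracks to recognize. Default: `[0]`. If several tracks are given, all of them are recognized, and each track is billed separately.
speech_recognition_param.special_word_filter	string	No	Sensitive words to process, given as JSON; different words can be assigned different handling strategies. If this parameter is omitted, the default is used. JSON fields: `filter_with_signed`: - Type: Object - Required: No - Description: replaces sensitive words in the recognition result with `*` of the same length - filter_with_signed.word_list: string array containing all sensitive words to replace. `filter_with_empty`: - Type: Object - Required: No - Description: deletes sensitive words from the recognition result - filter_with_empty.word_list: string array containing all sensitive words to delete. `system_reserved_filter`: - Type: bool - Required: No - Description: whether to also enable the system default sensitive-word handling; matched words are replaced with `*` of the same length. - Default: true.
speech_recognition_param.diarization_enabled	bool	No	Whether to enable speaker diarization. Default: `false` (off).
speech_recognition_param.speaker_count	int	No	Reference value for the number of speakers. Takes effect only when `speech_recognition_param.diarization_enabled` is `true`. Range: `[2, 100]`. By default the number of speakers is predicted automatically; a reference value only assists the prediction and does not guarantee that exactly that many speakers are returned.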

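Response description

Response header parameters

Name	Value	Description
Content-Type	application/json

Response body parameters

Name	Type	Description
task_id	string	Task ID
created	int	Timestamp

Request examples

HTTP API: audio file link mode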

curl --location --request POST 'https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/transcriptions' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "audio_url":"YOUR_AUDIO_URL"
}'

Audio file link mode (new parameters)

curl --location --request POST 'https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/transcriptions' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "file_urls": ["YOUR_AUDIO_URL"],
    "speech_recognition_param": {
        "model": "Fun-ASR"
    }
}'

Audio file upload mode

curl --location --request POST 'https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/transcriptions' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: multipart/form-data" \
-F "audio_file=@/path/to/your_audio_file.wav;type=audio/wav"

Audio file upload mode (new parameters)

import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder
import json

API_KEY = "YOUR_API_KEY"
AUDIO_FILE_NAME = ""
AUDIO_FILE_PATH = ""
AUDIO_FILE_FORMAT = "" # for example, audio/wav

param_data = {
    "speech_recognition_param": {
        "model": "Fun-ASR"
    }
}

# Use MultipartEncoder for finer control over the multipart/form-data body
multipart_data = MultipartEncoder(
    fields={
        "audio_file": (AUDIO_FILE_NAME, open(AUDIO_FILE_PATH, "rb"), AUDIO_FILE_FORMAT),
        "data": (None, json.dumps(param_data), "application/json")
    }
)

url = "https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/transcriptions"

headers = {
    "Authorization": f"Bearer {API_KEY}"
}

headers["Content-Type"] = multipart_data.content_type

response = requests.post(url, headers=headers, data=multipart_data)
print(response.text)

Response example

{
    "taskId": "10047816884",
    "created": 1724652804
}

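Query task status and result

Request description

Basic information

Request URL: https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/transcriptions/{taskId}

Request method: GET

Path parameters:

Name	Type	Required	Description
taskId	String	Yes	ID of the speech-to-text task

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value: application/json
Authorization	String	Yes	"Bearer " + Apikey

Body parameters

None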

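Response description

Response header parameters

Name	Value	Description
Content-Type	application/json

Response body parameters

Name	Type	Description
task_id	string	Task ID
status	string	Task status. waiting: the task is queued; doing: the task is running; success: the task succeeded; failed: the task failed. Example: waiting
result	string	Transcription result
errorMsg	string	Error code
audio_duration	float	Total length of the input audio, in seconds.
results	array(object)	Transcription results; applies to the `Fun-ASR` model.
results[*].file_url	string	URL of the audio that the recognition result belongs to.
results[*].subtask_status	string	Recognition status of this audio URL: `PENDING`, `RUNNING`, `SUCCEEDED`, or `FAILED`.
results[*].code	string	Transcription result code.
results[*].message	string	Transcription result message.
results[*].transcripts	array(object)	Text results, audio length, and related data.
results[*].transcripts[*].text	string	Text result.
results[*].transcripts[*].content_duration_in_milliseconds	float	Length of the audio that the text result covers, in milliseconds.
results[*].transcripts[*].channel_id	int	Index of the audio track that the text result belongs to.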
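
Because a task is created asynchronously, a client typically polls this endpoint until `status` reaches a final value. The sketch below keeps the HTTP call outside the function: `fetch_status` is a hypothetical callable that performs the GET request and returns the parsed JSON body. Accepting both the lowercase values from the table and uppercase `SUCCEEDED`/`FAILED` as final states is an assumption, not documented behavior.

```python
import time

# Assumed set of final states; adjust to what the service actually returns.
TERMINAL_STATUSES = {"success", "failed", "SUCCEEDED", "FAILED"}


def wait_for_task(fetch_status, task_id, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll fetch_status(task_id) until the task reaches a final state."""
    for attempt in range(max_attempts):
        body = fetch_status(task_id)
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next query
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} queries")
```

Injecting `fetch_status` and `sleep` keeps the loop easy to test and lets the caller choose any HTTP library.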

Request example

HTTP API

curl --location --request GET 'https://www.sophnet.com/api/open-apis/projects/easyllms/speechtotext/transcriptions/{taskId}' \
--header "Authorization: Bearer $API_KEY"

Response example

{
    "taskId":"a2b1ae7a-d092-4e98-b5e8-2e58e2xxxxx",
    "status":"SUCCEEDED",
    "results":[
        {
            "transcripts":[
                {
                    "text":"欢迎大家来体验达摩院推出的语音识别模型。",
                    "content_duration_in_milliseconds":4400.0,
                    "channel_id":0
                }
            ],
            "file_url":"https://isv-data.oss-cn-hangzhou.aliyuncs.com/xx/xx/xx/test_audio/test.wav"
        }
    ],
    "audio_duration":4.4
}

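Streaming Speech to text

Connection request description

Basic information

Request URL: wss://www.sophnet.com/api/open-apis/projects/{ProjectId}/easyllms/stream-speech

Request method: WebSocket

Path parameters:

Name	Type	Required	Description
ProjectId	String	Yes	Project ID

Request parameters

Name	Type	Required	Description
easyllm_id	String	Yes	Easyllm ID
apikey	String	Yes	Apikey
format	String	Yes	Input audio format. Supported: pcm, wav, mp3, opus, speex, aac, amr
sample_rate	int	Yes	Audio sample rate. Any sample rate is accepted, but 16k gives better results.
heartbeat	bool	Yes	Whether to enable heartbeat. If false, the connection times out and closes after 60s even when silent audio is being sent, so audio containing speech must arrive within every 60s. If true, sending silent audio keeps the connection open, and some audio must be sent within every 60s.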
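
The connection URL combines the path parameter with the request parameters from the table above as a query string. A minimal Python sketch, where `build_stream_url` is a hypothetical helper and the format check simply mirrors the list in the table:

```python
from urllib.parse import quote, urlencode

SUPPORTED_FORMATS = {"pcm", "wav", "mp3", "opus", "speex", "aac", "amr"}


def build_stream_url(project_id, easyllm_id, apikey, audio_format, sample_rate, heartbeat=True):
    """Return the wss:// URL for the streaming speech endpoint."""
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format}")
    query = urlencode({
        "easyllm_id": easyllm_id,
        "apikey": apikey,
        "format": audio_format,
        "sample_rate": sample_rate,
        "heartbeat": "true" if heartbeat else "false",
    })
    path = quote(project_id, safe="")  # escape the path segment
    return f"wss://www.sophnet.com/api/open-apis/projects/{path}/easyllms/stream-speech?{query}"
```

Using `urlencode` avoids broken URLs when a key or ID contains characters that need escaping.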

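Audio data sending description

Audio data: send the raw audio bytes; they can be sent in chunks of 3200 bytes.

Connection close request description

Closing the connection: send the string "BYE" to close the connection actively.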
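
The sending rules above (3200-byte chunks followed by a closing "BYE") can be expressed as a small generator. This is a sketch only: `iter_frames` is a hypothetical name, and the caller is expected to pass each yielded item to its own WebSocket client's send function, sending `bytes` items as binary frames and the final string as a text frame.

```python
def iter_frames(audio, chunk_size=3200):
    """Yield audio in chunk_size-byte pieces, then the closing "BYE" string."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
    yield "BYE"  # text message that asks the server to close the connection
```

The last audio chunk may be shorter than 3200 bytes when the input length is not a multiple of the chunk size.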

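Connection response description

Response body parameters

Name	Type	Description
status	string	'ok' when the connection succeeds; on failure the connection is closed directly

Audio recognition result response description

Response body parameters

Name	Type	Description
text	string	Sentence-level recognition result. When is_sentence_end is false, it holds the intermediate streaming output; when true, it is the final result for the sentence, and the next message starts a new sentence
begin_time	int	Time at which the sentence starts, in milliseconds
end_time	int	Time at which the sentence ends, in milliseconds
words	array(string)	Word-level prediction results
is_sentence_end	bool	Whether the sentence has ended

Connection request example

WebSocket API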

const url = `wss://www.sophnet.com/api/open-apis/projects/${ProjectId}/easyllms/stream-speech`
    + `?easyllm_id=${model}`
    + `&apikey=${apikey}`
    + `&format=${format}`
    + `&sample_rate=${sampleRate}`
    + `&heartbeat=true`;

const ws = new WebSocket(url);
ws.binaryType = 'arraybuffer';

ws.onopen = () => {
    log('WebSocket connected: ' + url);
};

ws.onmessage = (evt) => {
    if (typeof evt.data === 'string') {
        log('<- ASR_RESULT: ' + evt.data);
    } else {
        log('<- binary message (' + evt.data.byteLength + ' bytes)');
    }
};

ws.onerror = (err) => {
    log('WebSocket error: ' + err);
};

ws.onclose = (evt) => {
    log(`WebSocket closed: [${evt.code}] ${evt.reason}`);
};

Audio data sending example

WebSocket API

ws.send(byteData);

Connection response example

{"status": "ok"}

Audio recognition result response example

{"text":"这是深度神经网络的语音","begin_time":660,"end_time":null,"words": ["Word(beginTime=660, endTime=1148, text=这是, punctuation=, fixed=false)", "Word(beginTime=1148, endTime=1636, text=深度, punctuation=, fixed=false)", "Word(beginTime=1636, endTime=2124, text=神经, punctuation=, fixed=false)", "Word(beginTime=2124, endTime=2612, text=网络的, punctuation=, fixed=false)", "Word(beginTime=2612, endTime=3100, text=语音, punctuation=, fixed=false)"], "is_sentence_end": false}

{"text":"这是深度神经网络的语音识别","begin_time":660,"end_time":null,"words": ["Word(beginTime=660, endTime=1148, text=这是, punctuation=, fixed=false)", "Word(beginTime=1148, endTime=1636, text=深度, punctuation=, fixed=false)", "Word(beginTime=1636, endTime=2124, text=神经, punctuation=, fixed=false)", "Word(beginTime=2124, endTime=2612, text=网络的, punctuation=, fixed=false)", "Word(beginTime=2612, endTime=3100, text=语音, punctuation=, fixed=false)", "Word(beginTime=3100, endTime=3500, text=识别, punctuation=, fixed=false)"], "is_sentence_end": false}

{"text":"这是深度神经网络的语音识别","begin_time":660,"end_time":null,"words": ["Word(beginTime=660, endTime=1148, text=这是, punctuation=, fixed=false)", "Word(beginTime=1148, endTime=1636, text=深度, punctuation=, fixed=false)", "Word(beginTime=1636, endTime=2124, text=神经, punctuation=, fixed=false)", "Word(beginTime=2124, endTime=2612, text=网络的, punctuation=, fixed=false)", "Word(beginTime=2612, endTime=3100, text=语音, punctuation=, fixed=false)", "Word(beginTime=3100, endTime=3588, text=识别, punctuation=, fixed=false)"], "is_sentence_end": false}

{"text":"这是深度神经网络的语音识别模型。","begin_time":660,"end_time":5540,"words": ["Word(beginTime=660, endTime=1148, text=这是, punctuation=, fixed=false)", "Word(beginTime=1148, endTime=1636, text=深度, punctuation=, fixed=false)", "Word(beginTime=1636, endTime=2124, text=神经, punctuation=, fixed=false)", "Word(beginTime=2124, endTime=2612, text=网络的, punctuation=, fixed=false)", "Word(beginTime=2612, endTime=3100, text=语音, punctuation=, fixed=false)", "Word(beginTime=3100, endTime=3588, text=识别, punctuation=, fixed=false)", "Word(beginTime=3588, endTime=4076, text=模型, punctuation=, fixed=false)"], "is_sentence_end": true}

{"text":"请","begin_time":6001,"end_time":null,"words": ["Word(beginTime=6001, endTime=6502, text=请, punctuation=, fixed=false)"], "is_sentence_end": false}
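
Messages like those above arrive repeatedly for the same sentence until `is_sentence_end` becomes true, so a client usually keeps only the final version. The sketch below also unpacks the `Word(...)` strings; the regular expression is inferred from the sample output alone and is an assumption, not a documented format. `parse_result` and `final_sentences` are hypothetical names.

```python
import json
import re

# Pattern inferred from the sample messages; treat it as an assumption.
WORD_RE = re.compile(
    r"Word\(beginTime=(\d+), endTime=(\d+), text=(.*?), punctuation=(.*?), fixed=(true|false)\)"
)


def parse_result(raw):
    """Return (text, is_sentence_end, words) for one recognition message."""
    message = json.loads(raw)
    words = []
    for item in message.get("words", []):
        match = WORD_RE.fullmatch(item)
        if match:
            words.append({
                "begin_time": int(match[1]),
                "end_time": int(match[2]),
                "text": match[3],
            })
    return message["text"], message["is_sentence_end"], words


def final_sentences(raw_messages):
    """Keep only the text of messages that close a sentence."""
    return [text for text, done, _ in map(parse_result, raw_messages) if done]
```

Intermediate messages can still be shown as live captions, while `final_sentences` gives the stable transcript.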

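Non-streaming synchronous Speech to text

Request description

Basic information

Request URL: https://www.sophnet.com/api/open-apis/projects/{projectId}/easyllms/speechtotext/non-stream

Request method: POST

Path parameters:

Name	Type	Required	Description
projectId	String	Yes	Project ID

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value: application/json
Authorization	String	Yes	"Bearer " + Apikey

Body parameters

Name	Type	Required	Description
audio_url	String	Yes	URL of the audio to recognize
speech_recognition_param	Object	No	Recognition parameters
speech_recognition_param.model	String	No	Recognition model. Only the internal default model is supported and it cannot be selected; multiple models will be supported later.
speech_recognition_param.sample_rate	Integer	No	Sample rate of the audio to recognize. Any sample rate is supported.
speech_recognition_param.format	String	No	Format of the audio to recognize. Supported audio format: wav.
easyllm_id	String	Yes	Easyllm ID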

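Response description

Response header parameters

Name	Value	Description
Content-Type	application/json

Response body parameters

Name	Type	Description
result	String	Model inference result
audio_duration	Float	Total audio duration, in seconds

Request example

HTTP API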

curl --location --request POST 'https://www.sophnet.com/api/open-apis/projects/{projectId}/easyllms/speechtotext/non-stream' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "easyllm_id": "{YOUR_EASYLLM_ID}",
    "audio_url": "{YOUR_AUDIO_URL}"
}'

Response example

{
    "result":"你好，算能。",
    "audio_duration":2.471
}

Embeddings

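Request description

Basic information

Request URL: https://www.sophnet.com/api/open-apis/projects/{projectId}/easyllms/embeddings

Request method: POST

Path parameters:

Name	Type	Required	Description
projectId	String	Yes	Project ID

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value: application/json
Authorization	String	Yes	"Bearer " + Apikey

Body parameters

Name	Type	Required	Description
model	string	No	Supported: `text-embeddings`, `clip-embeddings`, `bge-m3`. Default: `text-embeddings`
input_texts	array(string)	Yes	Each element of the array is one text; each text may contain at most 8192 tokens. The `text-embeddings` model accepts at most 10 texts, and the `bge-m3` model accepts at most 1 text.
input_images	array(string)	No	Each element of the array is a base64 image or a URL. Only effective for the `clip-embeddings` model
dimensions	integer	Yes	Dimension of the output embeddings. The `text-embeddings` model supports 1024/768/512/256/128/64, the `clip-embeddings` model supports 64 to 1024 dimensions, and the `bge-m3` model supports only 1024.
easyllm_id	string	Yes	Easyllm ID
normalized	bool	No	Whether to apply L2 normalization. Only effective for the `clip-embeddings` model
encoding_type	string	No	Format of the output data. Only effective for the `clip-embeddings` model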

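Response description

Response header parameters

Name	Value	Description
Content-Type	application/json

Response body parameters

Name	Type	Description
id	string	Task ID
usage	dict	Token usage during model inference
data	array	Model inference results, returned in order: text embeddings first, then image embeddings

Request examples

HTTP API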

# sample1
curl --location --request POST 'https://www.sophnet.com/api/open-apis/projects/{projectId}/easyllms/embeddings' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "easyllm_id": "{YOUR_EASYLLM_ID}",
    "input_texts": ["你好", "很高兴认识你"],
    "dimensions": 1024
}'

# sample2
curl --location --request POST 'https://www.sophnet.com/api/open-apis/projects/{projectId}/easyllms/embeddings' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $API_KEY" \
--data-raw '{
    "model": "clip-embeddings",
    "easyllm_id": "{YOUR_EASYLLM_ID}",
    "input_texts": ["海滩上美丽的日落"],
    "input_images": ["https://i.ibb.co/nQNGqL0/1beach1.jpg", "https://i.ibb.co/r5w8hG8/beach2.jpg", "iVBORw0KGgoAAAANSUhEUgAAABwAAAA4CAIAAABhUg/jAAAAMklEQVR4nO3MQREAMAgAoLkoFreTiSzhy4MARGe9bX99lEqlUqlUKpVKpVKpVCqVHksHaBwCA2cPf0cAAAAASUVORK5CYII="],
    "dimensions": 1024
}'

Response examples

// sample1
{
    "id": "",
    "object": "list",
    "usage": {
        "prompt_tokens": 4,
        "completion_tokens": null,
        "total_tokens": 4,
        "prompt_tokens_details": null,
        "completion_tokens_details": null
    },
    "data": [
        {
            "embedding": [
                -0.08296291530132294,
                0.03833295777440071,
                ...
            ],
            "index": 0,
            "object": "embedding"
        },
        {
            "embedding": [
                -0.05998880788683891,
                0.04025664180517197,
                ...
            ],
            "index": 1,
            "object": "embedding"
        }
    ]
}

// sample2
{
    "object": "list",
    "usage": {
        "prompt_tokens": 20008,
        "completion_tokens": null,
        "total_tokens": 20008,
        "prompt_tokens_details": null,
        "completion_tokens_details": null
    },
    "data": [
        {
            "embedding": [
                0.02087402,
                0.06689453,
                -0.07763672,
                -0.10253906,
                ...
            ],
            "index": 0,
            "object": "embedding"
        },
        {
            "embedding": [
                0.01507568,
                0.16015625,
                -0.08837891,
                ...
            ],
            "index": 1,
            "object": "embedding"
        },
        {
            "embedding": [
                0.04882812,
                0.20214844,
                -0.07861328,
                0.00276184,
                ...
            ],
            "index": 2,
            "object": "embedding"
        },
        {
            "embedding": [
                -0.00939941,
                0.18164062,
                0.02038574,
                0.01239014,
                ...
            ],
            "index": 3,
            "object": "embedding"
        }
    ]
}
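
Since `data` lists text embeddings first and image embeddings after them, a caller can sort by `index` and compare a text vector with an image vector. The sketch below assumes only the response fields shown above; the helper names and the two-dimensional vectors are made up for illustration.

```python
import math


def embeddings_by_index(response):
    """Return the embedding vectors ordered by their `index` field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up response with tiny vectors; real vectors have `dimensions` entries.
response = {"data": [
    {"embedding": [0.6, 0.8], "index": 1, "object": "embedding"},
    {"embedding": [1.0, 0.0], "index": 0, "object": "embedding"},
]}
text_vec, image_vec = embeddings_by_index(response)
score = cosine_similarity(text_vec, image_vec)
```

For a `clip-embeddings` request with one text and several images, index 0 would be the text and the remaining indices the images, following the ordering rule stated in the response table.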

Document Parse

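Request

Basic information

Description: Converts documents in common formats into accurate, easy-to-use Markdown text. Upload a file (form-data) and receive the document content as Markdown.

Request URL: https://www.sophnet.com/api/open-apis/projects/easyllms/doc-parse

Request method: POST

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value `multipart/form-data`
Authorization	String	Yes	"Bearer " + Apikey

form-data parameters

Name	Type	Required	Description
file	file	Yes	The document. Supported formats: pdf, docx, doc, xlsx, txt, pptx. Size must be under 50 MB.

Response

Response headers

Name	Value	Description
Content-Type	application/json

Response body

Name	Type	Description
data	string	Parsed document content (Markdown)

Request example

HTTP API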

curl --location --request POST 'https://www.sophnet.com/api/open-apis/projects/easyllms/doc-parse' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: multipart/form-data" \
--form 'file=@"YOUR_DOCUMENT"'

Response example

{
  "data": "文件编号：HR-2023-06-3-1\n\n发布单位：AMT\n\n发布对象：全员\n\n发布日期：2023.11.16\n\n生效日期：2023.11.16\n\n**管理制度**\n\n......"
}

Text to voice

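Streaming / non-streaming Text to voice

Request

Basic information

Description: Text-to-speech service. Send text and receive audio (mp3 by default).

Request URLs:

Streaming endpoint: https://www.sophnet.com/api/open-apis/projects/{ProjectId}/easyllms/voice/synthesize-audio-stream

Non-streaming endpoint: https://www.sophnet.com/api/open-apis/projects/{ProjectId}/easyllms/voice/synthesize-audio

Request method: POST

Path parameters:

Name	Type	Required	Description
ProjectId	String	Yes	Project ID

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value application/json
Authorization	String	Yes	"Bearer " + Apikey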

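Body parameters

Name	Type	Required	Description
text	array(string)	Yes	List of strings to convert to speech
synthesis_param	object	Yes	Speech synthesis parameters
synthesis_param.model	string	Yes	Model to use. Default "cosyvoice-v1"; supported values are "cosyvoice-v1" and "cosyvoice-v2". "cosyvoice-v2" is recommended.
synthesis_param.voice	string	No	Voice to use. Default "longxiaochun".
synthesis_param.format	string	No	Audio encoding format and sample rate, written as "fileformat_samplerate_channels_bitrate". For example, `MP3_16000HZ_MONO_128KBPS` means mp3 audio sampled at 16 kHz. If format is omitted, the system picks the recommended format for the chosen voice. One example per supported file format: `WAV_16000HZ_MONO_16BIT`, `MP3_16000HZ_MONO_128KBPS`, `PCM_16000HZ_MONO_16BIT`. Other file formats are not yet supported.
synthesis_param.volume	number	No	Volume. Default 50, range [0, 100].
synthesis_param.speechRate	number	No	Speech rate. Default 1, range [0.5, 2].
synthesis_param.pitchRate	number	No	Pitch. Default 1, range [0.5, 2].
easyllm_id	string	Yes	Easyllm ID

Limits
- cosyvoice models: the text sent in one request must not exceed 2000 characters (a Chinese character counts as 2, any other character as 1).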
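
The counting rule above can be approximated on the client before sending a request. This is a sketch under two assumptions that the documentation does not state: a "Chinese character" is taken to mean a code point in the CJK Unified Ideographs block (U+4E00 to U+9FFF), and the limit is checked per string. Both function names are hypothetical.

```python
MAX_COSYVOICE_LENGTH = 2000  # limit stated above for cosyvoice models


def cosyvoice_length(text):
    """Count a Chinese character as 2 and any other character as 1.

    Assumption: "Chinese character" means U+4E00..U+9FFF; the service may
    use a different definition.
    """
    return sum(2 if "\u4e00" <= ch <= "\u9fff" else 1 for ch in text)


def fits_limit(text):
    """True if one string stays within the documented 2000-character limit."""
    return cosyvoice_length(text) <= MAX_COSYVOICE_LENGTH


length = cosyvoice_length("你好abc")  # 2 + 2 + 1 + 1 + 1
```

How the limit applies when `text` contains several strings is not specified here, so a conservative caller might apply the check to the concatenation of all elements.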

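Response

Streaming endpoint

Response headers

Name	Value	Description
Content-Type	text/event-stream

Response body

Name	Type	Description
data	string	Base64-encoded audio data

Non-streaming endpoint

Response headers

Name	Value	Description
Content-Type	audio/mpeg, audio/wav, audio/L16, application/octet-stream	Binary audio stream; the type returned depends on the audio format

Request examples

HTTP API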

curl --location --request POST 'https://www.sophnet.com/api/open-apis/projects/{projectId}/easyllms/voice/synthesize-audio-stream' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "easyllm_id": "YOUR_SERVICE_ID",
    "text": ["这是要合成的文本", "第二段文本"],
    "synthesis_param": {
        "model": "cosyvoice-v1",
        "voice": "longxiaochun",
        "format": "WAV_16000HZ_MONO_16BIT",
        "volume": 80,
        "speechRate": 1.2,
        "pitchRate": 1.0
    }
}'

Python requests

NOTE: The examples below show how to save the output as an audio file.

Streaming endpoint

import requests
import json
import base64

projectId = "YOUR_PROJECT_ID"
easyllmId = "YOUR_EASYLLM_SERVICE_ID"
API_KEY = "YOUR_API_KEY"

url = f"https://www.sophnet.com/api/open-apis/projects/{projectId}/easyllms/voice/synthesize-audio-stream"

headers = {
   'Content-Type': 'application/json',
   'Authorization': 'Bearer ' + API_KEY,
}

payload = json.dumps({
   "easyllm_id": easyllmId,
   "text": [
       "测试",
   ],
   "synthesis_param": {
       "model": "cosyvoice-v1",
       "voice": "longxiaochun",
       "format": "MP3_16000HZ_MONO_128KBPS",
       "volume": 80,
       "speechRate": 1.2,
       "pitchRate": 1
   }
})

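# stream=True lets requests yield event lines as they arrive instead of buffering the whole response
response = requests.request("POST", url, headers=headers, data=payload, stream=True)
response.raise_for_status()
# Open the output file once; "wb" avoids appending to a file left over from an earlier run
with open("output.mp3", "wb") as f:
    for line in response.iter_lines():
        # Each event line looks like b'data:{...}'; skip blank separator lines
        if not line.startswith(b"data:"):
            continue
        frame = json.loads(line[5:]).get("audioFrame")
        if frame:
            f.write(base64.b64decode(frame))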

Non-streaming endpoint

import requests
import json
import base64

projectId = "YOUR_PROJECT_ID"
easyllmId = "YOUR_EASYLLM_SERVICE_ID"
API_KEY = "YOUR_API_KEY"

url = f"https://www.sophnet.com/api/open-apis/projects/{projectId}/easyllms/voice/synthesize-audio"

headers = {
   'Content-Type': 'application/json',
   'Authorization': 'Bearer ' + API_KEY,
}

payload = json.dumps({
   "easyllm_id": easyllmId,
   "text": [
       "测试",
   ],
   "synthesis_param": {
       "model": "cosyvoice-v1",
       "voice": "longxiaochun",
       "format": "MP3_16000HZ_MONO_128KBPS",
       "volume": 80,
       "speechRate": 1.2,
       "pitchRate": 1
   }
})

response = requests.request("POST", url, headers=headers, data=payload)
response.raise_for_status()  # fail early instead of writing an error body to disk
with open("output.mp3", "wb") as f:
    f.write(response.content)

Response example

{"status": "accepting", "usage": null, "audioFrame": "{BASE64 encoded data}"}
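
A single event from the streaming endpoint can be decoded in isolation, which makes the parsing step easy to test without a network call. The sketch below assumes, as the streaming example above does, that each event arrives on a line starting with `data:` followed by a JSON object with an `audioFrame` field; `decode_audio_frame` is a hypothetical helper name.

```python
import base64
import json


def decode_audio_frame(line):
    """Return the audio bytes carried by one 'data:{...}' line, or b'' if none."""
    if isinstance(line, bytes):
        line = line.decode("utf-8")
    if not line.startswith("data:"):
        return b""  # blank separator line or an unrelated field
    event = json.loads(line[len("data:"):])
    frame = event.get("audioFrame")
    return base64.b64decode(frame) if frame else b""


# Made-up event whose payload is the three bytes b"abc"
sample = "data:" + json.dumps({
    "status": "accepting",
    "usage": None,
    "audioFrame": base64.b64encode(b"abc").decode("ascii"),
})
audio = decode_audio_frame(sample)
```

Appending the result of each call to an open file reproduces what the streaming example does, while keeping the decoding logic separate from the HTTP code.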

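Voice cloning

Create a voice

Request

Basic information

Description: Creates a voice under a given model of a given service, for use with the Text to voice API.

Request URL: https://www.sophnet.com/api/open-apis/projects/easyllms/voice/upload

Request method: POST

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value multipart/form-data
Authorization	String	Yes	"Bearer " + Apikey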

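Form parameters

Name	Type	Required	Description
audio_file	File	Yes	Audio file used to create the voice. 10 to 20 s is recommended, 60 s is the maximum, and the file must not exceed 10 MB.
tts_speaker_voice_generate_req	object	Yes	Voice creation parameters.
tts_speaker_voice_generate_req.easyllm_id	string	Yes	Service under which the voice is created.
tts_speaker_voice_generate_req.model	string	Yes	Model under which the voice is created. For supported models, see Text to voice.
tts_speaker_voice_generate_req.name	string	No	Voice name; it only needs to tell one voice from another.
tts_speaker_voice_generate_req.des	string	No	Description of the voice.
tts_speaker_voice_generate_req.prompt_text	string	Yes	Transcript of the speech in the audio file.

Limit: an organization can create at most 100 voices.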

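Response

Response headers

Name	Value	Description
Content-Type	application/json

Response body

Name	Type	Description
status	int	Return status; 0 means success.
message	string	Return message.
result	null	Always null.
timestamp	int	Timestamp.

Request example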

import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder
import json

easyllmId = "YOUR_EASYLLM_SERVICE_ID"
API_KEY = "YOUR_API_KEY"
AUDIO_FILE_NAME = ""
AUDIO_FILE_PATH = ""
AUDIO_FILE_FORMAT = "" # e.g. audio/wav
PROMPT_TEXT = "YOUR_AUDIO_CONTENT"

voice_data = {
    "easyllm_id": easyllmId, 
    "model": "cosyvoice-v1", 
    "name": "voice1", 
    "prompt_text": PROMPT_TEXT
}

url = "https://www.sophnet.com/api/open-apis/projects/easyllms/voice/upload"

# Open the audio file in a with-block so the handle is closed after the upload
with open(AUDIO_FILE_PATH, "rb") as audio_file:
    # MultipartEncoder gives precise control over the multipart/form-data body
    multipart_data = MultipartEncoder(
        fields={
            "audio_file": (AUDIO_FILE_NAME, audio_file, AUDIO_FILE_FORMAT),
            "tts_speaker_voice_generate_req": (None, json.dumps(voice_data), "application/json")
        }
    )

    headers = {
        "Authorization": f"Bearer {API_KEY}",
        # content_type carries the multipart boundary generated by the encoder
        "Content-Type": multipart_data.content_type
    }

    response = requests.post(url, headers=headers, data=multipart_data)

print(response.text)

Response example

{
    "status": 0,
    "message": "请求成功",
    "result": null,
    "timestamp": 1765420402078
}

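List voices

Request

Basic information

Description: Queries all available voices under a given model of a given service, for use with the Text to voice API.

Request URL: https://www.sophnet.com/api/open-apis/projects/easyllms/voice/{easyllmId}

Request method: GET

Path parameters:

Name	Type	Required	Description
easyllmId	String	Yes	Service ID

Header parameters

Name	Type	Required	Description
Authorization	String	Yes	"Bearer " + Apikey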

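Response

Response headers

Name	Value	Description
Content-Type	application/json

Response body

Name	Type	Description
status	int	Return status; 0 means success.
message	string	Return message.
result	list	List of voices.
result[x].model	string	Model the voice belongs to.
result[x].des	string	Description of the voice.
result[x].name	string	Name of the voice.
result[x].voice_id	string	Unique identifier of the voice, used in TTS requests.
result[x].bucket_name	string	Always null.
result[x].file_name	string	Always null.
result[x].last_used_at	string	Time the voice was last used.
timestamp	int	Timestamp.

Request example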

import requests

easyllmId = "YOUR_EASYLLM_SERVICE_ID"
API_KEY = "YOUR_API_KEY"

url = f"https://www.sophnet.com/api/open-apis/projects/easyllms/voice/{easyllmId}"

headers = {
   'Authorization': f'Bearer {API_KEY}'
}

response = requests.request("GET", url, headers=headers)

print(response.text)

Response example

{
    "status": 0,
    "message": "请求成功",
    "result": [
        {
            "model": "xx",
            "des": null,
            "name": "xx",
            "voice_id": "xx",
            "bucket_name": null,
            "file_name": null,
            "last_used_at": "xx"
        }
    ],
    "timestamp": 1765423576409
}
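
Because `voice_id` is the value a TTS request needs, a caller usually wants to look it up by the human-readable `name`. The following sketch works on the response shape shown above; `find_voice_id` is a hypothetical helper, and the sample values are invented.

```python
def find_voice_id(response, name, model=None):
    """Return the voice_id of the first voice matching name (and model, if given)."""
    if response.get("status") != 0:
        raise RuntimeError(f"voice query failed: {response.get('message')}")
    for voice in response.get("result") or []:
        if voice["name"] == name and (model is None or voice["model"] == model):
            return voice["voice_id"]
    return None


# Made-up response in the documented shape
response = {
    "status": 0,
    "message": "ok",
    "result": [
        {"model": "cosyvoice-v1", "des": None, "name": "voice1",
         "voice_id": "id-1", "bucket_name": None, "file_name": None,
         "last_used_at": "xx"},
    ],
    "timestamp": 0,
}
voice_id = find_voice_id(response, "voice1", model="cosyvoice-v1")
```

Which request field accepts the returned identifier is not stated in this section; passing it as `synthesis_param.voice` is a plausible guess that should be verified against the service.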

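Delete a voice

Request

Basic information

Description: Deletes an available voice under a given model of a given service. After the call, the voice can no longer be used with the Text to voice API.

Request URL: https://www.sophnet.com/api/open-apis/projects/easyllms/voice/remove

Request method: DELETE

Header parameters

Name	Type	Required	Description
Authorization	String	Yes	"Bearer " + Apikey
Content-Type	String	Yes	application/json

Body parameters

Name	Type	Required	Description
easyllm_id	string	Yes	ID of the service the voice belongs to.
model	string	Yes	Name of the model the voice belongs to.
voice_id	string	Yes	Unique identifier of the voice to delete.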

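Response

Response headers

Name	Value	Description
Content-Type	application/json

Response body

Name	Type	Description
status	int	Return status; 0 means success.
message	string	Return message.
result	null	Always null.
timestamp	int	Timestamp.

Request example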

import requests
import json

easyllmId = "YOUR_EASYLLM_SERVICE_ID"
API_KEY = "YOUR_API_KEY"
VOICE_ID = "YOUR_VOICE"
MODEL = "YOUR_MODEL_NAME"

url = "https://www.sophnet.com/api/open-apis/projects/easyllms/voice/remove"

payload = json.dumps({
   "easyllm_id": easyllmId,
   "model": MODEL,
   "voice_id": VOICE_ID
})
headers = {
   'Content-Type': 'application/json',
   'Authorization': f'Bearer {API_KEY}'
}

response = requests.request("DELETE", url, headers=headers, data=payload)

print(response.text)

Response example

{
    "status": 0,
    "message": "请求成功",
    "result": null,
    "timestamp": 1765423576409
}

Image OCR

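Request

Basic information

Description: Image OCR service. Send an image and receive the text found in it.

Request URL: https://www.sophnet.com/api/open-apis/projects/easyllms/image-ocr

Request method: POST

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value application/json
Authorization	String	Yes	"Bearer " + Apikey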

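Body parameters

Name	Type	Required	Description
model	string	Yes	Model to use. One of `PaddleOCR-VL-0.9B`, `PaddleOCR-VL-1.5`, `DeepSeek-OCR`.
prompt	string	No	Prompt passed to the model. Only supported by `DeepSeek-OCR`. Defaults to `<image>\nFree OCR.`
type	string	Yes	Image type, fixed to "image_url"
image_url	object	Yes	Image parameters
image_url.url	string	Yes	The image: either a base64 image in the fixed form "data:image/jpeg;base64,{base64_data}" or an image URL
prettify_markdown	bool	No	Whether to return prettified Markdown. Defaults to true.
show_formula_number	bool	No	Whether the returned Markdown includes formula numbers. Defaults to false.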
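
The `image_url.url` value for a local image has to follow the fixed data-URI form given in the table. A minimal sketch of building the request body from raw image bytes follows; `build_ocr_payload` is a hypothetical helper, and the two bytes used in the example are only a stand-in for a real JPEG file.

```python
import base64


def build_ocr_payload(image_bytes, model="PaddleOCR-VL-0.9B",
                      prettify_markdown=True, show_formula_number=False):
    """Wrap raw image bytes in the documented data-URI form and build the body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
        "prettify_markdown": prettify_markdown,
        "show_formula_number": show_formula_number,
    }


# Stand-in bytes; in practice read them with open(path, "rb").read()
payload = build_ocr_payload(b"\xff\xd8")
```

The dictionary can be passed to an HTTP client as JSON. For an image that is already hosted, the data URI can be replaced by the plain image URL, as the table allows.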

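Response

Response headers

Name	Value	Description
Content-Type	application/json

Response body

Name	Type	Description
status	int	0 means success; any other value means failure
message	string	A success message when the call succeeds, otherwise the error message
result	array(object)	The result, made up of paragraphs. If use_html_out is 1, the list has length 1. Each paragraph contains the fields below.
result[0].label	string	Paragraph type, for example text, table or html
result[0].texts	string	Paragraph text
result[0].position	array(int)	Paragraph position, in the order left,top,right,bottom
markdown.text	string	The returned Markdown text. If prettify_markdown is true the Markdown is prettified; if show_formula_number is true the Markdown includes formula numbers.

Request example

HTTP API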

curl --location --request POST 'https://www.sophnet.com/api/open-apis/projects/easyllms/image-ocr' \
--header "Authorization: Bearer $API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "model": "PaddleOCR-VL-0.9B",
    "type":"image_url",
    "image_url": {
            "url": "data:image/jpeg;base64,/9j/..."
    },
    "prettify_markdown": true,
    "show_formula_number": false
}'

Response example

{
    "status":0,
    "message":"请求成功",
    "result": [
        {
            "label": "text",
            "texts": "测试",
            "position": "0,0,720,1920"
        }
    ],
    "markdown": {"text":"测试"}
}

Chat completions + Text to voice

Request

Basic information

Request URL: https://www.sophnet.com/api/open-apis/v1/chat/completions-with-voice-output

Request method: POST

Header parameters

Name	Type	Required	Description
Content-Type	String	Yes	Fixed value application/json
Authorization	String	Yes	"Bearer " + Apikey

Body parameters

Name	Type	Required	Description
chat_completion_req	dict	Yes	Parameters of Chat completions
speech_synthesis_req	dict	Yes	A subset of the Text to voice parameters

chat_completion_req parameters: see the Chat completions section; streaming only
speech_synthesis_req parameters: streaming only

Name	Type	Required	Description
synthesis_param	dict	No	Speech synthesis parameters
synthesis_param.model	string	No	Model to use. Default "cosyvoice-v1".
synthesis_param.voice	string	No	Voice to use. Default "longxiaochun"; supported values are longwan/longcheng/longhua/longxiaochun/longxiaoxia/longxiaocheng/longxiaobai/longlaotie/longshu/longshuo/longjing/longyue/loongstella/loongbella
synthesis_param.format	string	No	Audio encoding format and sample rate, written as "fileformat_samplerate_channels_bitrate". For example, `MP3_16000HZ_MONO_128KBPS` means mp3 audio sampled at 16 kHz. If format is omitted, the system picks the recommended format for the chosen voice.
synthesis_param.volume	number	No	Volume. Default 50, range [0, 100].
synthesis_param.speechRate	number	No	Speech rate. Default 1, range [0.5, 2].
synthesis_param.pitchRate	number	No	Pitch. Default 1, range [0.5, 2].
easyllm_id	string	Yes	Easyllm ID

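Response

Basic information: two kinds of messages are returned, Chat completions messages and Text to voice messages
Chat completions messages: see the Chat completions section
Text to voice messages

Name	Type	Description
audioFrame	dict	Audio data result
audioFrame.array	String	Base64-encoded audio data
status	String	Current service status

Request example

The example below uses placeholder values; replace them with real ones.

curl request example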

curl --location --request POST 'https://www.sophnet.com/api/open-apis/v1/chat/completions-with-voice-output' \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $API_KEY" \
--data-raw '{
    "chat_completion_req": {
        "messages": [
            {
                "role": "user",
                "content": "你好"
            }
        ],
        "model": "Qwen2.5-32B-Instruct",
        "stream": true
    },
    "speech_synthesis_req": {
        "easyllm_id": "${easyllm_id}"
    }
}'

Response example

{"choices": [{"delta": {"content": "", "role": "assistant"},"index": 0}],"created": 1749037853,"id": "chatcmpl-xxx","model": "Qwen2.5-32B-Instruct","object": "chat.completion.chunk"}

{"choices":[{"delta":{"content":"你好"},"index":0}],"created":1749037853,"id":"chatcmpl-xxx","model":"Qwen2.5-32B-Instruct","object":"chat.completion.chunk"}

{"choices":[{"delta":{"content":"！"},"index":0}],"created":1749037853,"id":"chatcmpl-xxx","model":"Qwen2.5-32B-Instruct","object":"chat.completion.chunk"}

{"choices":[{"delta":{"content":"有什么"},"index":0}],"created":1749037853,"id":"chatcmpl-xxx","model":"Qwen2.5-32B-Instruct","object":"chat.completion.chunk"}

{"status":"accepting","usage":null,"audioFrame":"SUQzBAAA..."}

{"choices":[{"delta":{"content":"可以帮助你的吗？"},"index":0}],"created":1749037853,"id":"chatcmpl-xxx","model":"Qwen2.5-32B-Instruct","object":"chat.completion.chunk"}

{"status":"accepting","usage":null,"audioFrame":null}

{"status":"accepting","usage":null,"audioFrame":"//PCxO1..."}

{"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0}],"created":1749037853,"id":"chatcmpl-xxx","model":"Qwen2.5-32B-Instruct","object":"chat.completion.chunk"}

{"status":"accepting","usage":null,"audioFrame":"//PAxPJh..."}

{"status":"finish","usage":{"characters":26},"audioFrame":null}

...
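
Since chat chunks and audio frames arrive interleaved on one stream, a client has to route each message by its shape. The sketch below assumes each message has already been reduced to a bare JSON string like the ones above (any transport framing removed), that chat chunks are recognizable by a `choices` key, and that audio messages carry a base64 string in `audioFrame`. `split_stream` is a hypothetical helper and the sample messages are invented.

```python
import base64
import json


def split_stream(messages):
    """Separate interleaved messages into the full reply text and the audio bytes."""
    text_parts = []
    audio = bytearray()
    for raw in messages:
        msg = json.loads(raw)
        if "choices" in msg:
            # Chat completions chunk: collect the incremental text
            for choice in msg["choices"]:
                text_parts.append(choice.get("delta", {}).get("content") or "")
        elif msg.get("audioFrame"):
            # Text to voice message with audio; frames with null audio are skipped
            audio += base64.b64decode(msg["audioFrame"])
    return "".join(text_parts), bytes(audio)


frame = base64.b64encode(b"\x01\x02").decode("ascii")
messages = [
    json.dumps({"choices": [{"delta": {"content": "Hi", "role": "assistant"}, "index": 0}]}),
    json.dumps({"status": "accepting", "usage": None, "audioFrame": frame}),
    json.dumps({"choices": [{"delta": {"content": "!"}, "index": 0}]}),
    json.dumps({"status": "finish", "usage": {"characters": 3}, "audioFrame": None}),
]
text, audio = split_stream(messages)
```

A real client would more likely write each audio frame to a player or file as it arrives instead of collecting everything in memory; the accumulation here only keeps the example short.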

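Voice chat

Upstream events

Connection request event

Basic information

Request URL: wss://www.sophnet.com/api/open-apis/projects/{ProjectId}/chat/speech-chat

Request method: Websocket

Path parameters:

Name	Type	Required	Description
ProjectId	String	Yes	Project ID

Request parameters

Name	Type	Required	Description
token	String	Yes	The "Bearer " prefix followed by the Apikey
model	String	Yes	Model name of the Chat completions service
asr_easyllm_id	String	Yes	Easyllm ID of the Speech to text service
tts_easyllm_id	String	Yes	Easyllm ID of the Text to voice service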
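
The `token` value contains a space, so the query string should be percent-encoded when the URL is assembled outside a browser. A minimal sketch using only the standard library follows; `build_speech_chat_url` is a hypothetical helper, and encoding the space as `%20` is an assumption about what the server accepts.

```python
from urllib.parse import quote, urlencode


def build_speech_chat_url(project_id, api_key, model, asr_easyllm_id, tts_easyllm_id):
    """Assemble the WebSocket URL with the four documented request parameters."""
    query = urlencode(
        {
            "model": model,
            "token": f"Bearer {api_key}",
            "asr_easyllm_id": asr_easyllm_id,
            "tts_easyllm_id": tts_easyllm_id,
        },
        quote_via=quote,  # encode the space as %20 instead of "+"
    )
    path_id = quote(project_id, safe="")
    return f"wss://www.sophnet.com/api/open-apis/projects/{path_id}/chat/speech-chat?{query}"


url = build_speech_chat_url("YOUR_PROJECT_ID", "YOUR_API_KEY", "YOUR_MODEL",
                            "YOUR_ASR_EASYLLM_ID", "YOUR_TTS_EASYLLM_ID")
```

The resulting string can be handed to any WebSocket client library.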

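Chat configuration update event

Event type: chat.update
Description: Sent after the connection response is received to optionally update the chat configuration. It can be sent any number of times before audio streaming starts and cannot be sent afterwards. If it is never sent, the default parameters apply. Streaming only.
Parameters: The fields create_transcription_task_req, chat_completion_with_voice_output_req.chat_completion_req and chat_completion_with_voice_output_req.speech_synthesis_req must all be present. If one of them is empty, all of its fields take their defaults; if only some of its fields are set, the unset ones take their defaults.
Default parameters:

{
    "create_transcription_task_req": {
        "heartbeat": false,
        "speech_recognition_param": {
            "sample_rate": 16000,
            "format": "wav"
        }
    },
    "chat_completion_with_voice_output_req": {
        "chat_completion_req": {
            "messages": [],
            "model": "${model parameter}",
            "stream": true
        },
        "speech_synthesis_req": {
            "stream": true,
            "synthesis_param": {
                "model": "cosyvoice-v1",
                "voice": "longxiaochun",
                "format": "MP3_22050HZ_MONO_256KBPS"
            }
        }
    },
    "asr_mode": "online"
}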

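Event message structure:

Parameter	Type	Required	Description
event_type	String	Yes	Event type, fixed to chat.update
message	String	Yes	Configuration as a JSON string, in the form {"create_transcription_task_req": {see the Request parameters of streaming Speech to text}, "chat_completion_with_voice_output_req": {see the Body parameters of Chat completions + Text to voice}, "asr_mode": ASR mode}
message.asr_mode	String	No	Three modes are supported: "online", "refresh" and "dynamic"; the default is "online". "online" keeps ASR always on and is recommended for continuous audio streams. "refresh" clears the ASR audio cache each time LLM+TTS runs and keeps listening; it is recommended for audio that arrives in truncated segments. "dynamic" turns ASR off when LLM+TTS inference starts and turns it back on automatically the next time binary audio data is sent.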
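
Because `message` is a JSON string and not a nested object, the configuration has to be serialized twice: once on its own and once as part of the event. A minimal sketch follows; `build_chat_update` is a hypothetical helper, and the configuration keeps the three required fields empty so that the server defaults described above would apply.

```python
import json


def build_chat_update(config):
    """Serialize the configuration into `message`, then serialize the event."""
    return json.dumps(
        {
            "event_type": "chat.update",
            "message": json.dumps(config, ensure_ascii=False),
        },
        ensure_ascii=False,
    )


config = {
    "create_transcription_task_req": {},
    "chat_completion_with_voice_output_req": {
        "chat_completion_req": {},
        "speech_synthesis_req": {},
    },
    "asr_mode": "dynamic",
}
event = build_chat_update(config)
```

The resulting string is what would be sent as a text frame over the WebSocket connection.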

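Streaming audio upload

Description: Once this event occurs, the chat configuration can no longer be updated. Timeouts are then judged according to the heartbeat setting, and the connection is closed on timeout.
Event message structure: binary audio chunks, for example 100 ms or 200 ms of audio per chunk; adjust to your situation.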
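
Cutting audio into chunks of roughly 100 ms comes down to a byte count per chunk. The sketch below assumes uncompressed 16-bit mono PCM at the default 16000 Hz sample rate, which gives 3200 bytes per 100 ms; other encodings need a different calculation. `chunk_audio` is a hypothetical helper.

```python
def chunk_audio(data, sample_rate=16000, bytes_per_sample=2, channels=1, chunk_ms=100):
    """Yield successive slices of `data`, each holding about chunk_ms of audio.

    Assumption: `data` is raw PCM, so duration maps linearly to byte count.
    """
    chunk_size = sample_rate * bytes_per_sample * channels * chunk_ms // 1000
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


# 8000 bytes of silence stand in for 250 ms of recorded audio
chunks = list(chunk_audio(bytes(8000)))
sizes = [len(chunk) for chunk in chunks]
```

Each chunk would be sent as one binary WebSocket frame, with a short pause between frames if the audio is being replayed from a file at real-time speed.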

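Audio commit event

Event type: input_audio_buffer.complete
Description: Stops ASR, uses the ASR result as the LLM input, and switches to LLM+TTS inference.
Event message structure:

Parameter	Type	Required	Description
event_type	String	Yes	Event type, fixed to input_audio_buffer.complete

Audio buffer clear event

Event type: input_audio_buffer.clear
Description: Clears the audio bytes cached by the ASR service (one more ASR result may be sent) and discards the results recognized so far. Sending this event before any audio bytes have been sent is an error.
Event message structure:

Parameter	Type	Required	Description
event_type	String	Yes	Event type, fixed to input_audio_buffer.clear

Context clear event

Event type: conversation.clear
Description: Clears the accumulated LLM context. Context set through the chat configuration update event is not cleared.
Event message structure:

Parameter	Type	Required	Description
event_type	String	Yes	Event type, fixed to conversation.clear

Interrupt event

Event type: conversation.chat.cancel
Description: Interrupts LLM+TTS inference and switches back to ASR.
Event message structure:

Parameter	Type	Required	Description
event_type	String	Yes	Event type, fixed to conversation.chat.cancel

Heartbeat event

Event type: ping
Description: This message must be sent periodically. If more than 60 s pass without one, the connection is closed.
Event message structure:

Parameter	Type	Required	Description
event_type	String	Yes	Event type, fixed to ping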

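Downstream events

Connection request response

Event type: chat.created
Description: Wait for this response before sending any other event.
Response body

Parameter	Type	Description
event_type	String	Event type, fixed to chat.created
status	int	Event status: 0 means success, non-zero means failure
message	String	Reason for failure

Chat configuration update response

Event type: chat.updated
Response body

Parameter	Type	Description
event_type	String	Event type, fixed to chat.updated
status	int	Event status: 0 means success, non-zero means failure
message	String	Reason for failure

Incremental speech recognition result

Description: Streams back speech recognition results.
Response body: see the response of streaming Speech to text

Audio commit response

Event type: input_audio_buffer.completed
Response body

Parameter	Type	Description
event_type	String	Event type, fixed to input_audio_buffer.completed
status	int	Event status: 0 means success, non-zero means failure
message	String	Reason for failure

Audio buffer clear response

Event type: input_audio_buffer.cleared
Response body

Parameter	Type	Description
event_type	String	Event type, fixed to input_audio_buffer.cleared
status	int	Event status: 0 means success, non-zero means failure
message	String	Reason for failure

Context clear response

Event type: conversation.cleared
Response body

Parameter	Type	Description
event_type	String	Event type, fixed to conversation.cleared
status	int	Event status: 0 means success, non-zero means failure
message	String	Reason for failure

Interrupt response

Event type: conversation.chat.canceled
Response body

Parameter	Type	Description
event_type	String	Event type, fixed to conversation.chat.canceled
status	int	Event status: 0 means success, non-zero means failure
message	String	Reason for failure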

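Incremental LLM+TTS inference result

Description: Streams back LLM and TTS inference results.
Response body: see the response of Chat completions + Text to voice

Chat turn completed response

Event type: conversation.chat.completed
Description: Returned when LLM+TTS inference ends normally, fails, or is interrupted. ASR is then turned on.
Response body

Parameter	Type	Description
event_type	String	Event type, fixed to conversation.chat.completed
status	int	Always 0
message	String	Always empty

Heartbeat response

Event type: pong
Response body

Parameter	Type	Description
event_type	String	Event type, fixed to pong
status	int	Event status: 0 means success, non-zero means failure
message	String	Reason for failure

Error response

Event type: error
Description: Other errors, such as invalid request parameters or runtime errors.
Response body

Parameter	Type	Description
event_type	String	Event type, fixed to error
status	int	Non-zero
message	String	Reason for the error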
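
All control responses above share the fields `event_type`, `status` and `message`, so one small function can turn failures into exceptions. The sketch below is an illustration only: `SpeechChatError` and `parse_control_event` are hypothetical names, and treating any message without `event_type` as a streamed ASR, LLM or TTS result is an assumption about the stream.

```python
import json


class SpeechChatError(Exception):
    """Raised when a downstream event reports a non-zero status."""


def parse_control_event(raw):
    """Return the event type of a control response, or None for other messages."""
    event = json.loads(raw)
    event_type = event.get("event_type")
    if event_type is None:
        # Assumed to be a streamed ASR, LLM or TTS result; handle it elsewhere
        return None
    if event_type == "error" or event.get("status", 0) != 0:
        raise SpeechChatError(f"{event_type}: {event.get('message')}")
    return event_type


created = parse_control_event('{"status":0,"message":"","event_type":"chat.created"}')
```

A receive loop could call this on every text frame, wait for `chat.created` before sending anything, and restart audio capture when `conversation.chat.completed` arrives.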

Request/response examples

Connection request example

Connection request

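// projectId, model, apikey, asr_easyllm_id and tts_easyllm_id are assumed to be defined by the caller
const log = (msg) => console.log(msg);

// Encode every query value; the token contains a space after "Bearer"
const url = `wss://www.sophnet.com/api/open-apis/projects/${projectId}/chat/speech-chat`
              + `?model=${encodeURIComponent(model)}`
              + `&token=${encodeURIComponent('Bearer ' + apikey)}`
              + `&asr_easyllm_id=${encodeURIComponent(asr_easyllm_id)}`
              + `&tts_easyllm_id=${encodeURIComponent(tts_easyllm_id)}`;

const ws = new WebSocket(url);

ws.onopen = () => {
    log('WebSocket connected: ' + url);
};

ws.onmessage = (evt) => {
    log('<- RESULT: ' + evt.data);
};

ws.onerror = (err) => {
    log('WebSocket error: ' + err);
};

ws.onclose = (evt) => {
    log(`WebSocket closed: [${evt.code}] ${evt.reason}`);
};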

Connection response

{"status":0,"message":"","event_type":"chat.created"}

Chat configuration update example

Chat configuration update request

// message is itself a JSON string, so the configuration is serialized twice
const config = {
    create_transcription_task_req: { service_uuid: "" },
    chat_completion_with_voice_output_req: {
        chat_completion_req: {
            messages: [{ role: "system", content: "You are an AI assistant." }],
            model: "",
            stream: true
        },
        speech_synthesis_req: { service_id: "" }
    }
};
ws.send(JSON.stringify({ event_type: "chat.update", message: JSON.stringify(config) }));

Chat configuration update response

{"status":0,"message":"","event_type":"chat.updated"}

Streaming audio upload example

Websocket API

ws.send(byteData);

ASR, LLM and TTS result examples

See the response examples of streaming Speech to text and of Chat completions + Text to voice.