
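About
Azure Speech to Text REST API for short audio (Python). Simple speech recognition for audio files up to 60 seconds long.
name: azure-speech-to-text-rest-py
description: Azure Speech to Text REST API for short audio (Python). Use for simple speech recognition of audio files up to 60 seconds long without the Speech SDK.
risk: unknown
source: community
date_added: '2026-02-27'
Azure Speech to Text REST API (Short Audio)
A simple REST API for speech-to-text transcription of short audio files (up to 60 seconds). No SDK is required, only HTTP requests.
Prerequisites
Environment Variables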
# Required
AZURE_SPEECH_KEY=<your-speech-resource-key>
AZURE_SPEECH_REGION=<region> # e.g., eastus, westus2, westeurope
# Alternative: Use endpoint directly
AZURE_SPEECH_ENDPOINT=https://<region>.stt.speech.microsoft.com
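The two ways of configuring the host above can be resolved in one place. The following is a minimal sketch, assuming that `AZURE_SPEECH_ENDPOINT` takes precedence over `AZURE_SPEECH_REGION` when both are set; the helper name `build_recognition_url` and that precedence rule are illustrative assumptions, not something the API prescribes.

```python
import os
from typing import Mapping, Optional

# Path used by the recognition examples in this document.
RECOGNITION_PATH = "/speech/recognition/conversation/cognitiveservices/v1"


def build_recognition_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the short-audio recognition URL from environment settings.

    Hypothetical helper: prefers AZURE_SPEECH_ENDPOINT when present
    (an assumed precedence), otherwise derives the host from the region.
    """
    env = os.environ if env is None else env
    endpoint = env.get("AZURE_SPEECH_ENDPOINT", "").strip()
    if endpoint:
        base = endpoint.rstrip("/")
    else:
        region = env.get("AZURE_SPEECH_REGION", "").strip()
        if not region:
            raise KeyError("Set AZURE_SPEECH_REGION or AZURE_SPEECH_ENDPOINT")
        base = f"https://{region}.stt.speech.microsoft.com"
    return base + RECOGNITION_PATH
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.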
Installation
pip install requests
Quick Start
import os
import requests


def transcribe_audio(audio_file_path: str, language: str = "en-US") -> dict:
    """Transcribe short audio file (max 60 seconds) using REST API."""
    region = os.environ["AZURE_SPEECH_REGION"]
    api_key = os.environ["AZURE_SPEECH_KEY"]
    url = f"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json"
    }
    params = {
        "language": language,
        "format": "detailed"
    }
    # Send the open file as the request body.
    with open(audio_file_path, "rb") as audio_file:
        response = requests.post(url, headers=headers, params=params, data=audio_file)
    response.raise_for_status()
    return response.json()


# Usage
result = transcribe_audio("audio.wav", "en-US")
# With format=detailed the recognized text is listed under NBest;
# fall back to it if the top-level DisplayText field is absent.
print(result.get("DisplayText") or result["NBest"][0]["Display"])
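Audio Requirements
| Format | Codec | Sample Rate | Notes |
|--------|-------|-------------|-------|
| WAV | PCM | 16 kHz, mono | Recommended |
| OGG | OPUS | 16 kHz, mono | Smaller file size |
Limitations:
- Maximum 60 seconds of audio
- Pronunciation assessment: maximum 30 seconds
- No partial/interim results (final result only)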
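The requirements and limits above can be checked locally before uploading. The following is a minimal sketch for PCM WAV files only, using the standard-library `wave` module; the function name `check_wav` and the idea of returning a list of problems are illustrative choices, and OGG/OPUS files would need a different tool.

```python
import wave
from typing import List

MAX_SECONDS = 60.0       # short-audio limit stated above
EXPECTED_RATE = 16000    # 16 kHz, per the table above
EXPECTED_CHANNELS = 1    # mono


def check_wav(path: str) -> List[str]:
    """Return a list of problems found in a PCM WAV file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if seconds > MAX_SECONDS:
        problems.append(f"duration is {seconds:.1f} s, limit is {MAX_SECONDS:.0f} s")
    return problems
```

An empty list means the file matches the recommended WAV profile; anything else can be reported to the user instead of sending a request that is likely to fail.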
Content-Type Headers
# WAV PCM 16kHz
"Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000"
# OGG OPUS
"Content-Type": "audio/ogg; codecs=opus"
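One way to avoid hard-coding the header is to pick it from the file extension. The following is a minimal sketch, assuming the extension reflects the actual encoding of the file; the helper name `content_type_for` is hypothetical, and the header values are the two shown above.

```python
from pathlib import Path

# Values taken from the Content-Type examples above.
CONTENT_TYPES = {
    ".wav": "audio/wav; codecs=audio/pcm; samplerate=16000",
    ".ogg": "audio/ogg; codecs=opus",
}


def content_type_for(path: str) -> str:
    """Return the Content-Type header value for a supported audio file."""
    suffix = Path(path).suffix.lower()
    try:
        return CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported audio extension: {suffix!r}") from None
```

The result can be assigned to `headers["Content-Type"]` in the request examples in this document.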
Response Formats
Simple Format (Default)
params = {"language": "en-US", "format": "simple"}
{
  "RecognitionStatus": "Success",
  "DisplayText": "Remind me to buy 5 pencils.",
  "Offset": "1236645672289",
  "Duration": "1236645672289"
}
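`Offset` and `Duration` are expressed in 100-nanosecond units, and `DisplayText` is only meaningful when `RecognitionStatus` is `Success`. The following is a minimal sketch of reading a simple-format response with both points in mind; the helper names `ticks_to_seconds` and `summarize_simple` and the shape of the returned dictionary are illustrative assumptions.

```python
TICKS_PER_SECOND = 10_000_000  # one tick = 100 nanoseconds


def ticks_to_seconds(ticks) -> float:
    """Convert a tick count (int or numeric string) to seconds."""
    return int(ticks) / TICKS_PER_SECOND


def summarize_simple(result: dict) -> dict:
    """Extract text and timing from a simple-format response.

    Raises ValueError when recognition did not succeed, for example
    when the status is NoMatch or InitialSilenceTimeout.
    """
    status = result.get("RecognitionStatus")
    if status != "Success":
        raise ValueError(f"Recognition did not succeed: {status}")
    return {
        "text": result["DisplayText"],
        "start_s": ticks_to_seconds(result["Offset"]),
        "duration_s": ticks_to_seconds(result["Duration"]),
    }
```

Converting through `int()` keeps the helper working whether the service returns the tick counts as numbers or as strings, as in the example above.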
Detailed Format
params = {"language": "en-US", "format": "detailed"}
{
  "RecognitionStatus": "Success",
  "Offset": "1236645672289",
  "Duration": "1236645672289",
  "NBest": [
    {
      "Confidence": 0.9052885,
      "Display": "What's the weather like?",
      "ITN": "what's the weather like",
      "Lexical": "what's the weather like",
      "MaskedITN": "what's the weather like"
    }
  ]
}
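Because `NBest` is a list of candidates, callers usually want a single entry from it. The following is a minimal sketch that selects the candidate with the highest `Confidence`; the helper name `best_hypothesis` is hypothetical, and treating a missing `Confidence` as 0.0 is an assumption made for robustness.

```python
from typing import Optional


def best_hypothesis(result: dict) -> Optional[dict]:
    """Return the NBest entry with the highest Confidence, or None."""
    candidates = result.get("NBest") or []
    if not candidates:
        return None
    # Entries without a Confidence value are ranked lowest (assumption).
    return max(candidates, key=lambda c: c.get("Confidence", 0.0))
```

For the response above, `best_hypothesis(result)["Display"]` gives the punctuated text and `["Lexical"]` gives the raw recognized words.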
Chunked Transfer (Recommended)
To reduce latency, stream the audio in chunks:
import os
import requests


def transcribe_chunked(audio_file_path: str, language: str = "en-US") -> dict:
    """Stream audio in chunks for lower latency."""
    region = os.environ["AZURE_SPEECH_REGION"]
    api_key = os.environ["AZURE_SPEECH_KEY"]
    url = f"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
        # requests already uses chunked encoding for a generator body;
        # the header is kept here to make the intent explicit.
        "Transfer-Encoding": "chunked",
        "Expect": "100-continue"
    }
    params = {"language": language, "format": "detailed"}

    def audio_chunks():
        # Yield the file in 4 KiB pieces until it is exhausted.
        with open(audio_file_path, "rb") as f:
            while chunk := f.read(4096):
                yield chunk

    response = requests.post(url, headers=headers, params=params, data=audio_chunks())
    response.raise_for_status()
    return response.json()
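The generator in the example above reads from a file path, but the same pattern applies to any binary stream, such as audio already held in memory. The following is a minimal sketch of a reusable chunker; the name `iter_chunks` and the configurable `chunk_size` parameter are illustrative, with 4096 bytes kept as the default to match the example.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Example: chunk 10,000 bytes of in-memory data.
sizes = [len(c) for c in iter_chunks(io.BytesIO(b"\x00" * 10000))]
```

The resulting generator can be passed as `data=` to `requests.post` in the same way as `audio_chunks()`.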
Compatible Tools
Claude Code, Cursor
Tags
Backend Development
