How to hook real-time audio stream endpoint to Direct Line Speech Endpoint?

Question

I am trying to hook up my real-time audio endpoint, which produces a continuous audio stream, to the Direct Line Speech (DLS) endpoint, which eventually interacts with my Azure bot API.

I have a websocket API that continuously receives an audio stream in binary format, and this is what I intend to forward to the DLS endpoint for continuous speech-to-text (S2T) with my bot.

Based on the feedback and answer here, I have been able to hook up my Direct Line Speech endpoint with a real-time stream.

I've tried a sample wav file which correctly gets transcribed by DLS and my bot is correctly able to retrieve the text to operate on it.

I have used the ListenOnce() API and am using a PushAudioInputStream method to push the audio stream to the DLS speech endpoint.

The code below shows the internals of the ListenOnce() method:

// Create a push stream
using (var pushStream = AudioInputStream.CreatePushStream())
{
    using (var audioInput = AudioConfig.FromStreamInput(pushStream))
    {
        // Create a new Dialog Service Connector
        this.connector = new DialogServiceConnector(dialogServiceConfig, audioInput);
        // ... also subscribe to events for this.connector

        // Open a connection to Direct Line Speech channel
        this.connector.ConnectAsync();
        Debug.WriteLine("Connecting to DLS");

        // Push the current chunk into the stream
        pushStream.Write(dataBuffer, dataBuffer.Length);

        try
        {
            this.connector.ListenOnceAsync();
            System.Diagnostics.Debug.WriteLine("Started ListenOnceAsync");
        }
        catch (Exception ex)
        {
            System.Diagnostics.Debug.WriteLine($"ListenOnceAsync failed: {ex}");
        }
    }
}

dataBuffer in the above code is the 'chunk' of binary data I've received over my websocket:

const int maxMessageSize = 1024 * 4; // 4 KB receive buffer
var dataBuffer = new byte[maxMessageSize];

while (webSocket.State == WebSocketState.Open)
{
    var result = await webSocket.ReceiveAsync(new ArraySegment<byte>(dataBuffer), CancellationToken.None);
    if (result.MessageType == WebSocketMessageType.Close)
    {
        Trace.WriteLine($"Received websocket close message: {result.CloseStatus.Value}, {result.CloseStatusDescription}");
        await webSocket.CloseAsync(result.CloseStatus.Value, result.CloseStatusDescription, CancellationToken.None);
    }
    else if (result.MessageType == WebSocketMessageType.Text)
    {
        // Only the first result.Count bytes of the buffer belong to this message
        var message = Encoding.UTF8.GetString(dataBuffer, 0, result.Count);
        Trace.WriteLine($"Received websocket text message: {message}");
    }
    else // binary
    {
        Trace.WriteLine("Received websocket binary message");
        ListenOnce(dataBuffer); // calls the code above; note only the first result.Count bytes are valid
    }
}

But the above code doesn't work. I believe I have a couple of issues/questions with this approach:

  1. I believe I am not correctly chunking the data to Direct Line Speech to ensure that it receives the full audio for correct S2T conversion.
  2. I know the DLS API supports ListenOnceAsync(), but I'm not sure whether it supports ASR (i.e., whether it knows when the speaker on the other side has stopped talking).
  3. Can I just get the websocket URL for the Direct Line Speech endpoint and assume DLS correctly consumes the direct websocket stream?

Answer

I believe I am not correctly chunking the data to Direct Line Speech to ensure that it receives the full audio for correct S2T conversion.

DialogServiceConnector.ListenOnceAsync will listen until the stream is closed (or enough silence is detected). You are not closing your stream except for when you dispose of it at the end of your using block. You could await ListenOnceAsync, but you'd have to make sure you close the stream first. If you don't await ListenOnceAsync then you can close the stream whenever you want, but you should probably do it as soon as you finish writing to the stream, and you have to make sure you don't dispose of the stream (or the config) before ListenOnceAsync has had a chance to complete.
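
As an illustration, here is a minimal sketch of that ordering. This is not the original code: dialogServiceConfig is assumed to be configured elsewhere, and audioChunks is a placeholder standing in for one complete utterance of audio.

using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Dialog;

// Sketch only: dialogServiceConfig is assumed to be configured elsewhere, and
// audioChunks is a placeholder for one complete utterance of 16 kHz, 16-bit,
// mono PCM audio (the push stream's default format).
async Task RecognizeUtteranceAsync(DialogServiceConfig dialogServiceConfig, byte[][] audioChunks)
{
    using (var pushStream = AudioInputStream.CreatePushStream())
    using (var audioInput = AudioConfig.FromStreamInput(pushStream))
    using (var connector = new DialogServiceConnector(dialogServiceConfig, audioInput))
    {
        await connector.ConnectAsync();

        // Start listening first so the recognizer is ready to consume the stream.
        var listenTask = connector.ListenOnceAsync();

        foreach (var chunk in audioChunks)
        {
            pushStream.Write(chunk, chunk.Length);
        }

        // Closing the stream tells ListenOnceAsync the audio is complete.
        pushStream.Close();

        // Await the result before the using blocks dispose the stream, config and connector.
        var result = await listenTask;
        Debug.WriteLine($"Recognized: {result.Text}");
    }
}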

You also want to make sure ListenOnceAsync gets the full utterance. If you're only receiving 4 KB at a time (roughly 128 ms of audio at the push stream's default 16 kHz, 16-bit, mono format), that's certainly not a full utterance. If you want to keep your chunks at 4 KB, it may be a good idea to keep ListenOnceAsync running across multiple iterations of that loop rather than calling it over and over for every chunk you get.
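
A rough sketch of that restructuring is below, under the assumption that the connector and push stream are created once before the loop and that a single ListenOnceAsync() task, listenTask, has already been started; webSocket is the socket from the question's code.

using System;
using System.Diagnostics;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Sketch only: pushStream is the long-lived PushAudioInputStream the connector
// was created with, and listenTask is the single ListenOnceAsync() started
// before this loop. One ListenOnceAsync spans the whole receive loop instead of
// being called once per chunk.
async Task ReceiveAudioAsync(WebSocket webSocket, PushAudioInputStream pushStream, Task<SpeechRecognitionResult> listenTask)
{
    var dataBuffer = new byte[1024 * 4]; // 4 KB receive buffer

    while (webSocket.State == WebSocketState.Open)
    {
        var result = await webSocket.ReceiveAsync(new ArraySegment<byte>(dataBuffer), CancellationToken.None);

        if (result.MessageType == WebSocketMessageType.Binary)
        {
            // Feed every chunk into the same long-lived push stream;
            // only the first result.Count bytes belong to this message.
            pushStream.Write(dataBuffer, result.Count);
        }
        else if (result.MessageType == WebSocketMessageType.Close)
        {
            // End of audio: closing the stream lets the pending ListenOnceAsync complete.
            pushStream.Close();
            await webSocket.CloseAsync(result.CloseStatus.Value, result.CloseStatusDescription, CancellationToken.None);
            break;
        }
    }

    var speechResult = await listenTask;
    Trace.WriteLine($"Recognized: {speechResult.Text}");
}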

I know the DLS API supports ListenOnceAsync(), but I'm not sure whether it supports ASR (i.e., whether it knows when the speaker on the other side has stopped talking).

I think you will have to determine when the speaker stops talking on the client side and then receive a message from your WebSocket indicating that you should close the audio stream for ListenOnceAsync.
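
Purely as an illustration, one way to wire that signal in could look like the following. The END_OF_UTTERANCE text frame is a made-up control message for this example, not something defined by DLS or by the question's protocol.

using System.Text;
using Microsoft.CognitiveServices.Speech.Audio;

// Sketch only: the client decides the speaker has stopped (e.g. via its own
// silence detection) and sends the made-up text frame "END_OF_UTTERANCE"; the
// server then closes the push stream so the pending ListenOnceAsync can finish.
void HandleTextMessage(byte[] dataBuffer, int count, PushAudioInputStream pushStream)
{
    var message = Encoding.UTF8.GetString(dataBuffer, 0, count);
    if (message == "END_OF_UTTERANCE")
    {
        pushStream.Close();
    }
}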

It looks like ListenOnceAsync does support ASR.
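
For reference, the question's code only hints at subscribing to events ("... also subscribe to events for this.connector"); a minimal sketch of those subscriptions might look like this, where connector is the DialogServiceConnector instance and the handler bodies are only illustrative.

// Sketch only: subscribe before calling ListenOnceAsync(). The handlers here
// just log, but Recognized fires once the service considers the utterance
// complete (its own endpointing), and ActivityReceived carries the bot's reply.
connector.Recognizing += (s, e) =>
{
    Debug.WriteLine($"Partial result: {e.Result.Text}");
};

connector.Recognized += (s, e) =>
{
    Debug.WriteLine($"Final result: {e.Result.Text}");
};

connector.Canceled += (s, e) =>
{
    Debug.WriteLine($"Canceled: {e.Reason}");
};

connector.ActivityReceived += (s, e) =>
{
    // The bot's reply arrives as a Bot Framework activity (JSON string).
    Debug.WriteLine($"Activity received: {e.Activity}");
};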

Can I just get the websocket URL for the Direct Line Speech endpoint and assume DLS correctly consumes the direct websocket stream?

You could try it, but I would not assume that myself. Direct Line Speech is still in preview, and I don't expect compatibility to come easily.
