How can I send audio to Nexmo Voice through a websocket?

Question

I am trying to implement Nexmo's Voice API, with websockets, in a .NET Core 2 Web API. The API needs to:

  • receive audio from a phone call, through Nexmo
  • run the audio through the Microsoft Cognitive speech-to-text API
  • send the text to a bot
  • run Microsoft Cognitive text-to-speech on the bot's reply
  • send the speech back to Nexmo, through their Voice API websocket

For now I am bypassing the bot steps, because I first want to get the websocket connection working. An echo method (sending the received audio back to the websocket) works without any issue. But when I try to send the speech produced by Microsoft text-to-speech, the phone call ends.

I can't find any documentation that implements anything other than an echo.

The TextToSpeech and SpeechToText methods work as expected when used outside of the websocket.

Here's the websocket handler with the speech-to-text:

public static async Task Echo(HttpContext context, WebSocket webSocket)
{
    var buffer = new byte[1024 * 4];
    WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    while (!result.CloseStatus.HasValue)
    {
        while (!result.EndOfMessage)
        {
            result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
        }
        var text = SpeechToText.RecognizeSpeechFromBytesAsync(buffer).Result;
        Console.WriteLine(text);
    }
    await webSocket.CloseAsync(result.CloseStatus.Value, result.CloseStatusDescription, CancellationToken.None);
}

And here's the websocket handler with the text-to-speech:

public static async Task Echo(HttpContext context, WebSocket webSocket)
{
    var buffer = new byte[1024 * 4];
    WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    while (!result.CloseStatus.HasValue)
    {
        var ttsAudio = await TextToSpeech.TransformTextToSpeechAsync("Hello, this is a test", "en-US");
        await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, 0, ttsAudio.Length), WebSocketMessageType.Binary, true, CancellationToken.None);

        result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    }
    await webSocket.CloseAsync(result.CloseStatus.Value, result.CloseStatusDescription, CancellationToken.None);
}


Update (March 1, 2019)

In reply to Sam Machin's comment: I tried splitting the array into chunks of 640 bytes each (I'm using a 16 kHz sample rate), but Nexmo still hangs up the call, and I still don't hear anything.

public static async Task NexmoTextToSpeech(HttpContext context, WebSocket webSocket)
{
    var ttsAudio = await TextToSpeech.TransformTextToSpeechAsync("This is a test", "en-US");
    var buffer = new byte[1024 * 4];
    WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);

    while (!result.CloseStatus.HasValue)
    {
        await SendSpeech(context, webSocket, ttsAudio);
        result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    }
    await webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "Closing Socket", CancellationToken.None);
}

private static async Task SendSpeech(HttpContext context, WebSocket webSocket, byte[] ttsAudio)
{
    const int chunkSize = 640;
    var chunkCount = 1;
    var offset = 0;

    var lastFullChunck = ttsAudio.Length < (offset + chunkSize);
    try
    {
        while (!lastFullChunck)
        {
            await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, offset, chunkSize), WebSocketMessageType.Binary, false, CancellationToken.None);
            offset = chunkSize * chunkCount;
            lastFullChunck = ttsAudio.Length < (offset + chunkSize);
            chunkCount++;
        }

        var lastMessageSize = ttsAudio.Length - offset;
        await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, offset, lastMessageSize), WebSocketMessageType.Binary, true, CancellationToken.None);
    }
    catch (Exception ex)
    {
    }
}

Here's the exception that sometimes appears in the logs:

System.Net.WebSockets.WebSocketException (0x80004005): The remote party closed the WebSocket connection without completing the close handshake.

Recommended answer

It looks like you're writing the whole audio clip to the websocket at once. The Nexmo interface requires the audio to arrive in 20 ms frames, one per message. This means you need to break your clip up into chunks of 320 or 640 bytes (depending on whether you're using 8 kHz or 16 kHz) and write each one to the socket. If you try to write too large a payload to the socket, it will close, as you are seeing.
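As a rough illustration of where the 320 and 640 byte figures come from, assuming 16-bit mono linear PCM (2 bytes per sample): a 20 ms frame at 16 kHz holds 320 samples, which is 640 bytes. The sketch below computes the frame size and splits a raw PCM buffer into fixed-size frames. `PcmFraming`, `FrameSizeBytes` and `SplitIntoFrames` are hypothetical names, and padding the final frame with zeros (silence) is an assumption, not something the answer specifies.

```csharp
using System;
using System.Collections.Generic;

public static class PcmFraming
{
    // Bytes in one frame of 16-bit mono linear PCM lasting frameMs milliseconds.
    public static int FrameSizeBytes(int sampleRateHz, int frameMs = 20)
    {
        const int bytesPerSample = 2; // assumption: 16-bit samples
        return sampleRateHz * bytesPerSample * frameMs / 1000;
    }

    // Splits raw PCM into frames of exactly frameSize bytes.
    // Assumption: a short final frame is padded with zero bytes (silence).
    public static List<byte[]> SplitIntoFrames(byte[] pcm, int frameSize)
    {
        if (pcm == null) throw new ArgumentNullException(nameof(pcm));
        if (frameSize <= 0) throw new ArgumentOutOfRangeException(nameof(frameSize));

        var frames = new List<byte[]>();
        for (int offset = 0; offset < pcm.Length; offset += frameSize)
        {
            var frame = new byte[frameSize]; // new arrays are zero-initialized
            int count = Math.Min(frameSize, pcm.Length - offset);
            Buffer.BlockCopy(pcm, offset, frame, 0, count);
            frames.Add(frame);
        }
        return frames;
    }
}
```

For example, `PcmFraming.SplitIntoFrames(ttsAudio, PcmFraming.FrameSizeBytes(16000))` would yield 640-byte frames, provided `ttsAudio` really is raw PCM at that rate with no container header.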

See https://developer.nexmo.com/voice/voice-api/guides/websockets#writing-audio-to-the-websocket for details.
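A minimal sketch of writing frames to the socket, one complete binary message per frame. It assumes the audio has already been split into 320 or 640 byte frames of raw PCM. `NexmoAudioSender` and `SendFramesAsync` are hypothetical names, and the optional delay between frames is an assumption rather than a documented requirement. Note that it passes `true` for `endOfMessage` on every send. The `SendSpeech` method in the question passes `false` for every chunk except the last, so its chunks are fragments of one large message rather than separate messages, which may be why the call still drops.

```csharp
using System;
using System.Collections.Generic;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

public static class NexmoAudioSender
{
    // Sends each pre-split frame as its own complete binary websocket message.
    // Pass TimeSpan.Zero as frameDelay to send without pacing.
    public static async Task SendFramesAsync(
        WebSocket webSocket,
        IEnumerable<byte[]> frames,
        TimeSpan frameDelay,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        foreach (var frame in frames)
        {
            // Stop quietly if the remote side has already closed the socket.
            if (webSocket.State != WebSocketState.Open)
            {
                break;
            }

            await webSocket.SendAsync(
                new ArraySegment<byte>(frame),
                WebSocketMessageType.Binary,
                true, // endOfMessage: each frame is a whole message
                cancellationToken);

            // Optional pacing (assumption): wait roughly one frame duration.
            if (frameDelay > TimeSpan.Zero)
            {
                await Task.Delay(frameDelay, cancellationToken);
            }
        }
    }
}
```

Inside the handler this could be called as `await NexmoAudioSender.SendFramesAsync(webSocket, frames, TimeSpan.FromMilliseconds(20));`, where `frames` is the already-split audio.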
