How can I send audio to Nexmo Voice through websocket

Views: 63 | Posted: 2021/9/6 19:47:56 | Tags: c# websocket speech-recognition text-to-speech nexmo

Problem description

I am trying to implement Nexmo's Voice API, using websockets, in a .NET Core 2 Web API.

The intended flow is:

1. Receive audio from the phone call, through Nexmo
2. Use the Microsoft Cognitive speech-to-text API
3. Send the text to a bot
4. Use Microsoft Cognitive text-to-speech on the bot's reply
5. Send the speech back to Nexmo, through their Voice API websocket

For now, I'm bypassing the bot steps, as I am first trying to get the websocket connection working. When I try an echo method (sending the received audio back to the websocket), it works without any issue. But when I try to send the speech from Microsoft text-to-speech, the phone call ends.

I can't find any documentation that implements anything beyond a simple echo.

The TextToSpeech and SpeechToText methods work as expected when used outside of the websocket.

Here's the websocket handler with the speech-to-text:

public static async Task Echo(HttpContext context, WebSocket webSocket)
{
    var buffer = new byte[1024 * 4];
    WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    while (!result.CloseStatus.HasValue)
    {
        while (!result.EndOfMessage)
        {
            result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
        }
        var text = SpeechToText.RecognizeSpeechFromBytesAsync(buffer).Result;
        Console.WriteLine(text);
    }
    await webSocket.CloseAsync(result.CloseStatus.Value, result.CloseStatusDescription, CancellationToken.None);
}

And here's the websocket handler with the text-to-speech:

public static async Task Echo(HttpContext context, WebSocket webSocket)
{
    var buffer = new byte[1024 * 4];
    WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    while (!result.CloseStatus.HasValue)
    {
        var ttsAudio = await TextToSpeech.TransformTextToSpeechAsync("Hello, this is a test", "en-US");
        await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, 0, ttsAudio.Length), WebSocketMessageType.Binary, true, CancellationToken.None);

        result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    }
    await webSocket.CloseAsync(result.CloseStatus.Value, result.CloseStatusDescription, CancellationToken.None);
}
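
One thing that may be worth checking in the handler above: the Nexmo websocket carries raw 16-bit linear PCM, while a text-to-speech service can return a RIFF/WAVE container, depending on the requested output format. Whether `TransformTextToSpeechAsync` returns a header is not stated in the question, so the following is only a minimal sketch under that assumption. `WavHelper` and `ExtractPcm` are hypothetical names and not part of any SDK.

```csharp
using System;
using System.Text;

public static class WavHelper
{
    // Returns the raw PCM payload of a RIFF/WAVE byte array, or the input
    // unchanged when no RIFF header is present (assumed to be raw PCM already).
    // Assumes a little-endian platform, since RIFF sizes are little-endian
    // and BitConverter uses the platform byte order.
    public static byte[] ExtractPcm(byte[] audio)
    {
        if (audio.Length < 12
            || Encoding.ASCII.GetString(audio, 0, 4) != "RIFF"
            || Encoding.ASCII.GetString(audio, 8, 4) != "WAVE")
        {
            return audio;
        }

        var pos = 12; // first sub-chunk starts right after "RIFF<size>WAVE"
        while (pos + 8 <= audio.Length)
        {
            var id = Encoding.ASCII.GetString(audio, pos, 4);
            long size = BitConverter.ToUInt32(audio, pos + 4);
            var start = pos + 8;

            if (id == "data")
            {
                // Clamp to the bytes actually present, in case the declared
                // size is larger than the array.
                var length = (int)Math.Min(size, (long)(audio.Length - start));
                var pcm = new byte[length];
                Buffer.BlockCopy(audio, start, pcm, 0, length);
                return pcm;
            }

            // Skip this chunk; chunks are padded to an even byte count.
            long next = start + size + (size % 2);
            if (next > audio.Length) break;
            pos = (int)next;
        }

        return Array.Empty<byte>(); // header present but no data chunk found
    }
}
```

If the service offers a raw PCM output format, requesting that instead would make this step unnecessary.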

Update, March 1, 2019

In reply to Sam Machin's comment, I tried splitting the array into chunks of 640 bytes each (I'm using a 16 kHz sample rate), but Nexmo still hangs up the call, and I still don't hear anything.

public static async Task NexmoTextToSpeech(HttpContext context, WebSocket webSocket)
{
    var ttsAudio = await TextToSpeech.TransformTextToSpeechAsync("This is a test", "en-US");
    var buffer = new byte[1024 * 4];
    WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);

    while (!result.CloseStatus.HasValue)
    {
        await SendSpeech(context, webSocket, ttsAudio);
        result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    }
    await webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "Closing Socket", CancellationToken.None);
}

private static async Task SendSpeech(HttpContext context, WebSocket webSocket, byte[] ttsAudio)
{
    const int chunkSize = 640;
    var chunkCount = 1;
    var offset = 0;

    var lastFullChunck = ttsAudio.Length < (offset + chunkSize);
    try
    {
        while (!lastFullChunck)
        {
            await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, offset, chunkSize), WebSocketMessageType.Binary, false, CancellationToken.None);
            offset = chunkSize * chunkCount;
            lastFullChunck = ttsAudio.Length < (offset + chunkSize);
            chunkCount++;
        }

        var lastMessageSize = ttsAudio.Length - offset;
        await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, offset, lastMessageSize), WebSocketMessageType.Binary, true, CancellationToken.None);
    }
    catch (Exception ex)
    {
    }
}
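
For reference, the 640-byte figure corresponds to 20 ms of 16-bit mono PCM at 16 kHz: 16000 samples/s × 2 bytes × 0.020 s = 640 bytes. The following is a minimal sketch of splitting audio into frames of exactly that size, padding the final frame with zeros (silence). It assumes the input is raw PCM with no container header. `PcmFramer` and `SplitIntoFrames` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class PcmFramer
{
    // 16000 samples/s * 2 bytes/sample * 0.020 s = 640 bytes per frame.
    public const int FrameSize = 640;

    // Splits raw PCM into fixed-size frames. The last frame is zero-padded,
    // which is silence for signed 16-bit PCM.
    public static List<byte[]> SplitIntoFrames(byte[] pcm, int frameSize = FrameSize)
    {
        var frames = new List<byte[]>();
        for (var offset = 0; offset < pcm.Length; offset += frameSize)
        {
            var frame = new byte[frameSize]; // new arrays are zero-filled
            var count = Math.Min(frameSize, pcm.Length - offset);
            Buffer.BlockCopy(pcm, offset, frame, 0, count);
            frames.Add(frame);
        }
        return frames;
    }
}
```

Each frame could then be sent as its own complete binary message, for example `await webSocket.SendAsync(new ArraySegment<byte>(frame), WebSocketMessageType.Binary, true, CancellationToken.None);`. Note that passing `false` for `endOfMessage`, as `SendSpeech` does for every full chunk, marks the chunk as a fragment of one larger message rather than a message of its own, which may be worth ruling out as a cause.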

Here's the exception that sometimes appears in the logs:

System.Net.WebSockets.WebSocketException (0x80004005): The remote party closed the WebSocket connection without completing the close handshake.
