Audio streaming by websockets

Question

I'm going to create a voice chat. My backend server runs on Node.js, and almost every connection between client and server uses socket.io.

Are websockets appropriate for my use case? I prefer client -> server -> clients communication over P2P, because I expect as many as 1000 clients connected to one room.

If websockets are OK, which method is best for sending the AudioBuffer to the server and playing it back on the other clients? I currently do it like this:

// Ask for microphone access and start recording on success
navigator.getUserMedia({audio: true}, initializeRecorder, errorCallback);

function initializeRecorder(stream) {
    var audioCtx = new window.AudioContext();
    // Wrap the microphone stream in a Web Audio source node
    var sourceNode = audioCtx.createMediaStreamSource(stream);

    // ScriptProcessorNode exposes raw sample buffers (4096 frames, mono in/out)
    var recorder = audioCtx.createScriptProcessor(4096, 1, 1);
    recorder.onaudioprocess = recorderProcess;

    sourceNode.connect(recorder);
    // Must be connected to a destination, or onaudioprocess never fires
    recorder.connect(audioCtx.destination);
}

function recorderProcess(e) {
    // Float32Array of raw PCM samples from the first (left) channel
    var left = e.inputBuffer.getChannelData(0);
    io.socket.post('url', left); // 'url' is a placeholder endpoint
}

But after receiving the data on the other clients, I don't know how to play this audio stream back from the buffer arrays.

EDIT

1) Why is the onaudioprocess method not fired if I don't connect the ScriptProcessor (the recorder variable) to a destination?

Documentation info - "although you don't have to provide a destination if you, say, just want to visualise some audio data" - Web Audio concepts and usage

2) Why don't I hear anything from my speakers after connecting the recorder variable to the destination, yet I do hear sound if I connect the sourceNode variable directly to the destination? This happens even when the onaudioprocess method doesn't do anything.

Can anyone help?

Answer

I think web sockets are appropriate here. Just make sure that you are using binary transfer. (I use BinaryJS for this myself, which allows me to open arbitrary streams to the server.)
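For illustration, a minimal BinaryJS client sketch might look like this (the server URL, port, and stream metadata here are placeholders, not part of the original answer):

// Open a binary websocket connection to the server (placeholder URL/port)
var client = new BinaryClient('ws://example.com:9000');

client.on('open', function() {
    // One long-lived binary stream per speaker; the metadata is illustrative
    var stream = client.createStream({ room: 'room-1' });

    // Inside onaudioprocess you would then write each chunk as binary, e.g.:
    // stream.write(int8Samples.buffer);
});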

Getting the data from the user media capture is pretty straightforward; what you have is a good start. The tricky part is playback. You will have to buffer the data and play it back using your own script processor node.

This isn't too hard if you use PCM everywhere, i.e. the raw samples you get from the Web Audio API. The downside is that there is a lot of overhead in shoving 32-bit floating-point PCM around: at a typical 44.1 kHz sample rate, a mono stream costs roughly 176 KB/s. That's a ton of bandwidth, and it isn't needed for speech alone.

I think the easiest thing to do in your case is to reduce the bit depth to whatever works well for your application. 8-bit samples are plenty for discernible speech and take up far less bandwidth. By sticking with PCM, you avoid having to implement a codec in JS and then having to deal with buffering and framing the data for that codec.
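As a rough sketch of that conversion (the function names are mine; Web Audio samples are floats in the -1.0..1.0 range, so scaling by 127 maps them to the signed 8-bit range):

// Convert Float32 samples (-1.0..1.0) to signed 8-bit integers (-127..127)
function floatTo8BitPCM(float32Samples) {
    var int8Samples = new Int8Array(float32Samples.length);
    for (var i = 0; i < float32Samples.length; i++) {
        // Clamp to [-1, 1], then scale to the 8-bit range
        var s = Math.max(-1, Math.min(1, float32Samples[i]));
        int8Samples[i] = Math.round(s * 127);
    }
    return int8Samples;
}

// Reverse conversion for playback on the receiving client
function pcm8BitToFloat(int8Samples) {
    var float32Samples = new Float32Array(int8Samples.length);
    for (var i = 0; i < int8Samples.length; i++) {
        float32Samples[i] = int8Samples[i] / 127;
    }
    return float32Samples;
}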

To summarize: once you have the raw sample data in a typed array in your script processor node, write something to convert those samples from 32-bit float to 8-bit signed integers. Send these buffers to your server, in the same size chunks as they come in, over your binary web socket. The server will then send them to all the other clients on their binary web sockets. When a client receives audio data, it should buffer it for whatever amount of time you choose, to prevent dropping audio. Your client code will convert those 8-bit samples back to 32-bit floats and put them in a playback buffer. Your script processor node will then pick up whatever is in the buffer and start playback as data becomes available.
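A minimal sketch of that receiving side, assuming chunks arrive as ArrayBuffers and reusing the hypothetical pcm8BitToFloat from above (a real implementation would also smooth over network jitter):

// Receiving client: queue incoming chunks, play them with a ScriptProcessorNode
var audioCtx = new window.AudioContext();
var playbackQueue = [];  // Float32Arrays waiting to be played

// Call this for every binary audio chunk received from the server
function onAudioChunk(arrayBuffer) {
    playbackQueue.push(pcm8BitToFloat(new Int8Array(arrayBuffer)));
}

// 4096-frame mono processor, matching the sender's chunk size
var player = audioCtx.createScriptProcessor(4096, 1, 1);
player.onaudioprocess = function(e) {
    var output = e.outputBuffer.getChannelData(0);
    var chunk = playbackQueue.length ? playbackQueue.shift() : null;
    for (var i = 0; i < output.length; i++) {
        // Play the next buffered chunk, or silence if nothing has arrived
        output[i] = (chunk && i < chunk.length) ? chunk[i] : 0;
    }
};
player.connect(audioCtx.destination);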
