AWS Transcribe Streaming BadRequestException: "Could not decode the audio stream..."

Question

I'm building a Transcribe Streaming app in Dart/Flutter with WebSockets. When I stream the test audio (pulled from a mono, 16 kHz, 16-bit signed little-endian WAV file), I get...

BadRequestException: Could not decode the audio stream that you provided. Check that the audio stream is valid and try your request again.

As a test I'm using a file to stream the audio. I'm sending 32k data bytes every second (roughly simulating a realtime microphone stream). I even get the error if I stream all 0x00, all 0xFF, or random bytes. If I halve the chunk size to 16k and the interval to 0.5s, it goes one more frame before erroring out...

As far as the data goes, I'm simply packing the bytes into the data portion of the EventStream frame exactly as they are in the file. Clearly the Event Stream packaging is correct (the byte layout, the CRCs), or else I'd get an error indicating that, no?

What would indicate to AWSTrans that it is not decodable? Any other ideas on how to proceed with this?

Thanks for any help...

Here's the code that does the packing. The full version is here (if you dare... it's a bit of a mess at the moment): https://pastebin.com/PKTj5xM2

Uint8List createEventStreamFrame(Uint8List audioChunk) {
  final headers = [
    EventStreamHeader(":content-type", 7, "application/octet-stream"),
    EventStreamHeader(":event-type", 7, "AudioEvent"),
    EventStreamHeader(":message-type", 7, "event")
  ];
  final headersData = encodeEventStreamHeaders(headers);
 
  final int totalLength = 16 + audioChunk.lengthInBytes + headersData.lengthInBytes;
  // final prelude = [headersData.length, totalLength];
  // print("Prelude: " + prelude.toString());
 
  // Convert a 32-bit int to 4 bytes, most significant byte first (big endian)
  List<int> int32ToBytes(int i) {
    return [
      (0xFF000000 & i) >> 24,
      (0x00FF0000 & i) >> 16,
      (0x0000FF00 & i) >> 8,
      (0x000000FF & i),
    ];
  }
 
  final audioBytes = ByteData.sublistView(audioChunk);
  var offset = 0;
  var audioDataList = <int>[];
  while (offset < audioBytes.lengthInBytes) {
    audioDataList.add(audioBytes.getInt16(offset, Endian.little));
    offset += 2;
  }
 
  final crc = CRC.crc32();
  final messageBldr = BytesBuilder();
  messageBldr.add(int32ToBytes(totalLength));
  messageBldr.add(int32ToBytes(headersData.length));
 
  // Now we can calc the CRC. We need to do it on the bytes, not the Ints
  final preludeCrc = crc.calculate(messageBldr.toBytes());
 
  // Continue adding data
  messageBldr.add(int32ToBytes(preludeCrc));
  messageBldr.add(headersData.toList());
  // messageBldr.add(audioChunk.toList());
  messageBldr.add(audioDataList);
  final messageCrc = crc.calculate(messageBldr.toBytes().toList());
  messageBldr.add(int32ToBytes(messageCrc));
  final frame = messageBldr.toBytes();
  //print("${frame.length} == $totalLength");
  return frame;
}

Answer

BadRequestException, at least in my case, referred to the frame being encoded incorrectly rather than the audio data being wrong.

AWS Event Stream Encoding details are here.

I had some issues with endianness and byte size. You need to be very bit-savvy with the message encoding and the audio buffer. The audio needs to be 16-bit / signed (int) / little-endian (see here). And those length params in the message wrapper are 32-bit (4 bytes) BIG endian. ByteData is your friend here in Dart. Here's a snippet from my updated code:

final messageBytes = ByteData(totalLength);

...

for (var i = 0; i < audioChunk.length; i++) {
  // Each 16-bit sample is written little-endian and takes up 2 byte positions.
  messageBytes.setInt16(offset, audioChunk[i], Endian.little);
  offset += 2;
}
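
To make the big-endian side concrete, here is a minimal sketch of just the 12-byte prelude, assuming the standard event-stream layout (total frame length, headers length, then a CRC-32 of those first 8 bytes, each written as a 32-bit big-endian value). buildPrelude and the example CRC value are illustrative placeholders, not the answer's actual code:

import 'dart:typed_data';

// Sketch of the 12-byte event-stream prelude only (not a complete frame):
//   bytes 0-3   total frame length   (32-bit unsigned, BIG endian)
//   bytes 4-7   headers length       (32-bit unsigned, BIG endian)
//   bytes 8-11  CRC-32 of bytes 0-7  (32-bit unsigned, BIG endian)
Uint8List buildPrelude(int totalLength, int headersLength, int preludeCrc) {
  final prelude = ByteData(12);
  prelude.setUint32(0, totalLength, Endian.big);   // must be big endian
  prelude.setUint32(4, headersLength, Endian.big); // must be big endian
  prelude.setUint32(8, preludeCrc, Endian.big);    // CRC of the first 8 bytes
  return prelude.buffer.asUint8List();
}

void main() {
  // Example values only; the CRC would normally be computed over the first
  // 8 prelude bytes with whatever CRC-32 library you already use.
  final prelude = buildPrelude(16 + 61 + 6400, 61, 0x12345678);
  print(prelude); // 12 bytes, lengths and CRC in network (big-endian) order
}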

Notice that each 16-bit int actually takes up 2 byte positions. If you don't specify the Endian style, it will default to your system's, which will get it wrong for either the header int encoding or the audio data... lose-lose!

The best way to ensure it's all correct is to write your decode functions, which you'll need for the AWS response anyway, and then decode your encoded frame and check that it comes out the same. Use test data for the audio like [-32000, -100, 0, 200, 31000] or something like that so you can verify the endianness, etc. is all correct.
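
As a rough sketch of that round-trip check (an illustration, not the answerer's code): encode a few known samples little-endian, decode them straight back, and compare. encodeSamples and decodeSamples are hypothetical helpers that cover only the audio payload, not the full frame:

import 'dart:typed_data';

// Hypothetical round-trip sanity check: encode known samples, decode them
// back, and compare. Only covers the 16-bit signed little-endian audio payload.
Uint8List encodeSamples(List<int> samples) {
  final data = ByteData(samples.length * 2);
  for (var i = 0; i < samples.length; i++) {
    data.setInt16(i * 2, samples[i], Endian.little); // 16-bit signed LE
  }
  return data.buffer.asUint8List();
}

List<int> decodeSamples(Uint8List bytes) {
  final data = ByteData.sublistView(bytes);
  return [
    for (var offset = 0; offset < data.lengthInBytes; offset += 2)
      data.getInt16(offset, Endian.little)
  ];
}

void main() {
  final samples = [-32000, -100, 0, 200, 31000];
  final decoded = decodeSamples(encodeSamples(samples));
  print(decoded); // [-32000, -100, 0, 200, 31000] if the byte order is right
  print(decoded.toString() == samples.toString()); // true
}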
