java.io.IOException: Not a data file while reading Avro from file

Problem description

The following code is used to serialize the data.

        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        BinaryEncoder binaryEncoder =
            EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);

        DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(data.getSchema());
        datumWriter.setSchema(data.getSchema());
        datumWriter.write(data, binaryEncoder);

        binaryEncoder.flush();
        byteArrayOutputStream.close();

        result = byteArrayOutputStream.toByteArray();

I used the following command

FileUtils.writeByteArrayToFile(new File("D:/sample.avro"), data);

to write the Avro byte array to a file. But when I try to read the same file using

        File file = new File("D:/sample.avro");
        try {
          dataFileReader = new DataFileReader(file, datumReader);
        } catch (IOException exp) {
          System.out.println(exp);
          System.exit(1);
        }

it throws an exception:

java.io.IOException: Not a data file.
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:89)

What is going wrong here? I looked at two other similar Stack Overflow questions, but they haven't been of much help to me. Can someone help me understand this?

Recommended answer

The actual data is encoded in the Avro binary format, but what typically gets passed around is more than just the encoded data.

What most people think of as an "Avro file" is the object container file format, which consists of a header (containing things like the writer schema) followed by the actual data: https://avro.apache.org/docs/current/spec.html#Object+Container+Files. The first four bytes of such a file must be the ASCII characters "Obj" followed by the byte 0x01, i.e. 0x4F626A01. The error you are getting means that the file you are trying to read as a data file doesn't start with these magic bytes.
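A minimal sketch of producing and reading such a container file with Avro's Java API, using DataFileWriter instead of a bare BinaryEncoder. The class name, the `Sample` schema and its `name` field are hypothetical and exist only for this illustration; the Avro library is assumed to be on the classpath.

```java
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ContainerFileRoundTrip {
    // Hypothetical schema, used only for this example.
    static final Schema SCHEMA = SchemaBuilder.record("Sample").fields()
            .requiredString("name")
            .endRecord();

    // Writes one record as an object container file (header + data blocks).
    static void write(File file, GenericRecord record) throws IOException {
        try (DataFileWriter<GenericRecord> writer = new DataFileWriter<>(
                new GenericDatumWriter<GenericRecord>(record.getSchema()))) {
            writer.create(record.getSchema(), file); // writes the header, including the schema
            writer.append(record);
        }
    }

    // Reads the first record back; the reader takes the schema from the file header.
    static GenericRecord readFirst(File file) throws IOException {
        try (DataFileReader<GenericRecord> reader = new DataFileReader<>(
                file, new GenericDatumReader<GenericRecord>())) {
            return reader.next();
        }
    }

    public static void main(String[] args) throws IOException {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("name", "example");

        File file = File.createTempFile("sample", ".avro");
        write(file, record);
        System.out.println(readFirst(file));
    }
}
```

A file written this way carries the magic bytes and the writer schema, so DataFileReader should be able to open it without being given a schema up front.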

Another standard format is the single-object encoding: https://avro.apache.org/docs/current/spec.html#single_object_encoding. Binary data in this format should start with the two-byte marker 0xC301.
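To see which of these formats a given file uses, one option is to inspect its leading bytes. The sketch below uses only the Java standard library; the class, enum and method names are made up for this illustration. A result of UNKNOWN only means that neither marker was found, which would be consistent with raw binary-encoded data but does not prove it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class AvroFormatSniffer {
    enum Format { CONTAINER_FILE, SINGLE_OBJECT, UNKNOWN }

    // Classifies data by its leading bytes.
    static Format detect(byte[] head) {
        // Object container file: 'O', 'b', 'j', 0x01
        if (head.length >= 4
                && head[0] == 'O' && head[1] == 'b' && head[2] == 'j' && head[3] == 0x01) {
            return Format.CONTAINER_FILE;
        }
        // Single-object encoding: 0xC3, 0x01
        if (head.length >= 2 && (head[0] & 0xFF) == 0xC3 && head[1] == 0x01) {
            return Format.SINGLE_OBJECT;
        }
        return Format.UNKNOWN;
    }

    // Reads at most the first four bytes of the file and classifies them.
    static Format detectFile(Path path) throws IOException {
        byte[] head = new byte[4];
        int filled = 0;
        try (InputStream in = Files.newInputStream(path)) {
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break; // file is shorter than four bytes
                }
                filled += n;
            }
        }
        byte[] actual = new byte[filled];
        System.arraycopy(head, 0, actual, 0, filled);
        return detect(actual);
    }

    public static void main(String[] args) throws IOException {
        // The path is taken from the command line, e.g. D:/sample.avro
        System.out.println(detectFile(Paths.get(args[0])));
    }
}
```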

But if I had to guess, the binary you have is just the raw serialized data without any sort of header information. It's hard to know for sure, though, without knowing how the byte array was created.
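If the bytes really are raw binary-encoded data, they cannot be opened with DataFileReader, but they can be decoded with a BinaryDecoder as long as the writer schema is known from somewhere else. The following is a sketch under that assumption; the class name and the `Sample` schema are hypothetical, and the `encode` method simply mirrors the serialization code from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class RawBinaryRoundTrip {
    // Hypothetical schema, used only for this example.
    static final Schema SCHEMA = SchemaBuilder.record("Sample").fields()
            .requiredString("name")
            .endRecord();

    // Same approach as in the question: raw binary encoding, no header, no schema.
    static byte[] encode(GenericRecord record) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(record.getSchema()).write(record, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    // Decoding raw bytes requires the writer schema to be supplied by the caller.
    static GenericRecord decode(byte[] bytes, Schema writerSchema) throws IOException {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        return new GenericDatumReader<GenericRecord>(writerSchema).read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("name", "example");

        byte[] raw = encode(record);
        System.out.println(decode(raw, SCHEMA));
    }
}
```

The trade-off is that raw binary is compact but not self-describing: reader and writer have to agree on the schema out of band, whereas a container file stores it in its header.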
