Avro Schema for GenericRecord: Be able to leave blank fields

Problem description

I'm using Java to convert JSON to Avro and store the results in GCS using Google Dataflow. The Avro schema is created at runtime using SchemaBuilder.

One of the fields I define in the schema is an optional LONG field, defined like this:

SchemaBuilder.FieldAssembler<Schema> fields = SchemaBuilder.record(mainName).fields();
Schema concreteType = SchemaBuilder.nullable().longType();
fields.name("key1").type(concreteType).noDefault();

Now, when I create a GenericRecord using the schema above without setting "key1", and then pass the resulting GenericRecord to the context of my DoFn with context.output(res); I get the following error:

Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: org.apache.avro.UnresolvedUnionException: Not in union ["long","null"]: 256

I also tried the same thing with withDefault(0L) and got the same result.

What am I missing? Thanks.

Recommended answer

It works fine for me when I declare the field as shown below. Printing your schema will help you compare it with this one, and you can also try removing nullable() from the long type.

fields.name("key1").type().nullable().longType().longDefault(0);

Here is the complete code I used to test:

import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.SchemaBuilder.FieldAssembler;
import org.apache.avro.SchemaBuilder.RecordBuilder;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData.Record;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;

import java.io.File;
import java.io.IOException;

public class GenericRecordExample {

  public static void main(String[] args) {

    FieldAssembler<Schema> fields;
    RecordBuilder<Schema> record = SchemaBuilder.record("Customer");
    fields = record.namespace("com.example").fields();
    fields = fields.name("first_name").type().nullable().stringType().noDefault();
    fields = fields.name("last_name").type().nullable().stringType().noDefault();
    fields = fields.name("account_number").type().nullable().longType().longDefault(0);

    Schema schema = fields.endRecord();
    System.out.println(schema.toString());

    // we build our first customer
    GenericRecordBuilder customerBuilder = new GenericRecordBuilder(schema);
    customerBuilder.set("first_name", "John");
    customerBuilder.set("last_name", "Doe");
    customerBuilder.set("account_number", 999333444111L);
    Record myCustomer = customerBuilder.build();
    System.out.println(myCustomer);

    // writing to a file
    final DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
    try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter)) {
      dataFileWriter.create(myCustomer.getSchema(), new File("customer-generic.avro"));
      dataFileWriter.append(myCustomer);
      System.out.println("Written customer-generic.avro");
    } catch (IOException e) {
      System.out.println("Couldn't write file");
      e.printStackTrace();
    }

    // reading from a file
    final File file = new File("customer-generic.avro");
    final DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
    GenericRecord customerRead;
    try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(file, datumReader)){
      customerRead = dataFileReader.next();
      System.out.println("Successfully read avro file");
      System.out.println(customerRead.toString());

      // get the data from the generic record
      System.out.println("First name: " + customerRead.get("first_name"));

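      // read a non-existent field: depending on the Avro version this either
      // returns null or throws AvroRuntimeException, so handle both cases
      try {
        System.out.println("Non existent field: " + customerRead.get("not_here"));
      } catch (AvroRuntimeException e) {
        System.out.println("Non existent field: " + e.getMessage());
      }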
    }
    catch(IOException e) {
      e.printStackTrace();
    }
  }
}
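
A note on the error itself: the message reports the value 256 as "not in union", which suggests the field was actually set, but with a value whose Java type matched neither branch. One plausible cause (an assumption, since the question does not show how the record is filled) is that the JSON parser produced a java.lang.Integer, and Avro's generic union resolution matches on the runtime Java type, so an Integer does not fit ["long","null"]. A minimal sketch of normalizing values before they go into the record follows; AvroLongCoercion and toNullableLong are hypothetical names, not part of Avro or Beam.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AvroLongCoercion {

  // Hypothetical helper: returns null for a missing value and a java.lang.Long
  // for integral numbers, so the result fits one branch of a ["long","null"] union.
  public static Long toNullableLong(Object jsonValue) {
    if (jsonValue == null) {
      return null; // maps to the "null" branch
    }
    if (jsonValue instanceof Long) {
      return (Long) jsonValue;
    }
    if (jsonValue instanceof Integer || jsonValue instanceof Short || jsonValue instanceof Byte) {
      return ((Number) jsonValue).longValue(); // widen to the "long" branch
    }
    // Anything else (for example a Double or a String) is rejected rather than guessed at.
    throw new IllegalArgumentException(
        "Cannot convert to long: " + jsonValue.getClass().getName());
  }

  public static void main(String[] args) {
    // Stand-in for a parsed JSON object; a real parser may pick Integer for small numbers.
    Map<String, Object> parsedJson = new LinkedHashMap<>();
    parsedJson.put("key1", 256); // autoboxed to Integer

    Object raw = parsedJson.get("key1");
    Long coerced = toNullableLong(raw);
    System.out.println(raw.getClass().getSimpleName() + " -> " + coerced.getClass().getSimpleName());
    // prints "Integer -> Long"

    System.out.println(toNullableLong(parsedJson.get("missing")));
    // prints "null"
  }
}
```

With a helper like this, the record would be filled with something like record.put("key1", AvroLongCoercion.toNullableLong(value)), so the field receives either a Long or null regardless of what the parser returned.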
