How to add data to the repeatable fields in an Avro schema?


Question

I'm trying to test Avro serialization and deserialization without code generation (I have already done this with generated classes). The schema is as follows:

{
"type": "record",
"name" : "person",
"namespace" : "avro",
"fields": [
    { "name" : "personname", "type": ["null","string"] },
    { "name" : "personId", "type": ["null","string"] },
    {  "name" : "Addresses", "type": {
        "type": "array",
        "items": [  {
          "type" : "record",
          "name" : "Address",
          "fields" : [
            { "name" : "addressLine1", "type": ["null", "string"] },
            { "name" : "addressLine2", "type": ["null", "string"] },
            { "name" : "city", "type": ["null", "string"] },
            { "name" : "state", "type": ["null", "string"] },
            { "name" : "zipcode", "type": ["null", "string"] }
            ]
        }]
        }
    },
    { "name" : "contact", "type" : ["null", "string"]}
]
}

I understand that this is how values are set on a record created from the schema:

Schema schema = new Schema.Parser().parse(new File("src/person.avsc.txt"));
GenericRecord person1 = new GenericData.Record(schema);
person1.put("personname", "goud");

But how do I set city, state, etc. on an address, and then add that address to Addresses?

GenericRecord address1 = new GenericData.Record(schema);
address1.put("city", "SanJose");

The above snippet doesn't work. I looked into GenericArray, but I couldn't work out how to use it.

Recommended answer

The snippet fails because address1 is created from the person schema, which has no city field. Describe the inner complex type ("type" : "record", "name" : "Address") in a separate schema. Give it the same namespace as person: the nested Address inherits avro from its parent, and a record whose full name differs may not be matched against the items union when the data is serialized.

{
  "type" : "record",
  "name" : "Address",
  "namespace" : "avro",
  "fields" : [
    { "name" : "addressLine1", "type": ["null", "string"] },
    { "name" : "addressLine2", "type": ["null", "string"] },
    { "name" : "city", "type": ["null", "string"] },
    { "name" : "state", "type": ["null", "string"] },
    { "name" : "zipcode", "type": ["null", "string"] }
  ]
}

Then you can create the inner object:

Schema innerSchema = new Schema.Parser().parse(new File("person_address.avsc"));
GenericRecord address = new GenericData.Record(innerSchema);
address.put("addressLine1", "adr_1");
address.put("addressLine2", "adr_2");
address.put("city", "test_city");
address.put("state", "test_state");
address.put("zipcode", "zipcode_00000");
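If keeping a second schema file in sync is a concern, the Address schema can also be read out of the parsed person schema. The following is a minimal sketch and not part of the original answer: the class name `NestedSchemaSketch` and the helper `opt` are made up for illustration, and the schema is inlined as a string (same shape as the one in the question) so the example is self-contained. It relies on `Addresses` being an array whose items are a one-branch union, as declared in the question.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

class NestedSchemaSketch {
    // Hypothetical helper: JSON for one nullable string field.
    static String opt(String name) {
        return "{\"name\":\"" + name + "\",\"type\":[\"null\",\"string\"]}";
    }

    // Same shape as the person schema from the question, inlined for the sketch.
    static final String PERSON_SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"person\",\"namespace\":\"avro\",\"fields\":["
      + opt("personname") + "," + opt("personId") + ","
      + "{\"name\":\"Addresses\",\"type\":{\"type\":\"array\",\"items\":[{"
      + "\"type\":\"record\",\"name\":\"Address\",\"fields\":["
      + opt("addressLine1") + "," + opt("addressLine2") + "," + opt("city") + ","
      + opt("state") + "," + opt("zipcode")
      + "]}]}},"
      + opt("contact")
      + "]}";

    static GenericRecord buildPerson() {
        Schema personSchema = new Schema.Parser().parse(PERSON_SCHEMA_JSON);

        // Addresses is an array; its element type is a union with a single branch.
        Schema itemsUnion = personSchema.getField("Addresses").schema().getElementType();
        Schema addressSchema = itemsUnion.getTypes().get(0);

        // The record is built from the nested schema, not from the person schema.
        GenericRecord address = new GenericData.Record(addressSchema);
        address.put("city", "SanJose");
        address.put("state", "CA");

        List<GenericRecord> addresses = new ArrayList<>();
        addresses.add(address);

        GenericRecord person = new GenericData.Record(personSchema);
        person.put("personname", "goud");
        person.put("Addresses", addresses);
        return person;
    }

    public static void main(String[] args) {
        System.out.println(buildPerson());
    }
}
```

Because the Address schema comes from the parent, its full name is whatever the person schema declares, so there is no second definition to keep consistent.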

Then add the inner object to an ArrayList.
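An ArrayList is enough here, but since the question mentions GenericArray: `GenericData.Array` is the generic API's own list type, and it is constructed from the array schema rather than from the element schema. A small sketch, assuming a trimmed-down array-of-Address schema with only a city field (the class name `GenericArraySketch` is hypothetical):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

class GenericArraySketch {
    // Trimmed-down schema for illustration only: an array of Address records with one field.
    static final String ARRAY_SCHEMA_JSON =
        "{\"type\":\"array\",\"items\":{\"type\":\"record\",\"name\":\"Address\","
      + "\"namespace\":\"avro\",\"fields\":["
      + "{\"name\":\"city\",\"type\":[\"null\",\"string\"]}]}}";

    static GenericData.Array<GenericRecord> buildAddresses() {
        Schema arraySchema = new Schema.Parser().parse(ARRAY_SCHEMA_JSON);

        // Elements use the array's element schema.
        GenericRecord address = new GenericData.Record(arraySchema.getElementType());
        address.put("city", "SanJose");

        // The container itself takes the array schema and an initial capacity.
        GenericData.Array<GenericRecord> addresses = new GenericData.Array<>(1, arraySchema);
        addresses.add(address);
        return addresses;
    }

    public static void main(String[] args) {
        System.out.println(buildAddresses());
    }
}
```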

Finally, create the main object and put everything into it.

Here is a full example in Java:

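import java.io.File;
import java.util.ArrayList;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

// Build the nested Address record from its own schema file.
Schema innerSchema = new Schema.Parser().parse(new File("person_address.avsc"));
GenericRecord address = new GenericData.Record(innerSchema);
address.put("addressLine1", "adr_1");
address.put("addressLine2", "adr_2");
address.put("city", "test_city");
address.put("state", "test_state");
address.put("zipcode", "zipcode_00000");

// The generic API accepts a java.util.Collection as the value of an array field.
ArrayList<GenericRecord> addresses = new ArrayList<>();
addresses.add(address);

// Build the top-level person record and attach the address list.
Schema mainSchema = new Schema.Parser().parse(new File("person.avsc"));
GenericRecord person1 = new GenericData.Record(mainSchema);
person1.put("personname", "goud");
person1.put("personId", "123_id");
person1.put("Addresses", addresses);
person1.put("contact", "test_contact");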

Result:

{
  "personname": "goud",
  "personId": "123_id",
  "Addresses": [
    {
      "addressLine1": "adr_1",
      "addressLine2": "adr_2",
      "city": "test_city",
      "state": "test_state",
      "zipcode": "zipcode_00000"
    }
  ],
  "contact": "test_contact"
}
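Since the goal in the question is to test serialization and deserialization without generated classes, a round trip through Avro's binary encoding might look like the sketch below. It is an illustration under assumptions, not the answer author's code: `RoundTripSketch`, `opt`, `samplePerson`, and `roundTrip` are hypothetical names, and the schema is inlined rather than read from person.avsc.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

class RoundTripSketch {
    // Hypothetical helper: JSON for one nullable string field.
    static String opt(String name) {
        return "{\"name\":\"" + name + "\",\"type\":[\"null\",\"string\"]}";
    }

    // Same shape as the person schema from the question, inlined for the sketch.
    static final String PERSON_SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"person\",\"namespace\":\"avro\",\"fields\":["
      + opt("personname") + "," + opt("personId") + ","
      + "{\"name\":\"Addresses\",\"type\":{\"type\":\"array\",\"items\":[{"
      + "\"type\":\"record\",\"name\":\"Address\",\"fields\":["
      + opt("addressLine1") + "," + opt("addressLine2") + "," + opt("city") + ","
      + opt("state") + "," + opt("zipcode")
      + "]}]}},"
      + opt("contact")
      + "]}";

    static GenericRecord samplePerson() {
        Schema personSchema = new Schema.Parser().parse(PERSON_SCHEMA_JSON);
        // Take the Address schema from the parent so the full names are guaranteed to match.
        Schema addressSchema = personSchema.getField("Addresses").schema()
                .getElementType().getTypes().get(0);

        GenericRecord address = new GenericData.Record(addressSchema);
        address.put("city", "SanJose");

        List<GenericRecord> addresses = new ArrayList<>();
        addresses.add(address);

        GenericRecord person = new GenericData.Record(personSchema);
        person.put("personname", "goud");
        person.put("Addresses", addresses);
        return person; // fields left unset stay null, which the ["null","string"] unions allow
    }

    // Serializes the record to Avro binary and reads it back with the same schema.
    static GenericRecord roundTrip(GenericRecord record) {
        Schema schema = record.getSchema();
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
            encoder.flush();

            BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            return new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(samplePerson()));
    }
}
```

By default the generic reader returns strings as org.apache.avro.util.Utf8, so a test should compare them through toString() instead of calling equals against a java.lang.String.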
