Avro schema for JSON array


Question

Suppose I have the following JSON:

[
   {"id":1,"text":"some text","user_id":1},
   {"id":1,"text":"some text","user_id":2},
   ...
]

What would be an appropriate Avro schema for this array of objects?

Recommended answer

[short answer]
The appropriate Avro schema for this array of objects would look like:

const avro = require('avsc'); // npm install avsc
const type = avro.Type.forSchema({
  type: 'array',
  items: { type: 'record', fields:
   [ { name: 'id', type: 'int' },
     { name: 'text', type: 'string' },
     { name: 'user_id', type: 'int' } ]
  }
});

[long answer]
We can use Avro itself to help us build the above schema from a given data object.
Let's use the npm package "avsc", which describes itself as a "Pure JavaScript implementation of the Avro specification".
Since avsc can infer a value's schema, we can use the following trick to get a schema from given data. Printed with console.log, the nested field definitions are abbreviated as [Object], so we ask twice: once for the top-level structure (the array) and once for an array element:

// don't forget to install avsc
// npm install avsc
//
const avro = require('avsc');

// avro can infer a value's schema
const type = avro.Type.forValue([
   {"id":1,"text":"some text","user_id":1}
]);

const type2 = avro.Type.forValue(
   {"id":1,"text":"some text","user_id":1}
);


console.log(type.getSchema());
console.log(type2.getSchema());

Output:

{ type: 'array',
  items: { type: 'record', fields: [ [Object], [Object], [Object] ] } }
{ type: 'record',
  fields:
   [ { name: 'id', type: 'int' },
     { name: 'text', type: 'string' },
     { name: 'user_id', type: 'int' } ] }
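
The [Object] placeholders in the first printout come from console.log's default nesting limit rather than from the schema itself, so asking twice is not strictly necessary. A minimal sketch of two ways to print every level, using a plain object with the same shape as the inferred schema (the variable names schema, full, and asJson are illustrative; the same calls should apply to whatever getSchema() returns):

```javascript
const util = require('util');

// Plain object with the same shape as the schema printed above
// (no avsc needed for this demonstration).
const schema = {
  type: 'array',
  items: {
    type: 'record',
    fields: [
      { name: 'id', type: 'int' },
      { name: 'text', type: 'string' },
      { name: 'user_id', type: 'int' }
    ]
  }
};

// depth: null removes the nesting limit that produced [Object].
const full = util.inspect(schema, { depth: null });
console.log(full);

// JSON.stringify has no nesting limit either and yields valid JSON,
// which is handy if the schema should be saved to a file.
const asJson = JSON.stringify(schema, null, 2);
console.log(asJson);
```

The JSON form is usually the more convenient one, since Avro schemas are themselves defined as JSON.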

Now let's compose the proper schema and use it to serialize an object and then deserialize it back!

const avro = require('avsc');
const type = avro.Type.forSchema({
  type: 'array',
  items: { type: 'record', fields:
   [ { name: 'id', type: 'int' },
     { name: 'text', type: 'string' },
     { name: 'user_id', type: 'int' } ]
  }
});
const buf = type.toBuffer([
   {"id":1,"text":"some text","user_id":1},
   {"id":1,"text":"some text","user_id":2}]); // Encoded buffer.

const val = type.fromBuffer(buf);
console.log("deserialized object: ", JSON.stringify(val, null, 4));  // pretty print deserialized result

var fs = require('fs');
var full_filename = "/tmp/avro_buf.dat";
fs.writeFile(full_filename, buf, function(err) {
    if(err) {
        return console.log(err);
    }

    console.log("The file was saved to '" + full_filename + "'");
});

Output:

deserialized object:  [
    {
        "id": 1,
        "text": "some text",
        "user_id": 1
    },
    {
        "id": 1,
        "text": "some text",
        "user_id": 2
    }
]
The file was saved to '/tmp/avro_buf.dat'

We can even enjoy the compact binary representation produced by the above exercise:

hexdump -C /tmp/avro_buf.dat
00000000  04 02 12 73 6f 6d 65 20  74 65 78 74 02 02 12 73  |...some text...s|
00000010  6f 6d 65 20 74 65 78 74  04 00                    |ome text..|
0000001a
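
To see why these 26 bytes are enough, the dump can be decoded by hand. The sketch below is not avsc's implementation; it is a minimal hand-written reader (readInt, readString, and records are names made up for this illustration) that follows Avro's binary encoding: ints are zigzag varints, strings are a length followed by UTF-8 bytes, and an array is a series of counted blocks terminated by a zero count. It assumes small values and positive block counts, which is all this dump contains.

```javascript
// Bytes copied from the hexdump above.
const bytes = Buffer.from(
  '04' +                        // array block with 2 items
  '0212736f6d65207465787402' +  // id=1, "some text", user_id=1
  '0212736f6d65207465787404' +  // id=1, "some text", user_id=2
  '00',                         // end of array
  'hex');

let pos = 0;

// Read one zigzag-encoded variable-length integer (Avro int).
function readInt() {
  let shift = 0, n = 0, b;
  do {
    b = bytes[pos++];
    n |= (b & 0x7f) << shift;
    shift += 7;
  } while (b & 0x80);
  return (n >>> 1) ^ -(n & 1); // undo zigzag
}

// Read a string: a length (as an int) followed by that many UTF-8 bytes.
function readString() {
  const len = readInt();
  const s = bytes.toString('utf8', pos, pos + len);
  pos += len;
  return s;
}

const records = [];
// Keep reading blocks until the zero count that ends the array.
for (let count = readInt(); count !== 0; count = readInt()) {
  for (let i = 0; i < count; i++) {
    // Fields are read in the order the schema declares them.
    records.push({ id: readInt(), text: readString(), user_id: readInt() });
  }
}

console.log(records);
```

Note that no field names or type tags appear in the bytes: the reader needs the schema to interpret them, which is where the compactness comes from.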

Nice, isn't it? :-)
