Trouble with Avro serialization of JSON documents missing fields
Problem description
I'm trying to use Apache Avro to enforce a schema on data exported from Elasticsearch into a large number of Avro documents in HDFS (to be queried with Drill). I'm having some trouble with Avro defaults.
Given this schema:
{
"namespace" : "avrotest",
"type" : "record",
"name" : "people",
"fields" : [
{"name" : "firstname", "type" : "string"},
{"name" : "age", "type" :"int", "default": -1}
]
}
I'd expect a JSON document such as {"firstname" : "Jane"} to be serialized using the default value of -1 for the age field. The Avro specification describes the attribute as follows:
default: A default value for this field, used when reading instances that lack this field (optional).
However, this doesn't seem to happen:
java -jar avro-tools-1.8.0.jar fromjson --schema-file p2.avsc jane.json > jane.avro
Exception in thread "main" org.apache.avro.AvroTypeException: Expected int. Got END_OBJECT
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
at org.apache.avro.io.JsonDecoder.readInt(JsonDecoder.java:172)
at org.apache.avro.io.ValidatingDecoder.readInt(ValidatingDecoder.java:83)
at org.apache.avro.generic.GenericDatumReader.readInt(GenericDatumReader.java:511)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:182)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:99)
at org.apache.avro.tool.Main.run(Main.java:87)
at org.apache.avro.tool.Main.main(Main.java:76)
Is this possible, or am I missing something?
Thanks in advance
The point is that declaring the field in the schema like this is not enough to make it optional:
{"name": "fieldName", "type": ["int", "null"], "default": null}
The default value of a union field must match the first type in the union, so list "null" first:
{"name": "fieldName", "type": ["null", "int"], "default": null}