Issue with Hive SerDe dealing with nested structs


Problem description


I am trying to load a huge volume of JSON data with a nested structure into Hive using a JSON SerDe. Some of the field names in the nested structure start with $. I am mapping the Hive field names using SERDEPROPERTIES, but when I query the table I get null in the field starting with $. I have tried different syntax, but no luck.

Sample JSON:

{
    "_id" : "319FFE15FF90",
    "SomeThing" : 
    {
            "$SomeField"     : 22,
            "AnotherField"   : 2112,
            "YetAnotherField":    1
    }
 . . . etc . . . .

Using a schema as follows:

create table testSample
( 
    `_id` string, 
    something struct
    <
        $somefield:int,
        anotherfield:bigint, 
        yetanotherfield:int
    >
) 
row format serde 'org.openx.data.jsonserde.JsonSerDe' 
with serdeproperties
(
    "mapping.somefield" = "$somefield"
);

This schema builds OK. However, somefield (the field starting with $) in the above table always returns null, while all the other values exist and are correct.

We've been trying a lot of syntax combinations, but to no avail.

Does anyone know the trick to map a nested field with a leading $ in its name?

Solution

You almost got it right. Try creating the table like this. The mistake you're making is in how the mapping lines up with the column definition. The serde property ("mapping.somefield" = "$somefield") says "when looking for the Hive column named 'somefield', look for the JSON field '$somefield'". But in Hive you defined the column with the dollar sign ($somefield), so there is no column named 'somefield' for the mapping to apply to. A column name with a dollar sign, if not outright illegal, is certainly not best practice in Hive.

create table testSample
(
    `_id` string,
    something struct
    <
        somefield:int,
        anotherfield:bigint,
        yetanotherfield:int
    >
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties
(
    "mapping.somefield" = "$somefield"
);

I tested it with some test data:

{ "_id" : "123", "something": { "$somefield": 12, "anotherfield":13,"yetanotherfield":100}}
hive> select something.somefield from testSample;
OK
12
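
To make the direction of the mapping concrete, here is a small Python sketch of the lookup described above: the Hive column name is the key, and the JSON field name is the value it resolves to. This is only an illustration and not the SerDe's actual implementation. The function read_struct and its arguments are hypothetical names, and the lower-casing of JSON keys is an assumption made so that keys such as "$SomeField" can match lower-case Hive column names.

```python
import json


def read_struct(json_obj, columns, mapping):
    """Resolve each Hive column to a JSON key, consulting the mapping first.

    Hypothetical helper for illustration only; not the SerDe's real code.
    """
    # Assumption: JSON keys are compared case-insensitively.
    lowered = {key.lower(): value for key, value in json_obj.items()}
    row = {}
    for col in columns:
        # "mapping.<hive column>" = "<json field>": column name in, JSON key out.
        json_key = mapping.get(col, col).lower()
        # None stands in for a NULL result when the JSON key is absent.
        row[col] = lowered.get(json_key)
    return row


record = json.loads(
    '{"_id": "123", "something": '
    '{"$somefield": 12, "anotherfield": 13, "yetanotherfield": 100}}'
)
columns = ["somefield", "anotherfield", "yetanotherfield"]

# With the mapping, the column somefield reads the JSON field $somefield.
mapped = read_struct(record["something"], columns, {"somefield": "$somefield"})
print(mapped["somefield"])

# Without any mapping, somefield has no matching JSON key.
unmapped = read_struct(record["something"], columns, {})
print(unmapped["somefield"])
```

In this sketch the first lookup yields 12, matching the query result above, while the second yields None because no JSON key is literally named somefield.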
