Create Table in Athena From Nested JSON

Problem description

I have nested JSON of the following form:

[{
    "emails": [{
        "label": "",
        "primary": "",
        "relationdef_id": "",
        "type": "",
        "value": ""
    }],
    "licenses": [{
        "allocated": "",
        "parent_type": "",
        "parentid": "",
        "product_type": "",
        "purchased_license_id": "",
        "service_type": ""
    }, {
        "allocated": "",
        "parent_type": "",
        "parentid": "",
        "product_type": "",
        "purchased_license_id": "",
        "service_type": ""
    }]
}, {
    "emails": [{
        "label": "",
        "primary": "",
        "relationdef_id": "",
        "type": "",
        "value": ""
    }],
    "licenses": [{
        "allocated": "2016-04-26 01:46:26",
        "parent_type": "",
        "parentid": "",
        "product_type": "",
        "purchased_license_id": "",
        "service_type": ""
    }]
}]

which I am not able to convert into an Athena table.

I have also tried changing it to a list of objects:

{
        "emails": [{
                "label": "",
                "primary": "",
                "relationdef_id": "",
                "type": "",
                "value": ""
            }
        ],
        "licenses": [{
                "allocated": "",
                "parent_type": "",
                "parentid": "",
                "product_type": "",
                "purchased_license_id": "",
                "service_type": ""
            },{
                "allocated": "",
                "parent_type": "",
                "parentid": "",
                "product_type": "",
                "purchased_license_id": "",
                "service_type": ""
            }
        ]
    }
    {
        "emails": [{
                "label": "",
                "primary": "",
                "relationdef_id": "",
                "type": "",
                "value": ""
            }
        ],
        "licenses": [{
                "allocated": "",
                "parent_type": "",
                "parentid": "",
                "product_type": "",
                "purchased_license_id": "",
                "service_type": ""
            }
        ]
    }
    {
        "emails": [{
                "label": "",
                "primary": "",
                "relationdef_id": "",
                "type": "",
                "value": ""
            }
        ],
        "licenses": [{
                "allocated": "",
                "parent_type": "",
                "parentid": "",
                "product_type": "",
                "purchased_license_id": "",
                "service_type": ""
            }
        ]
    }

with the query:

CREATE EXTERNAL TABLE `test_orders1`(
  `emails` array<struct<`label`: string, `primary`: string,`relationdef_id`: string,`type`: string, `value`: string>>,
  `licenses` array<struct<`allocated`: string, `parent_type`: string, `parentid`: string, `product_type`: string,`purchased_license_id`: string, `service_type`: string>>) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

but only 1 row is created. Is there a way to use nested JSON of type JSONArray in an Athena table? Or how can I change the nested JSON so that it works for me?

Recommended answer

When querying JSON data, Athena requires the files to be formatted with one JSON document per line. It's unclear from your question whether this is the case: the examples you give are multiline, but perhaps that's only to make the question clearer.
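If the source file really is a single top-level JSON array, as in the first example in the question, one way to get to the one-document-per-line layout is to rewrite it before uploading. The following is a minimal sketch, not part of the original answer: the function name `array_to_json_lines` and the shortened sample records are hypothetical, and it assumes the whole file fits in memory.

```python
import json


def array_to_json_lines(array_text: str) -> str:
    """Turn a JSON array of objects into one compact JSON document per line."""
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without an indent argument emits no newlines,
    # so every record ends up on exactly one line.
    return "\n".join(json.dumps(record) for record in records) + "\n"


# Shortened, hypothetical sample with the same shape as the question's data.
sample = """[
  {"emails": [{"label": "", "value": ""}],
   "licenses": [{"allocated": ""}, {"allocated": ""}]},
  {"emails": [{"label": "", "value": ""}],
   "licenses": [{"allocated": "2016-04-26 01:46:26"}]}
]"""

json_lines = array_to_json_lines(sample)
print(json_lines, end="")
```

Each element of the array becomes its own line, which is the layout the answer describes.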

The table DDL you include looks like it should work on the second example data, provided that the data is formatted as one document per line, e.g.

{"emails": [{"label": "", "primary": "", "relationdef_id": "", "type": "", "value": ""}], "licenses": [{"allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}, { "allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}]}
{"emails": [{"label": "", "primary": "", "relationdef_id": "", "type": "", "value": ""}], "licenses": [{"allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}]}
{"emails": [{"label": "", "primary": "", "relationdef_id": "", "type": "", "value": ""}], "licenses": [{"allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}]}
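If the second example in the question is stored exactly as shown there, that is, as pretty-printed objects placed one after another, it can be compacted into the layout above. The following is a rough sketch under that assumption; the helper name `concatenated_to_json_lines` and the shortened sample are hypothetical. It uses `json.JSONDecoder.raw_decode` from the Python standard library to read one document at a time.

```python
import json


def concatenated_to_json_lines(text: str) -> str:
    """Re-emit back-to-back (possibly multiline) JSON documents, one per line."""
    decoder = json.JSONDecoder()
    lines = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed value and the index just past it.
        document, pos = decoder.raw_decode(text, pos)
        lines.append(json.dumps(document))
    return ("\n".join(lines) + "\n") if lines else ""


# Shortened, hypothetical sample: two pretty-printed objects back to back.
pretty = """
{
    "emails": [{"label": "", "value": ""}],
    "licenses": [{"allocated": ""}, {"allocated": ""}]
}
{
    "emails": [{"label": "", "value": ""}],
    "licenses": [{"allocated": ""}]
}
"""

compact = concatenated_to_json_lines(pretty)
print(compact, end="")
```

The output contains one line per input object, so each object should map to one row of the table.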
