How do I import an array of data into separate rows in a Hive table?

Question

I am trying to import data in the following format into a Hive table:

[
    {
      "identifier" : "id#1",
      "dataA" : "dataA#1"
    },
    {
      "identifier" : "id#2",
      "dataA" : "dataA#2"
    }
]

I have multiple files like this, and I want each {} to form one row in the table. This is what I have tried:

CREATE EXTERNAL TABLE final_table(
    identifier STRING,
    dataA STRING
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION "s3://bucket/path_in_bucket/"

However, this does not create one row for each {}. I have also tried:

CREATE EXTERNAL TABLE final_table(
    rows ARRAY< STRUCT<
    identifier: STRING,
    dataA: STRING
    >>
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION "s3://bucket/path_in_bucket/"

But this does not work either. Is there a way to tell Hive that the input is an array and that each item in the array is one record? Any suggestions on what to do?

Recommended answer

Here is what you need.

Method 1: Add a name to the array

Data

{"data":[{"identifier" : "id#1","dataA" : "dataA#1"},{"identifier" : "id#2","dataA" : "dataA#2"}]}
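The files in the question hold a bare, pretty-printed array, so they would have to be rewritten into this shape first. Here is a minimal Python sketch of that conversion. It assumes each file fits in memory and that the SerDe expects one JSON document per line. The function name `wrap_array` is hypothetical, and the key `data` is chosen only to match the column name used in the table definition.

```python
import json

def wrap_array(text, key="data"):
    """Wrap a top-level JSON array in an object and return it as one line."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Compact separators; json.dumps emits no newlines without an indent argument.
    return json.dumps({key: records}, separators=(",", ":"))

# Sample input in the same shape as the question's files.
sample = """[
    {"identifier": "id#1", "dataA": "dataA#1"},
    {"identifier": "id#2", "dataA": "dataA#2"}
]"""
print(wrap_array(sample))
```

Running this over each file and writing the result back to the table location is left out, since how the files are stored and listed depends on the setup.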

SQL

-- Lets reserved words such as `rows` be used as identifiers (used as an alias below).
SET hive.support.sql11.reserved.keywords=false;

CREATE EXTERNAL TABLE IF NOT EXISTS ramesh_test (
  data array<
    struct<
      identifier:STRING,
      dataA:STRING
    >
  >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 'my_location';

-- EXPLODE turns each struct in the array into its own row.
SELECT rows.identifier,
       rows.dataA
  FROM ramesh_test d
LATERAL VIEW EXPLODE(d.data) d1 AS rows;

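Output

For the sample data, the query should return one row per array element: (id#1, dataA#1) and (id#2, dataA#2).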

Method 2: No change to the data

Data

[{"identifier":"id#1","dataA":"dataA#1"},{"identifier":"id#2","dataA":"dataA#2"}]

SQL

CREATE EXTERNAL TABLE IF NOT EXISTS ramesh_raw_json (
  json STRING
)
LOCATION 'my_location';

SELECT get_json_object (exp.json_object, '$.identifier') AS identifier,
       get_json_object (exp.json_object, '$.dataA') AS dataA
  FROM ( SELECT json_object
           FROM ramesh_raw_json a
           LATERAL VIEW EXPLODE (split(regexp_replace(regexp_replace(a.json,'\\}\\,\\{','\\}\\;\\{'),'\\[|\\]',''), '\\;')) json_exploded AS json_object ) exp;
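The nested `regexp_replace` and `split` calls are easier to follow when traced outside Hive. The Python sketch below mirrors the same three steps on one input line. It is an illustration of the string logic only, not something Hive runs, and the function name `split_array_line` is hypothetical.

```python
import json
import re

def split_array_line(line):
    """Mimic the query: mark object boundaries with ';', drop brackets, split."""
    marked = re.sub(r"\},\{", "};{", line)     # '},{' becomes '};{'
    stripped = re.sub(r"\[|\]", "", marked)    # remove every '[' and ']'
    return stripped.split(";")                 # one string per JSON object

line = '[{"identifier":"id#1","dataA":"dataA#1"},{"identifier":"id#2","dataA":"dataA#2"}]'
for part in split_array_line(line):
    record = json.loads(part)                  # stands in for get_json_object
    print(record["identifier"], record["dataA"])
```

The trace also shows where the approach is fragile: the pattern matches only an exact `},{` with no whitespace, and any value containing `;`, `[` or `]` would be split or altered. It also appears to rely on each array sitting on a single line, as in the sample data, because a plain text table yields one row per input line.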

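Output

For the sample data, the query should again return one row per array element: (id#1, dataA#1) and (id#2, dataA#2).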
