SQL: Explode an array


Question

I have a table that contains JSON objects. Each JSON object contains an array in square brackets whose elements are separated by commas.

How can I access any element of the square-bracket array, for example "Matt", using SQL?

{"str":
   [
     1,
     134,
     61,
     "Matt",
     {"action.type":"registered","application":491,"value":423,"value2":12344},
     ["application"],
     [],
     "49:0"
   ]
}

I am using Hive on top of Hadoop. If you know how to do this in SQL, that is fine :)

Recommended answer

You can do this in Hive as follows:

First you need a JSON SerDe (Serializer/Deserializer). The most functional one I have seen is https://github.com/rcongiu/Hive-JSON-Serde/. The SerDe from Peter Sankauskas does not seem able to handle JSON this complex. As of this writing, you will need to compile the SerDe with Maven and place the JAR where your Hive session can reach it.

Next you are going to need to change your JSON format. The reason is that Hive takes a strongly typed view of arrays, so an array that mixes integers with other types won't work. Consider switching to a struct like this:

{"str": {
   "n1": 1,
   "n2": 134,
   "n3": 61,
   "s1": "Matt",
   "st1": {"type":"registered","app":491,"value":423,"value2":12344},
   "ar1": ["application"],
   "ar2": [],
   "s2": "49:0"
} }

Next you will need to put the JSON all on one line. I'm not sure whether this is a quirk of Hive or of the SerDe, but you need to do it.
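The two preparation steps (restructuring the array into a struct and putting each record on one line) could be scripted before loading the data. The following is only a minimal Python sketch, assuming every record has exactly the eight positional elements from the question; the field names and nested key renames are copied from the struct example above, and the function names are hypothetical.

```python
import json

# Field names taken from the struct example; their order is assumed to match
# the positions of the original 8-element "str" array.
FIELD_NAMES = ["n1", "n2", "n3", "s1", "st1", "ar1", "ar2", "s2"]

# Key renames inside the nested object, following the struct example
# ("action.type" -> "type", "application" -> "app").
NESTED_RENAMES = {"action.type": "type", "application": "app"}


def array_to_struct(record):
    """Turn {"str": [...]} into {"str": {"n1": ..., "n2": ..., ...}}."""
    values = record["str"]
    if len(values) != len(FIELD_NAMES):
        raise ValueError(
            "expected %d elements, got %d" % (len(FIELD_NAMES), len(values))
        )
    struct = dict(zip(FIELD_NAMES, values))
    # Rename the keys of the nested object; unknown keys are kept unchanged.
    struct["st1"] = {NESTED_RENAMES.get(k, k): v for k, v in struct["st1"].items()}
    return {"str": struct}


def to_single_line(record):
    """Serialize one converted record as compact JSON without line breaks."""
    return json.dumps(array_to_struct(record), separators=(",", ":"))


# The record from the question, in its original array form.
original = {
    "str": [
        1, 134, 61, "Matt",
        {"action.type": "registered", "application": 491,
         "value": 423, "value2": 12344},
        ["application"], [], "49:0",
    ]
}

line = to_single_line(original)
print(line)
```

Writing one such line per record to a file gives input in the shape the remaining steps expect.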

Then copy the data into HDFS.

Now you're ready to define a table and query away:

ADD JAR /path/to/jar/json-serde-1.1.2-jar-with-dependencies.jar;

CREATE EXTERNAL TABLE json (
    str struct<
        n1 : int, n2 : int, n3 : int,
        s1 : string,
        st1 : struct<type : string, app : int, value : int, value2 : int>,
        ar1 : array<string>,
        ar2 : array<string>,
        s2 : string
    >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/hdfs/path/to/file';

With this in place you can run interesting nested queries like:

select str.st1.type from json;

Last but not least, since this is so specific to Hive, it would be worthwhile to update the question and tags.
