SQL:分解数组 [英] SQL: Explode an array

查看:29
本文介绍了SQL:分解数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含 JSON 对象的表.每个 JSON 对象在方括号中包含一个数组,以逗号分隔.

I have a table that contains JSON objects. Each JSON object contains an array in square brackets, separated by commas.

如何使用 SQL 访问方括号数组中的任何元素,例如Matt"?

How can I access any element in the square bracket array, for example "Matt", using SQL?

{"str":
   [
     1,
     134,
     61,
     "Matt",
     {"action.type":"registered","application":491,"value":423,"value2":12344},
     ["application"],
     [],
     "49:0"
   ]
}

我在 Hadoop 上使用Hive".如果您知道如何在 SQL 中执行此操作,那很好:)

I am using 'Hive' ontop of Hadoop. If you know how to do this in SQL, that is fine :)

推荐答案

您可以在 Hive 中执行以下操作:

You can do this in Hive as follows:

首先你需要一个 JSON SerDe(串行器/解串器).我见过的最实用的是 https://github.com/rcongiu/Hive-JSON-塞尔德/.Peter Sankauskas 的 SerDe 似乎无法处理如此复杂的 JSON.在撰写本文时,您需要使用 Maven 编译 SerDe,并将 JAR 放置在 Hive 会话可以访问它的位置.

First you need a JSON SerDe (Serializer / Deserializer). The most functional one I have seen is https://github.com/rcongiu/Hive-JSON-Serde/. The SerDe from Peter Sankauskas can't handle JSON this complex it seems. As of this writing you will need to compile the SerDe with Maven and place the JAR where your Hive session can reach it.

接下来您需要更改您的 JSON 格式.原因是 Hive 对数组采用强类型视图,因此混合整数和其他内容将不起作用.考虑切换到这样的结构:

Next you are going to need to change your JSON format. The reason is Hive takes a strongly-typed view of arrays, so mixing integers and other things won't work. Consider switching to a struct like this:

{"str": { 
   n1 : 1,
   n2 : 134,
   n3 : 61,
   s1: "Matt",
   st1: {"type":"registered","app":491,"value":423,"value2":12344},
   ar1: ["application"],
   ar2: [],
   s2: "49:0"
} }

接下来,您需要将 JSON 全部放在一行.我不确定这是 Hive 还是 SerDe 的怪癖,但您需要这样做.

Next you will need to put the JSON all one one line. I'm not sure if this is a quirk of Hive or the SerDe but you need to do it.

然后将数据复制到 HDFS.

Then copy the data into HDFS.

现在您已准备好定义表和查询:

Now you're ready to define a table and query away:

ADD JAR /path/to/jar/json-serde-1.1.2-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE json (
    str struct<
       n1 : int, n2 : int, n3 : int,
       s1 : string,
       st1 : struct < type : string, app : int, value : int, value2 : int>,
       ar1 : array<string>,
       ar2 : array<string>,
       s2 : string
    >
 )
 ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
 LOCATION '/hdfs/path/to/file';

有了这个,您可以运行有趣的嵌套查询,例如:

With this in place you can run interesting nested queries like:

select str.st1.type from json;

最后但并非最不重要的一点,因为这对 Hive 来说是如此特定,所以更新问题和标签是值得的.

Last but not least since this is so specific to Hive it would be worthwhile to update the question and tags.

这篇关于SQL:分解数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆