蜂巢:如何爆炸嵌入在CSV文件中的JSON列? [英] Hive : How to explode a JSON column embedded in a CSV file?

查看:135
本文介绍了蜂巢:如何爆炸嵌入在CSV文件中的JSON列?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

从CSV文件(带有标头和管道定界符)中,我得到了以下两个包含JSON列(内部带有集合)的内容,如下所示:

From a CSV file (with a header and a pipe delimiter) I've got the two following contents which contain a JSON column (with a collection inside), like this:

第一种情况(带有不带名称的JSON集合):

First case (with a JSON collection with no name):

ProductId|IngestTime|ProductOrders
9180|20171025145034|[{"OrderId":"299","Location":"NY"},{"OrderId":"499","Location":"LA"}]
8251|20171026114034|[{"OrderId":"1799","Location":"London"}]

第二种情况(带有名为"Orders"的JSON集合):

Second case (with a JSON collection named "Orders"):

ProductId|IngestTime|ProductOrders
9180|20171025145034|{"Orders":[{"OrderId":"299","Location":"NY"},{"OrderId":"499","Location":"LA"}]}
8251|20171026114034|{"Orders":[{"OrderId":"1799","Location":"London"}]}

首先,我将创建原始"表,如下所示:

Firstable, I create my "raw" table like this:

DROP TABLE IF EXISTS Product;
CREATE EXTERNAL TABLE Product (
  ProductId STRING,
  IngestTime STRING,
  ProductOrders STRING
)
COMMENT "Product raw table"
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\|'
STORED AS TEXTFILE
LOCATION
  '/data/product'
TBLPROPERTIES ("skip.header.line.count"="1");

当我用以下方法查询表时:

When I query my table with:

SELECT * FROM Product

我有以下答案:

第一种情况(带有不带名称的JSON集合):

First case (with a JSON collection with no name):

ProductId  IngestTime      ProductOrders
9180       20171025145034  [{"OrderId":"299","Location":"NY"},{"OrderId":"499","Location":"LA"}]
8251       20171026114034  [{"OrderId":"1799","Location":"London"}]

第二种情况(带有名为"Orders"的JSON集合):

Second case (with a JSON collection named "Orders"):

ProductId  IngestTime      ProductOrders
9180       20171025145034  {"Orders":[{"OrderId":"299","Location":"NY"},{"OrderId":"499","Location":"LA"}]}
8251       20171026114034  {"Orders":[{"OrderId":"1799","Location":"London"}]}

好的,很好,到目前为止效果很好!

Ok really nice, so far it works well !

但是我现在需要的是创建一个返回以下内容的SELECT查询:

But what I need now, is to create a SELECT query which returns:

ProductId  IngestTime      ProductOrderId ProductLocation
9180       20171025145034  299            NY
9180       20171025145034  499            LA
8251       20171026114034  1799           London

我确实需要一个可移植的SQL查询,该查询对我的两种情况(带有或不带有标签"OrderId")都适用.

I really need a portable SQL query which works well for my two cases (with or without the tag "OrderId").

到目前为止,我通过使用'explode','get_json_object'等尝试了许多组合,但是我仍然没有找到正确的SQL查询.

So far, I tried many combinations by using 'explode', 'get_json_object' and so on, but I still haven't found the right SQL query.

非常感谢您的帮助:-)

Thanks a lot for your help :-)

推荐答案

您可以尝试

CREATE EXTERNAL TABLE product(productid String,ingesttime String, productorders array<struct<orderid:String,location:string>> ) 

select productid,ingesttime, productorders.orderid[0] as orderid , productorders.location[0] as location from product

这篇关于蜂巢:如何爆炸嵌入在CSV文件中的JSON列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆