如何使用Json serde解析Hive表的Json列? [英] How can I parse a Json column of a Hive table using a Json serde?

查看:659
本文介绍了如何使用Json serde解析Hive表的Json列?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图根据事件的名称将反序列化的json事件加载到不同的表中。

现在我将所有事件都放在同一个事件中表中,表只有两列EventName和Payload(有效负载存储事件的json表示):

  CREATE TABLE事件(EventName STRING,Payload STRING)

所以基本上我想要的是加载下表中的数据:
$ b $ pre $ CREATE TABLE TempEvent(Column1 STRING,Column2 STRING,Column3 STRING)
ROW FORMAT SERDE'org.openx。 data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;

然后加载这些事件:

  INSERT INTO TempEvent从事件中选择Payload其中EventName ='TempEvent'; 

但是配置单元抛出异常,表示目标表有3列,select语句只有1 。



有没有其他方法可以实现这一点,或者我做错了什么?

解决方案

JSON serde需要每行有一个JSON的表才能使用它。因此,它不会与您的输入表一起工作,因为行

  TempEvent,{Column1:value1,Column2 :value2,Column3:value3} 

不是有效的JSON。所以首先你需要把它移动到一个只包含有效的JSON的新的中间表中,然后使用加载数据填充JSON serde表:

  create table event_json(Payload string)
存储为文本文件;

插入表中event_json
从事件
中选择Payload,其中EventName ='TempEvent';

创建表格TempEvent(Column1字符串,Column2字符串,Column3字符串)
ROW FORMAT SERDE'org.openx.data.jsonserde.JsonSerDe';

加载数据inpath'/ user / hive / warehouse / event_json'覆盖到表格TempEvent;

然后您可以像这样提取列:

 从TempEvent中选择Column1,Column2,Column3 
;

当然,如果您的源表最初是有效的JSON,则所有这些处理都不是必需的,您可以只需创建TempEvent表作为外部表,并直接从中获取数据。


I am trying to load de-serialized json events into different tables, based on the name of the event.

Right now I have all the events in the same table, the table has only two columns EventName and Payload (the payload stores the json representation of the event):

CREATE TABLE event( EventName STRING, Payload STRING)

So basically what i want is to load the data in the following table:

CREATE TABLE TempEvent ( Column1 STRING, Column2 STRING, Column3 STRING )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;

And load the events with something like:

INSERT INTO TempEvent select Payload from event where EventName='TempEvent';

But hive is throwing an exception saying that the destination table has 3 columns, and the select statement just 1.

Is there other way to acomplish this or i am doing something wrong?.

解决方案

The JSON serde requires a table with one JSON per line in order to use it. So it won't work with your input table because the line

TempEvent, {"Column1":"value1","Column2":"value2","Column3":"value3"}

is not a valid JSON. So first you need to move it into a new intermediate table which just contains valid JSON, and then populate the JSON serde table from there using load data:

create table event_json (Payload string)
stored as textfile;

insert into table event_json 
select Payload from event
where EventName='TempEvent';

create table TempEvent (Column1 string, Column2 string, Column3 string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

load data inpath '/user/hive/warehouse/event_json' overwrite into table TempEvent;

Then you can extract the columns like this:

select Column1, Column2, Column3
from TempEvent;

Of course all of this processing would not be necessary if your source table was valid JSON originally, you could just create the TempEvent table as as an external table and pull data directly from it.

这篇关于如何使用Json serde解析Hive表的Json列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆