在Hive表中使用JSON-SerDe [英] using JSON-SerDe in Hive tables

查看:579
本文介绍了在Hive表中使用JSON-SerDe的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试从以下链接下载JSON-SerDe
http://code.google.com/p/hive-json-serde/wiki/GettingStarted


$ b $ pre $ CREATE TABLE my_table(field1 string,field2 int,
field3 string,field4 double)
ROW FORMAT SERDE'org.apache.hadoop.hive.contrib.serde2.JsonSerde';

我添加了Json-SerDe jar作为

 添加JAR /path-to/hive-json-serde.jar; 

加载数据为

  LOAD DATA LOCAL INPATH'/home/hduser/pradi/Test.json'INTO TABLE my_table; 

并成功加载数据。

但是,当查询数据为

从my_table中选择*

;

从表中取一行作为

data1 100 more data1 123.001



Test.json包含
$ $ p $ {field1:data1,field2:100,field3:more data1, field4:123.001}

{field1:data2,field2:200,field3:more data2,field4:123.002}

{field1:data3,field2:300,field3:more data3,field4:123.003}

{field1:data4,field2 :400,field3:more data4,field4:123.004}

问题?为什么当我查询表时,只有一行代替4行。并且在 / user / hive / warehouse / my_table 中包含所有4行!!






 分群>添加jar /home/hduser/pradeep/hive-json-serde-0.2.jar; 
将/home/hduser/pradeep/hive-json-serde-0.2.jar添加到课程路径
已添加资源:/home/hduser/pradeep/hive-json-serde-0.2.jar

hive> CREATE EXTERNAL TABLE my_table(field1 string,field2 int,
> field3 string,field4 double)
> ROW FORMAT SERDE'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
> WITH SERDEPROPERTIES(
>field1=$。field1,
>field2=$。field2,
>field3=$。field3 ,
>field4=$。field4
>);
OK
所用时间:0.088秒

hive> LOAD DATA LOCAL INPATH'/home/hduser/pradi/test.json'INTO TABLE my_table;
从文件复制数据:/home/hduser/pradi/test.json
复制文件:file:/home/hduser/pradi/test.json
将数据加载到表中default.my_table
OK
拍摄时间:0.426秒

配置单元> select * from my_table;
OK
data1 100 more data1 123.001
花费的时间:0.17秒

我已经发布了test.json文件的内容。所以你可以看到查询只产生一行

  data1 100 more data1 123.001 






我已将json文件更改为employee.json,其中包含

$ {
firstName:Mike,
lastName:Chepesky,
employeeNumber:1840192
} p>

并且也更改了表,但是当我查询表时,它显示空值

 分群>添加jar /home/hduser/pradi/hive-json-serde-0.2.jar; 
将/home/hduser/pradi/hive-json-serde-0.2.jar添加到课程路径
已添加资源:/home/hduser/pradi/hive-json-serde-0.2.jar

hive>创建EXTERNAL表employees_json(firstName字符串,lastName字符串,employeeNumber int)
> ROW FORMAT SERDE'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
确定
所用时间:0.297秒


配置单元>将数据本地inpath'/home/hduser/pradi/employees.json'加载到表employees_json;
从文件复制数据:/home/hduser/pradi/employees.json
复制文件:file:/home/hduser/pradi/employees.json
将数据加载到表default.employees_json
OK
所需时间:0.293秒


配置单元> select * from employees_json;
OK
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
花费时间:0.194秒


解决方案

有点难以分辨没有日志的情况下发生了什么(参见入门)怀疑。只是一个快速的想法 - 你可以尝试,如果它适用于 WITH SERDEPROPERTIES ,那么:

  CREATE EXTERNAL TABLE my_table(field1 string,field2 int,
field3 string,field4 double)
ROW FORMAT SERDE'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
使用SERDEPROPERTIES(
field1=$。field1,
field2=$。field2,
field3=$。field3,
field4=$。field4
);

还有一个 fork ,您可能需要试试ThinkBigAnalytics。


$ b

更新:在Test中显示输入。 json无效JSON因此记录被折叠。



查看答案 https:/ /stackoverflow.com/a/11707993/396567 了解更多详情。


I'm trying JSON-SerDe from below link http://code.google.com/p/hive-json-serde/wiki/GettingStarted.

         CREATE TABLE my_table (field1 string, field2 int, 
                                     field3 string, field4 double)
         ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde' ;

I've added Json-SerDe jar as

          ADD JAR /path-to/hive-json-serde.jar;

And loaded data as

LOAD DATA LOCAL INPATH  '/home/hduser/pradi/Test.json' INTO TABLE my_table;

and it loads data successfully.

But when query data as

Select * from my_table;

I get only one row from table as

data1 100 more data1 123.001

Test.json contains

{"field1":"data1","field2":100,"field3":"more data1","field4":123.001} 

{"field1":"data2","field2":200,"field3":"more data2","field4":123.002} 

{"field1":"data3","field2":300,"field3":"more data3","field4":123.003} 

{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}

Where is the problem? why only one row is coming instead of 4 rows when i query the table. And in /user/hive/warehouse/my_table contains all the 4 rows!!


hive> add jar /home/hduser/pradeep/hive-json-serde-0.2.jar;
Added /home/hduser/pradeep/hive-json-serde-0.2.jar to class path
Added resource: /home/hduser/pradeep/hive-json-serde-0.2.jar

hive> CREATE EXTERNAL TABLE my_table (field1 string, field2 int,
>                                 field3 string, field4 double)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
> WITH SERDEPROPERTIES (
>   "field1"="$.field1",
>   "field2"="$.field2",
>   "field3"="$.field3",
>   "field4"="$.field4"
> );
OK
Time taken: 0.088 seconds

hive> LOAD DATA LOCAL INPATH  '/home/hduser/pradi/test.json' INTO TABLE my_table;
Copying data from file:/home/hduser/pradi/test.json
Copying file: file:/home/hduser/pradi/test.json
Loading data to table default.my_table
OK
Time taken: 0.426 seconds

hive> select * from my_table;
OK
data1   100     more data1      123.001
Time taken: 0.17 seconds

I've already posted the contents of test.json file. so you can see that query is resulting only one line as

data1   100     more data1      123.001


I've changed the json file to employee.json which contains

{ "firstName" : "Mike", "lastName" : "Chepesky", "employeeNumber" : 1840192 }

and changed table also but it showing a null values when i query the table

hive> add jar /home/hduser/pradi/hive-json-serde-0.2.jar;
Added /home/hduser/pradi/hive-json-serde-0.2.jar to class path
Added resource: /home/hduser/pradi/hive-json-serde-0.2.jar

hive> create EXTERNAL table employees_json (firstName string, lastName string,        employeeNumber int )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
OK
Time taken: 0.297 seconds


hive> load data local inpath '/home/hduser/pradi/employees.json' into table     employees_json;
Copying data from file:/home/hduser/pradi/employees.json
Copying file: file:/home/hduser/pradi/employees.json
Loading data to table default.employees_json
OK
Time taken: 0.293 seconds


 hive>select * from employees_json;
  OK
  NULL    NULL    NULL
  NULL    NULL    NULL
  NULL    NULL    NULL
  NULL    NULL    NULL
  NULL    NULL    NULL
  NULL    NULL    NULL
Time taken: 0.194 seconds

解决方案

A bit hard to tell what's going on without the logs (see Getting Started) in case of doubt. Just a quick thought - can you try if it works with WITH SERDEPROPERTIESas so:

CREATE EXTERNAL TABLE my_table (field1 string, field2 int, 
                                field3 string, field4 double)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
WITH SERDEPROPERTIES (
  "field1"="$.field1",
  "field2"="$.field2",
  "field3"="$.field3",
  "field4"="$.field4" 
);

There is also a fork you might want to give a try from ThinkBigAnalytics.

UPDATE: Turns out the input in Test.json is invalid JSON hence the records get collapsed.

See answer https://stackoverflow.com/a/11707993/396567 for further details.

这篇关于在Hive表中使用JSON-SerDe的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆