在 Hive 表中使用 JSON-SerDe [英] using JSON-SerDe in Hive tables

查看:29
本文介绍了在 Hive 表中使用 JSON-SerDe的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在从下面的链接尝试 JSON-SerDehttp://code.google.com/p/hive-json-serde/wiki/入门.

I'm trying JSON-SerDe from below link http://code.google.com/p/hive-json-serde/wiki/GettingStarted.

         CREATE TABLE my_table (field1 string, field2 int, 
                                     field3 string, field4 double)
         ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde' ;

我已将 Json-SerDe jar 添加为

I've added Json-SerDe jar as

          ADD JAR /path-to/hive-json-serde.jar;

并加载数据为

LOAD DATA LOCAL INPATH  '/home/hduser/pradi/Test.json' INTO TABLE my_table;

并成功加载数据.

但是当查询数据为

从 my_table 中选择 *;

我从表中只得到一行

data1 100 多个 data1 123.001

Test.json 包含

Test.json contains

{"field1":"data1","field2":100,"field3":"more data1","field4":123.001} 

{"field1":"data2","field2":200,"field3":"more data2","field4":123.002} 

{"field1":"data3","field2":300,"field3":"more data3","field4":123.003} 

{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}

问题出在哪里?为什么当我查询表时只有一行而不是 4 行.而在 /user/hive/warehouse/my_table 中包含所有 4 行!!

Where is the problem? why only one row is coming instead of 4 rows when i query the table. And in /user/hive/warehouse/my_table contains all the 4 rows!!

hive> add jar /home/hduser/pradeep/hive-json-serde-0.2.jar;
Added /home/hduser/pradeep/hive-json-serde-0.2.jar to class path
Added resource: /home/hduser/pradeep/hive-json-serde-0.2.jar

hive> CREATE EXTERNAL TABLE my_table (field1 string, field2 int,
>                                 field3 string, field4 double)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
> WITH SERDEPROPERTIES (
>   "field1"="$.field1",
>   "field2"="$.field2",
>   "field3"="$.field3",
>   "field4"="$.field4"
> );
OK
Time taken: 0.088 seconds

hive> LOAD DATA LOCAL INPATH  '/home/hduser/pradi/test.json' INTO TABLE my_table;
Copying data from file:/home/hduser/pradi/test.json
Copying file: file:/home/hduser/pradi/test.json
Loading data to table default.my_table
OK
Time taken: 0.426 seconds

hive> select * from my_table;
OK
data1   100     more data1      123.001
Time taken: 0.17 seconds

我已经发布了 test.json 文件的内容.所以你可以看到查询结果只有一行

I've already posted the contents of test.json file. so you can see that query is resulting only one line as

data1   100     more data1      123.001

<小时>

我已将 json 文件更改为 employee.json,其中包含


I've changed the json file to employee.json which contains

{"firstName": "迈克","lastName": "切佩斯基",员工编号":1840192}

{ "firstName" : "Mike", "lastName" : "Chepesky", "employeeNumber" : 1840192 }

并且还更改了表,但是当我查询表时它显示空值

and changed table also but it showing a null values when i query the table

hive> add jar /home/hduser/pradi/hive-json-serde-0.2.jar;
Added /home/hduser/pradi/hive-json-serde-0.2.jar to class path
Added resource: /home/hduser/pradi/hive-json-serde-0.2.jar

hive> create EXTERNAL table employees_json (firstName string, lastName string,        employeeNumber int )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
OK
Time taken: 0.297 seconds


hive> load data local inpath '/home/hduser/pradi/employees.json' into table     employees_json;
Copying data from file:/home/hduser/pradi/employees.json
Copying file: file:/home/hduser/pradi/employees.json
Loading data to table default.employees_json
OK
Time taken: 0.293 seconds


 hive>select * from employees_json;
  OK
  NULL    NULL    NULL
  NULL    NULL    NULL
  NULL    NULL    NULL
  NULL    NULL    NULL
  NULL    NULL    NULL
  NULL    NULL    NULL
Time taken: 0.194 seconds

推荐答案

如果没有日志,很难判断发生了什么(参见 入门) 以防万一.只是一个快速的想法 - 您可以尝试一下它是否可以与 WITH SERDEPROPERTIES 一起使用:

A bit hard to tell what's going on without the logs (see Getting Started) in case of doubt. Just a quick thought - can you try if it works with WITH SERDEPROPERTIESas so:

CREATE EXTERNAL TABLE my_table (field1 string, field2 int, 
                                field3 string, field4 double)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
WITH SERDEPROPERTIES (
  "field1"="$.field1",
  "field2"="$.field2",
  "field3"="$.field3",
  "field4"="$.field4" 
);

还有一个 fork 您可能想从 ThinkBigAnalytics 尝试一下.

There is also a fork you might want to give a try from ThinkBigAnalytics.

更新:结果是 Test.json 中的输入是无效的 JSON,因此记录被折叠起来.

UPDATE: Turns out the input in Test.json is invalid JSON hence the records get collapsed.

有关详细信息,请参阅答案 https://stackoverflow.com/a/11707993/396567.

See answer https://stackoverflow.com/a/11707993/396567 for further details.

这篇关于在 Hive 表中使用 JSON-SerDe的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆