在 Hive 表中使用 JSON-SerDe [英] using JSON-SerDe in Hive tables
问题描述
我正在从下面的链接尝试 JSON-SerDehttp://code.google.com/p/hive-json-serde/wiki/入门.
I'm trying JSON-SerDe from below link http://code.google.com/p/hive-json-serde/wiki/GettingStarted.
CREATE TABLE my_table (field1 string, field2 int,
field3 string, field4 double)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde' ;
我已将 Json-SerDe jar 添加为
I've added Json-SerDe jar as
ADD JAR /path-to/hive-json-serde.jar;
并加载数据为
LOAD DATA LOCAL INPATH '/home/hduser/pradi/Test.json' INTO TABLE my_table;
并成功加载数据.
但是当查询数据为
从 my_table 中选择 *;
我从表中只得到一行
data1 100 多个 data1 123.001
Test.json 包含
Test.json contains
{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}
{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}
{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}
{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}
问题出在哪里?为什么当我查询表时只有一行而不是 4 行.而在 /user/hive/warehouse/my_table 中包含所有 4 行!!
Where is the problem? why only one row is coming instead of 4 rows when i query the table. And in /user/hive/warehouse/my_table contains all the 4 rows!!
hive> add jar /home/hduser/pradeep/hive-json-serde-0.2.jar;
Added /home/hduser/pradeep/hive-json-serde-0.2.jar to class path
Added resource: /home/hduser/pradeep/hive-json-serde-0.2.jar
hive> CREATE EXTERNAL TABLE my_table (field1 string, field2 int,
> field3 string, field4 double)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
> WITH SERDEPROPERTIES (
> "field1"="$.field1",
> "field2"="$.field2",
> "field3"="$.field3",
> "field4"="$.field4"
> );
OK
Time taken: 0.088 seconds
hive> LOAD DATA LOCAL INPATH '/home/hduser/pradi/test.json' INTO TABLE my_table;
Copying data from file:/home/hduser/pradi/test.json
Copying file: file:/home/hduser/pradi/test.json
Loading data to table default.my_table
OK
Time taken: 0.426 seconds
hive> select * from my_table;
OK
data1 100 more data1 123.001
Time taken: 0.17 seconds
我已经发布了 test.json 文件的内容.所以你可以看到查询结果只有一行
I've already posted the contents of test.json file. so you can see that query is resulting only one line as
data1 100 more data1 123.001
<小时>
我已将 json 文件更改为 employee.json,其中包含
I've changed the json file to employee.json which contains
{"firstName": "迈克","lastName": "切佩斯基",员工编号":1840192}
{ "firstName" : "Mike", "lastName" : "Chepesky", "employeeNumber" : 1840192 }
并且还更改了表,但是当我查询表时它显示空值
and changed table also but it showing a null values when i query the table
hive> add jar /home/hduser/pradi/hive-json-serde-0.2.jar;
Added /home/hduser/pradi/hive-json-serde-0.2.jar to class path
Added resource: /home/hduser/pradi/hive-json-serde-0.2.jar
hive> create EXTERNAL table employees_json (firstName string, lastName string, employeeNumber int )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
OK
Time taken: 0.297 seconds
hive> load data local inpath '/home/hduser/pradi/employees.json' into table employees_json;
Copying data from file:/home/hduser/pradi/employees.json
Copying file: file:/home/hduser/pradi/employees.json
Loading data to table default.employees_json
OK
Time taken: 0.293 seconds
hive>select * from employees_json;
OK
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
Time taken: 0.194 seconds
推荐答案
如果没有日志,很难判断发生了什么(参见 入门) 以防万一.只是一个快速的想法 - 您可以尝试一下它是否可以与 WITH SERDEPROPERTIES
一起使用:
A bit hard to tell what's going on without the logs (see Getting Started) in case of doubt. Just a quick thought - can you try if it works with WITH SERDEPROPERTIES
as so:
CREATE EXTERNAL TABLE my_table (field1 string, field2 int,
field3 string, field4 double)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
WITH SERDEPROPERTIES (
"field1"="$.field1",
"field2"="$.field2",
"field3"="$.field3",
"field4"="$.field4"
);
还有一个 fork 您可能想从 ThinkBigAnalytics 尝试一下.
There is also a fork you might want to give a try from ThinkBigAnalytics.
更新:结果是 Test.json 中的输入是无效的 JSON,因此记录被折叠起来.
UPDATE: Turns out the input in Test.json is invalid JSON hence the records get collapsed.
有关详细信息,请参阅答案 https://stackoverflow.com/a/11707993/396567.
See answer https://stackoverflow.com/a/11707993/396567 for further details.
这篇关于在 Hive 表中使用 JSON-SerDe的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!