在Hive表中使用JSON-SerDe [英] using JSON-SerDe in Hive tables

查看：579 发布时间：2018/5/31 19:12:36 hadoop hive

本文介绍了在Hive表中使用JSON-SerDe的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在尝试从以下链接下载JSON-SerDe
http://code.google.com/p/hive-json-serde/wiki/GettingStarted 。

$ b $ pre $

 CREATE TABLE my_table（field1 string，field2 int，
 field3 string，field4 double）
 ROW FORMAT SERDE'org.apache.hadoop.hive.contrib.serde2.JsonSerde';

我添加了Json-SerDe jar作为

 添加JAR /path-to/hive-json-serde.jar;

加载数据为

  LOAD DATA LOCAL INPATH'/home/hduser/pradi/Test.json'INTO TABLE my_table;

并成功加载数据。

但是，当查询数据为

从my_table中选择*

;

从表中取一行作为

data1 100 more data1 123.001

Test.json包含
$ $ p $ {field1：data1，field2：100，field3：more data1， field4：123.001} {field1：data2，field2：200，field3：more data2，field4：123.002} {field1：data3，field2：300，field3：more data3，field4：123.003} {field1：data4，field2 ：400，field3：more data4，field4：123.004}
问题？为什么当我查询表时，只有一行代替4行。并且在 / user / hive / warehouse / my_table 中包含所有4行!!

分群>添加jar /home/hduser/pradeep/hive-json-serde-0.2.jar; 将/home/hduser/pradeep/hive-json-serde-0.2.jar添加到课程路径已添加资源：/home/hduser/pradeep/hive-json-serde-0.2.jar hive> CREATE EXTERNAL TABLE my_table（field1 string，field2 int， > field3 string，field4 double） > ROW FORMAT SERDE'org.apache.hadoop.hive.contrib.serde2.JsonSerde' > WITH SERDEPROPERTIES（ >field1=$。field1， >field2=$。field2， >field3=$。field3 ， >field4=$。field4 >）; OK 所用时间：0.088秒 hive> LOAD DATA LOCAL INPATH'/home/hduser/pradi/test.json'INTO TABLE my_table; 从文件复制数据：/home/hduser/pradi/test.json 复制文件：file：/home/hduser/pradi/test.json 将数据加载到表中default.my_table OK 拍摄时间：0.426秒配置单元> select * from my_table; OK data1 100 more data1 123.001 花费的时间：0.17秒
我已经发布了test.json文件的内容。所以你可以看到查询只产生一行

data1 100 more data1 123.001

我已将json文件更改为employee.json，其中包含

$ {
firstName：Mike，
lastName：Chepesky，
employeeNumber：1840192
} p>

并且也更改了表，但是当我查询表时，它显示空值
分群>添加jar /home/hduser/pradi/hive-json-serde-0.2.jar; 将/home/hduser/pradi/hive-json-serde-0.2.jar添加到课程路径已添加资源：/home/hduser/pradi/hive-json-serde-0.2.jar hive>创建EXTERNAL表employees_json（firstName字符串，lastName字符串，employeeNumber int） > ROW FORMAT SERDE'org.apache.hadoop.hive.contrib.serde2.JsonSerde'; 确定所用时间：0.297秒配置单元>将数据本地inpath'/home/hduser/pradi/employees.json'加载到表employees_json; 从文件复制数据：/home/hduser/pradi/employees.json 复制文件：file：/home/hduser/pradi/employees.json 将数据加载到表default.employees_json OK 所需时间：0.293秒配置单元> select * from employees_json; OK NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 花费时间：0.194秒

解决方案
有点难以分辨没有日志的情况下发生了什么（参见入门）怀疑。只是一个快速的想法 - 你可以尝试，如果它适用于 WITH SERDEPROPERTIES ，那么：

CREATE EXTERNAL TABLE my_table（field1 string，field2 int， field3 string，field4 double） ROW FORMAT SERDE'org.apache.hadoop.hive.contrib.serde2.JsonSerde' 使用SERDEPROPERTIES（ field1=$。field1， field2=$。field2， field3=$。field3， field4=$。field4 ）;
还有一个 fork ，您可能需要试试ThinkBigAnalytics。

$ b
更新：在Test中显示输入。 json无效JSON因此记录被折叠。

查看答案 https：/ /stackoverflow.com/a/11707993/396567 了解更多详情。

I'm trying JSON-SerDe from below link http://code.google.com/p/hive-json-serde/wiki/GettingStarted.
CREATE TABLE my_table (field1 string, field2 int, field3 string, field4 double) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde' ;
I've added Json-SerDe jar as
ADD JAR /path-to/hive-json-serde.jar;
And loaded data as
LOAD DATA LOCAL INPATH '/home/hduser/pradi/Test.json' INTO TABLE my_table;
and it loads data successfully.

But when query data as

Select * from my_table;

I get only one row from table as

data1 100 more data1 123.001

Test.json contains
{"field1":"data1","field2":100,"field3":"more data1","field4":123.001} {"field1":"data2","field2":200,"field3":"more data2","field4":123.002} {"field1":"data3","field2":300,"field3":"more data3","field4":123.003} {"field1":"data4","field2":400,"field3":"more data4","field4":123.004}
Where is the problem? why only one row is coming instead of 4 rows when i query the table. And in /user/hive/warehouse/my_table contains all the 4 rows!!

hive> add jar /home/hduser/pradeep/hive-json-serde-0.2.jar; Added /home/hduser/pradeep/hive-json-serde-0.2.jar to class path Added resource: /home/hduser/pradeep/hive-json-serde-0.2.jar hive> CREATE EXTERNAL TABLE my_table (field1 string, field2 int, > field3 string, field4 double) > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde' > WITH SERDEPROPERTIES ( > "field1"="$.field1", > "field2"="$.field2", > "field3"="$.field3", > "field4"="$.field4" > ); OK Time taken: 0.088 seconds hive> LOAD DATA LOCAL INPATH '/home/hduser/pradi/test.json' INTO TABLE my_table; Copying data from file:/home/hduser/pradi/test.json Copying file: file:/home/hduser/pradi/test.json Loading data to table default.my_table OK Time taken: 0.426 seconds hive> select * from my_table; OK data1 100 more data1 123.001 Time taken: 0.17 seconds
I've already posted the contents of test.json file. so you can see that query is resulting only one line as
data1 100 more data1 123.001

I've changed the json file to employee.json which contains

{ "firstName" : "Mike", "lastName" : "Chepesky", "employeeNumber" : 1840192 }

and changed table also but it showing a null values when i query the table
hive> add jar /home/hduser/pradi/hive-json-serde-0.2.jar; Added /home/hduser/pradi/hive-json-serde-0.2.jar to class path Added resource: /home/hduser/pradi/hive-json-serde-0.2.jar hive> create EXTERNAL table employees_json (firstName string, lastName string, employeeNumber int ) > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'; OK Time taken: 0.297 seconds hive> load data local inpath '/home/hduser/pradi/employees.json' into table employees_json; Copying data from file:/home/hduser/pradi/employees.json Copying file: file:/home/hduser/pradi/employees.json Loading data to table default.employees_json OK Time taken: 0.293 seconds hive>select * from employees_json; OK NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL Time taken: 0.194 seconds

解决方案
A bit hard to tell what's going on without the logs (see Getting Started) in case of doubt. Just a quick thought - can you try if it works with WITH SERDEPROPERTIESas so:
CREATE EXTERNAL TABLE my_table (field1 string, field2 int, field3 string, field4 double) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde' WITH SERDEPROPERTIES ( "field1"="$.field1", "field2"="$.field2", "field3"="$.field3", "field4"="$.field4" );
There is also a fork you might want to give a try from ThinkBigAnalytics.

UPDATE: Turns out the input in Test.json is invalid JSON hence the records get collapsed.

See answer https://stackoverflow.com/a/11707993/396567 for further details.

这篇关于在Hive表中使用JSON-SerDe的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

在Hive表中使用JSON-SerDe [英] using JSON-SerDe in Hive tables

问题描述

相关文章

分布式计算/Hadoop最新文章

热门教程

热门工具

登录关闭

在Hive表中使用JSON-SerDe [英] using JSON-SerDe in Hive tables

问题描述

相关文章

分布式计算/Hadoop最新文章

热门教程

热门工具

登录 关闭

登录关闭