Loading LinkedIn JSON response into Hive


Question

I have tried multiple ways to create the Hive table and retrieve data using JSONSerDe, but I keep running into the following errors:

hive> select * from jobs;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException: No content to map to Object due to end of input

hive> select values from jobs;

Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)

Here is the create table statement:

create external table jobs (
  jobs STRUCT<
    values : ARRAY<STRUCT<
      id : STRING,
      customerJobCode : STRING,
      postingDate : STRING,
      expirationDate : STRING,
      company : STRUCT<
        id : STRING,
        name : STRING>,
      position : STRUCT<
        title : STRING,
        jobFunctions : STRING,
        industries : STRING,
        jobType : STRING,
        experienceLevel : STRING>,
      skillsAndExperience : ARRAY<STRING>,
      descriptionSnippet : ARRAY<STRING>,
      salary : STRING,
      jobPoster : STRUCT<
        id : STRING,
        firstName : STRING,
        lastName : STRING,
        headline : STRING>,
      referralBonus : STRING,
      locationDescription : STRING>>>
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/sunita/tables/jobs';

Original input file: https://gist.github.com/anonymous/e2c15d808bbe46b707bf/raw/88d775cb418901807980c52e803ffc8be53adc5f/jobsearch.json

I tried leaving 'values' (an array of structs) out of the table description, and also tried removing 'values' from both the input file and the create table statement. That approach produces no errors but, as one would expect, only one entry gets into the table and everything else comes back as null, because Hive treats the input as a single record.

I tried simplifying the input to select fewer fields, but I still get the same error when retrieving the data.

I also confirmed that the JSON string is valid using the Notepad++ JSON plugin. Any help is truly appreciated.

Recommended answer

The problem was a newline at the end of the input file. Making sure that I eliminated any characters after the end of the data resolved the issue.
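
One way to apply this fix before uploading the file is sketched below in Python. It is only an illustration, not the method the answer's author used: the helper names compact_json_text and rewrite_without_trailing_newline are hypothetical, and the sketch assumes the SerDe reads the input line by line, so that the whole response should sit on a single line with nothing after it. Note that a JSON validator accepts trailing whitespace, which may be why the validity check in the question did not flag the problem.

```python
import json
from pathlib import Path


def compact_json_text(raw: str) -> str:
    """Return the JSON document as one line with no trailing newline.

    json.loads tolerates leading/trailing whitespace, so a file that ends
    in blank lines still parses; json.dumps then re-serializes it without
    line breaks (newlines inside string values are escaped as \\n).
    """
    parsed = json.loads(raw)  # raises ValueError if the text is not valid JSON
    return json.dumps(parsed, separators=(",", ":"))


def rewrite_without_trailing_newline(src: str, dst: str) -> None:
    """Hypothetical helper: read src, write the compacted JSON to dst."""
    raw = Path(src).read_text(encoding="utf-8")
    # write_text does not append a newline, so dst ends right after the data.
    Path(dst).write_text(compact_json_text(raw), encoding="utf-8")


if __name__ == "__main__":
    # File names are placeholders; adjust them to the local copy of the response.
    rewrite_without_trailing_newline("jobsearch.json", "jobsearch_clean.json")
```

The cleaned file can then be copied to the table's LOCATION directory in place of the original.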
