MapReduce error when selecting a column from a JSON file in Cosmos


Question

After creating a table with Cygnus 0.2.1, I get a MapReduce error when trying to select a column from Hive. Looking at the files Cygnus writes to Hadoop, we can see that they are in JSON format. This problem did not appear in previous versions of Cygnus, which wrote the Hadoop files in CSV format.

To test this, I have left two tables in place, one created from each format. You can compare them and reproduce the error with the following queries:

SELECT entitytype FROM fiware_ports_meteo; (it fails, created with 0.2.1 in JSON format)
SELECT entitytype FROM fiware_test_table; (it works, created with 0.2 in CSV format)

The paths to the HDFS files are, respectively:

/user/fiware/ports/meteo
/user/fiware/testTable/

I suspect the error comes from the MapReduce job parsing the JSON files, since the CSV format works as expected.

How can this problem be avoided?

Answer

You simply have to add the JSON SerDe to the Hive classpath. As a non-privileged user, you can do that from the Hive CLI:

hive> ADD JAR /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar;
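Note that ADD JAR only lasts for the current Hive session. If you want the SerDe registered automatically whenever the CLI starts, one option (a sketch, assuming you have a home directory on the machine running the CLI; the Hive CLI sources ~/.hiverc on startup) is to append the statement to your ~/.hiverc:

```shell
# Append the ADD JAR statement to ~/.hiverc so the Hive CLI
# runs it at the start of every session (jar path from the answer above).
echo "ADD JAR /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar;" >> ~/.hiverc
```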

If you have developed a remote Hive client, you can run the same statement as you would any other query. For example, in Java:

// con is an open java.sql.Connection obtained from the Hive JDBC driver
Statement stmt = con.createStatement();
stmt.executeQuery("ADD JAR /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar");
stmt.close();
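For context, the reason the jar is needed at query time is that a JSON-backed table declares a SerDe class in its DDL, and Hive must be able to load that class when reading the data. A minimal illustrative sketch (the table and column names here are hypothetical, not Cygnus's actual schema; the class name is the one shipped in the json-serde jar above):

```
hive> CREATE TABLE example_json_table (entitytype STRING)
    > ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
```

Once the jar is on the classpath, SELECT queries over such a table can launch MapReduce jobs without the class-loading failure.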
