Simple Hive query is empty


Problem description

I have a CSV log file. I loaded it into Hive using this statement:

CREATE EXTERNAL TABLE iprange(id STRING, ip STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\,' STORED AS TEXTFILE LOCATION '/user/hadoop/expandediprange/';

I want to perform a simple query like:

select * from iprange where ip="0.0.0.2";

But I get an empty result.

I'm running Hive on HDFS; should I use HBase instead? My guess is that it has something to do with the table size. The log file is 160 MB, and the generated table in Hive has 8 million rows. If I create a smaller file myself and load it into Hive, the query works.

Any idea of what is wrong?

Edit: I forgot to say that it's running on Amazon Elastic MapReduce using a small instance.

Solution

I found the problem. It was not a Hive issue really. I'm using the output of a Hadoop job as input, and in that job I was writing the output in the key, leaving the value as an empty string:

context.write(new Text(id + "," + ip), new Text(""));

The problem is that Hadoop inserts a tab character by default between the key and the value. Because the ip field is a string, it took the tab as part of its value, so I had a trailing tab in every line and the comparison against "0.0.0.2" never matched. I discovered it using Pig, as it wraps each output tuple in ().
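To make the trailing-tab effect concrete, here is a minimal sketch in plain Java that does not use Hadoop or Hive at all. The helpers formatLine and splitFields are hypothetical names made up for this illustration; they only mimic the "key, separator, value" line layout and a comma-delimited field split, under the assumption that the line is written and read the way the answer describes.

```java
public class TrailingTabDemo {

    // Hypothetical helper (not a Hadoop API): lays out one output record
    // as key + separator + value, as described in the answer above.
    public static String formatLine(String key, String value, String separator) {
        return key + separator + value;
    }

    // Hypothetical helper: splits a line on commas, the way a table declared
    // with FIELDS TERMINATED BY ',' would see it. The -1 limit keeps
    // trailing empty fields instead of dropping them.
    public static String[] splitFields(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // The whole record is in the key, the value is empty, the separator is a tab.
        String line = formatLine("1" + "," + "0.0.0.2", "", "\t");
        String ip = splitFields(line)[1];

        // Wrapping the field in () makes the invisible tab visible,
        // which is how the Pig output gave it away.
        System.out.println("ip field: (" + ip + ")");
        System.out.println("matches 0.0.0.2: " + ip.equals("0.0.0.2"));
        System.out.println("matches after trim: " + ip.trim().equals("0.0.0.2"));
    }
}
```

The ip field ends up one character longer than the literal in the WHERE clause, so an exact string comparison fails even though the line looks correct when printed.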

The solution for me was to change the separator. Since I only have two fields, I write one in the key and the other in the value, and set the separator to ",":

conf.set("mapred.textoutputformat.separator", ",");
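The effect of that change can be sketched the same way, again in plain Java without Hadoop. The formatLine helper is a hypothetical stand-in for the "key, separator, value" layout, and the sample id and ip values are made up for illustration.

```java
public class CommaSeparatorDemo {

    // Hypothetical helper (not a Hadoop API): key + separator + value.
    public static String formatLine(String key, String value, String separator) {
        return key + separator + value;
    }

    public static void main(String[] args) {
        String id = "1";
        String ip = "0.0.0.2";

        // Before: both fields in the key, empty value, default tab separator.
        String before = formatLine(id + "," + ip, "", "\t");

        // After: id in the key, ip in the value, separator set to ",".
        String after = formatLine(id, ip, ",");

        // Parentheses make any trailing whitespace visible.
        System.out.println("before: (" + before + ")");
        System.out.println("after:  (" + after + ")");

        // With the comma separator, the second field is exactly the ip.
        String secondField = after.split(",", -1)[1];
        System.out.println("ip matches: " + secondField.equals(ip));
    }
}
```

With the separator doing the job of the field delimiter, the line contains nothing but the two fields, so the comma-delimited table reads the ip without any extra character.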

Maybe it's possible to trim these things in Hive.
