Hive creates an empty table, even though there are plenty of files


Problem description

I put some files into HDFS (/path/to/directory/) that contain data like the following:

63  EB44863EA74AA0C5D3ECF3D678A7DF59
62  FABBC9ED9719A5030B2F6A4591EDB180
59  6BF6D40AF15DE2D7E295EAFB9574BBF8

All of them are named like _user_hive_warehouse_file_name_000XYZ_A. These files were downloaded from another HDFS.

I'm trying to create an external table via Hive:

CREATE EXTERNAL TABLE users(
id int,
user string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/path/to/directory/';

It says:

OK
Time taken: 0.098 seconds

select * from users; returns no rows.

select count(1) from users; returns 0.

Hive creates the table successfully, but it is always empty. If I put another file into the directory, such as another.txt containing the sample data above, select count(1) from users; returns 3.

What am I missing? Why is the table empty?

Environment:

  • JDK 7
  • Hadoop 2.6.0
  • Hive 0.14.0
  • Ubuntu 14.04

Solution

I think you are encountering an issue that is peripherally discussed in HIVE-6431. In particular, this comment is the important one:

By default, FileInputFormat(which is the super class of various formats) in hadoop ignores file name starts with "_" or ".", and hard to walk around this in hive codebase.
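To make the quoted rule concrete, here is a small Python sketch of the name check. It imitates, as I understand it, the hidden-file PathFilter inside Hadoop's FileInputFormat, which looks only at the last component of each path. The function names is_visible_to_input_format and split_by_visibility are made up for illustration; they are not Hadoop APIs.

```python
# Sketch (not Hadoop code) of the file-name rule FileInputFormat applies when
# listing input files: a path whose final component starts with "_" or "."
# is skipped, so Hive never reads it.
from typing import Iterable, List, Tuple


def is_visible_to_input_format(file_name: str) -> bool:
    """Return True if a file with this name would be read as table input."""
    # Only the last path component matters: "/path/to/directory/_x" -> "_x".
    base = file_name.rstrip("/").rsplit("/", 1)[-1]
    return not base.startswith(("_", "."))


def split_by_visibility(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition names into (read, skipped) according to the rule above."""
    read: List[str] = []
    skipped: List[str] = []
    for name in names:
        (read if is_visible_to_input_format(name) else skipped).append(name)
    return read, skipped


if __name__ == "__main__":
    listing = [
        "/path/to/directory/_user_hive_warehouse_file_name_000XYZ_A",
        "/path/to/directory/another.txt",
    ]
    read, skipped = split_by_visibility(listing)
    print("read:   ", read)
    print("skipped:", skipped)
```

This matches what the question observed: the underscore-prefixed files are skipped, while another.txt is read.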

The workaround is probably to avoid file names that begin with "_" or ".", for example by renaming the files after downloading them.
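If renaming is acceptable, a hypothetical Python helper like the one below can generate the hdfs dfs -mv commands. The "part" prefix, and the choice to leave _SUCCESS marker files hidden, are my own assumptions. The helper only builds the argument lists, so you can review them before running anything.

```python
# Hypothetical helper: build "hdfs dfs -mv" argument lists that rename files
# whose names start with "_" so that FileInputFormat (and so Hive) reads them.
import posixpath
from typing import Iterable, List

# Assumption: marker files such as MapReduce's _SUCCESS should stay hidden.
KEEP_HIDDEN = frozenset({"_SUCCESS"})


def rename_commands(directory: str, names: Iterable[str],
                    prefix: str = "part") -> List[List[str]]:
    """Return one ["hdfs", "dfs", "-mv", src, dst] list per file to rename.

    `prefix` is an arbitrary choice: "_x" becomes "part_x".
    """
    if prefix.startswith(("_", ".")):
        raise ValueError("prefix would keep the files hidden")
    names = list(names)
    existing = set(names)
    commands: List[List[str]] = []
    for name in names:
        if not name.startswith("_") or name in KEEP_HIDDEN:
            continue  # already visible, or deliberately left hidden
        new_name = prefix + name
        if new_name in existing:
            raise ValueError(f"rename target already exists: {new_name}")
        existing.add(new_name)
        commands.append(["hdfs", "dfs", "-mv",
                         posixpath.join(directory, name),
                         posixpath.join(directory, new_name)])
    return commands


if __name__ == "__main__":
    names = ["_user_hive_warehouse_file_name_000XYZ_A", "_SUCCESS"]
    for cmd in rename_commands("/path/to/directory", names):
        print(" ".join(cmd))
```

Each printed line can be run from a shell, or each list can be passed to subprocess.run(cmd, check=True). Once the files are renamed, select count(1) from users; should pick them up, just as it picked up another.txt.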
