Updating Hive external table with HDFS changes


Question
Let's say I created a Hive external table "myTable" from the file myFile.csv (located in HDFS).

myFile.csv changes every day, so I'd like to update "myTable" once a day as well.

Is there a HiveQL query that tells Hive to update the table every day?

Thank you.

P.S.

I would also like to know whether it works the same way with directories. Let's say I create a Hive partition from the HDFS directory "myDir" when "myDir" contains 10 files. The next day "myDir" contains 20 files (10 files were added). Should I update the Hive partition?

Solution

Hive basically has two types of tables.

The first is the managed table, whose data is managed by the Hive warehouse: when you load data into it, the files are moved or copied into Hive's internal warehouse directory. Later changes to the original source file are therefore not visible in the query output.

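The other is the external table, for which Hive does not copy the data into its warehouse. The table only points at a location in HDFS (a directory, such as the one holding myFile.csv).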

Whenever you run a query on the table, Hive reads the data directly from the files at that location.

So the query output always reflects the latest data, and no HiveQL statement is needed to refresh the table after myFile.csv changes.

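That is one of the goals of external tables. The same holds for your directory question: files added to a directory that a table or partition already points to are read by the next query, so the partition does not need to be updated. Only a brand-new partition directory has to be registered in the metastore, for example with ALTER TABLE ... ADD PARTITION or MSCK REPAIR TABLE.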
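To make the difference concrete, here is a toy Python model. It is only an illustration under stated assumptions, not Hive's implementation and not a Hive API: local temporary directories stand in for HDFS, and the names `ManagedTable`, `ExternalTable` and `select_all` are hypothetical. The managed model copies the file at load time (real Hive's `LOAD DATA INPATH` moves the file instead, but either way the table stops tracking the original). The external model stores only a directory and re-reads every file in it on each query, which also covers new files appearing in that directory.

```python
import os
import shutil
import tempfile


class ManagedTable:
    """Hypothetical toy model: the source file is copied into a private
    'warehouse' directory at load time, so later edits are not seen."""

    def __init__(self, source_file, warehouse_dir):
        os.makedirs(warehouse_dir, exist_ok=True)
        self.data_file = os.path.join(warehouse_dir, os.path.basename(source_file))
        shutil.copy(source_file, self.data_file)  # snapshot taken once

    def select_all(self):
        with open(self.data_file) as f:
            return f.read().splitlines()


class ExternalTable:
    """Hypothetical toy model: only the location is stored; every query
    re-reads all files currently in that directory."""

    def __init__(self, location):
        self.location = location

    def select_all(self):
        rows = []
        for name in sorted(os.listdir(self.location)):  # picks up new files too
            with open(os.path.join(self.location, name)) as f:
                rows.extend(f.read().splitlines())
        return rows


def write(path, text):
    with open(path, "w") as f:
        f.write(text)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    my_dir = os.path.join(root, "myDir")  # stands in for the HDFS directory
    os.makedirs(my_dir)
    my_file = os.path.join(my_dir, "myFile.csv")
    write(my_file, "1,old\n")

    managed = ManagedTable(my_file, os.path.join(root, "warehouse"))
    external = ExternalTable(my_dir)

    write(my_file, "1,new\n")                               # the daily change
    write(os.path.join(my_dir, "myFile2.csv"), "2,added\n")  # a new file appears

    print(managed.select_all())   # ['1,old']
    print(external.select_all())  # ['1,new', '2,added']
    shutil.rmtree(root)
```

The external model never caches anything, which is why it needs no "update" step: the next `select_all` call simply sees whatever is in the directory at that moment.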

You can even drop the table without losing the data: dropping an external table removes only its metadata, and the files in HDFS stay where they are.
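The drop behavior can be sketched with a similarly hedged toy model. `ToyMetastore`, `create_table` and `drop_table` are hypothetical names, not Hive's metastore API, and a local temporary directory stands in for HDFS. Dropping always removes the metadata entry, but only a managed table's data directory is deleted.

```python
import os
import shutil
import tempfile


class ToyMetastore:
    """Hypothetical toy model of table metadata (not Hive's real API)."""

    def __init__(self):
        self.tables = {}  # table name -> (location, is_external)

    def create_table(self, name, location, external):
        self.tables[name] = (location, external)

    def drop_table(self, name):
        location, external = self.tables.pop(name)  # metadata always goes
        if not external:
            shutil.rmtree(location)  # managed table: data is deleted too


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    managed_dir = os.path.join(root, "warehouse", "managed_t")
    external_dir = os.path.join(root, "myDir")
    for d in (managed_dir, external_dir):
        os.makedirs(d)
        with open(os.path.join(d, "data.csv"), "w") as f:
            f.write("1,a\n")

    ms = ToyMetastore()
    ms.create_table("managed_t", managed_dir, external=False)
    ms.create_table("myTable", external_dir, external=True)
    ms.drop_table("managed_t")
    ms.drop_table("myTable")

    print(ms.tables)                     # {}
    print(os.path.exists(managed_dir))   # False
    print(os.path.exists(external_dir))  # True
    shutil.rmtree(root)
```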
