与 hive 中的外部表混淆 [英] Confusion with the external tables in hive

查看:31
本文介绍了与 hive 中的外部表混淆的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用以下命令创建了 hive 外部表:

I have created the hive external table using below command:

 use hive2;

create external table depTable (depId int comment 'This is the unique id for each dep', depName string,location string) comment 'department table' row format delimited fields terminated by "," 
stored as textfile location '/dataDir/';  

现在,当我查看 HDFS 时,我可以看到数据库,但仓库内没有 depTable.

Now, when I view the HDFS I can see the db but there is no depTable inside the warehouse.

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
[cloudera@quickstart ~]$ 

上面可以看到在这个数据库中没有创建表.据我所知,外部表没有存储在 hive 仓库中.所以我说得对吗??如果是,那么它存储在哪里??
但是如果我先创建外部表然后加载数据,那么我就可以看到 hive2.db 里面的文件.

Above you can see that there is no table created in this DB. As far as I know, external tables are not stored in the hive warehouse.So am I correct ?? If yes then where is it stored ??
But if I create external table first and then load the data then I am able to see the file inside hive2.db.

hive> create external table depTable (depId int comment 'This is the unique id for each dep', depName string,location string) comment 'department table' row format delimited fields terminated by "," stored as textfile;
OK
Time taken: 0.056 seconds
hive> load data inpath '/dataDir/department_data.txt' into table depTable;
Loading data to table default.deptable
Table default.deptable stats: [numFiles=1, totalSize=90]
OK
Time taken: 0.28 seconds
hive> select * from deptable;
OK
1001    FINANCE SYDNEY
2001    AUDIT   MELBOURNE
3001    MARKETING   PERTH
4001    PRODUCTION  BRISBANE

现在,如果我触发 hadoop fs 查询,我可以在数据库下看到这个表,如下所示:

Now, if I fire the hadoop fs query I can see this table under database as below:

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
Found 1 items
drwxrwxrwx   - cloudera supergroup          0 2019-01-17 09:07 /user/hive/warehouse/hive2.db/deptable  

如果我删除了表,我仍然可以在 HDFS 中看到如下表:

If I delete the table still I am able to see table in the HDFS as below:

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
Found 1 items
drwxrwxrwx   - cloudera supergroup          0 2019-01-17 09:11 /user/hive/warehouse/hive2.db/deptable  

那么,外部表的确切行为是什么??当我使用 LOCATION 关键字创建它时,它在哪里存储以及当我使用 load 语句创建时,为什么它会存储在 HDFS 中并删除后为什么它不会被删除.

So, what is the exact behavior of the external tables ?? When I create using LOCATION keyword where does it get stored and when I create using load statement why it is getting stored in the HDFS and after deleting why it doesn't get deleted.

推荐答案

EXTERNALMANAGED 表之间的主要区别在于删除表/分区行为.当您删除 MANAGED 表/分区时,包含数据文件的位置也被删除.当您删除 EXTERNAL 表时,包含数据文件的位置保持原样.

The main difference between EXTERNAL and MANAGED tables is in Drop table/partition behavior. When you drop MANAGED table/partition, the location with data files also removed. When you drop EXTERNAL table, the location with data files remains as is.

更新:TBLPROPERTIES(external.table.purge"=true") 4.0.0+ 版(HIVE-19981) 在外部表上设置时也会删除数据.

UPDATE: TBLPROPERTIES ("external.table.purge"="true") in release 4.0.0+ (HIVE-19981) when set on external table would delete the data as well.

EXTERNAL 表以及 MANAGED 存储在 DDL 中指定的位置.您可以在现有位置的顶部创建表,数据文件已经在该位置,它适用于 EXTERNALMANAGED,无关紧要.

EXTERNAL table as well as MANAGED is being stored in the location specified in DDL. You can create table on top of existing location with data files already in the location and it will work for both EXTERNAL or MANAGED, does not matter.

您甚至可以在同一位置的顶部同时创建 EXTERNALMANAGED 表,请参阅此答案的更多详细信息和测试:https://stackoverflow.com/a/54038932/2700344

You even can create both EXTERNAL and MANAGED tables on top of the same location, see this answer with more details and tests: https://stackoverflow.com/a/54038932/2700344

如果您指定了位置,则两种类型的表的数据都将存储在该位置.如果您未指定位置,则数据将位于默认位置:/user/hive/warehouse/database_name.db/table_name 对于托管表和外部表.

If you specified location, the data will be stored in that location for both types of tables. If you did not specify location, the data will be in default location: /user/hive/warehouse/database_name.db/table_name for both managed and external tables.

更新:根据平台/供应商的不同,位置也可能有一些限制,请参阅 https://stackoverflow.com/a/67073849/2700344,您可能不被允许在默认允许的根位置之外创建托管/外部表.

Update: Also there can be some restrictions on location depending on platform/vendor, see https://stackoverflow.com/a/67073849/2700344, you may not be allowed to create manged/external tables outside their default allowed root location.

另请参阅关于 托管表与外部表的官方 Hive 文档

这篇关于与 hive 中的外部表混淆的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆