Confusion with the external tables in Hive
Question
I created a Hive external table using the commands below:
use hive2;
create external table depTable (depId int comment 'This is the unique id for each dep', depName string,location string) comment 'department table' row format delimited fields terminated by ","
stored as textfile location '/dataDir/';
Now, when I look at HDFS, I can see the database directory, but there is no depTable inside the warehouse:
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
[cloudera@quickstart ~]$
As you can see above, no table directory was created in this database. As far as I know, external tables are not stored in the Hive warehouse. Am I correct? If so, where is the table stored?
But if I create the external table first and then load the data, I can see it inside hive2.db:
hive> create external table depTable (depId int comment 'This is the unique id for each dep', depName string,location string) comment 'department table' row format delimited fields terminated by "," stored as textfile;
OK
Time taken: 0.056 seconds
hive> load data inpath '/dataDir/department_data.txt' into table depTable;
Loading data to table default.deptable
Table default.deptable stats: [numFiles=1, totalSize=90]
OK
Time taken: 0.28 seconds
hive> select * from deptable;
OK
1001 FINANCE SYDNEY
2001 AUDIT MELBOURNE
3001 MARKETING PERTH
4001 PRODUCTION BRISBANE
Now, if I run the hadoop fs command, I can see this table under the database directory, as shown below:
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
Found 1 items
drwxrwxrwx - cloudera supergroup 0 2019-01-17 09:07 /user/hive/warehouse/hive2.db/deptable
Even after I drop the table, I can still see its directory in HDFS:
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
Found 1 items
drwxrwxrwx - cloudera supergroup 0 2019-01-17 09:11 /user/hive/warehouse/hive2.db/deptable
So, what is the exact behavior of external tables? When I create one with the LOCATION keyword, where is it stored? When I create one and fill it with the load statement, why is it stored under the warehouse directory in HDFS? And why is it not deleted after I drop it?
Recommended answer
The main difference between EXTERNAL and MANAGED tables is the behavior of DROP TABLE/PARTITION.
When you drop a MANAGED table or partition, the location with its data files is removed as well.
When you drop an EXTERNAL table, the location with its data files remains as is.
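One way to check which kind of table you have, and where its data lives, is DESCRIBE FORMATTED. A minimal sketch, assuming the depTable from the question exists in the current database; the output should include a Location row and a Table Type row (MANAGED_TABLE or EXTERNAL_TABLE):

```sql
-- Inspect the table from the question: look for the "Location" and
-- "Table Type" rows in the output.
DESCRIBE FORMATTED depTable;

-- Dropping an EXTERNAL table removes only the metastore entry;
-- the directory shown under "Location" stays in HDFS.
DROP TABLE depTable;
```

This is why the deptable directory in the question was still listed after the table was dropped.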
UPDATE: In release 4.0.0+ (HIVE-19981), setting TBLPROPERTIES ("external.table.purge"="true") on an external table makes dropping the table delete the data as well.
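As a sketch of how that property might be applied to an existing table, assuming the depTable from the question and a Hive release that supports the property:

```sql
-- Assumes a Hive release that honors this property (4.0.0+ per HIVE-19981).
ALTER TABLE depTable SET TBLPROPERTIES ('external.table.purge'='true');

-- With the property set, dropping the external table is expected to
-- delete its data files as well as the metastore entry.
DROP TABLE depTable;
```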
Both EXTERNAL and MANAGED tables are stored in the location specified in the DDL. You can create a table on top of an existing location that already contains data files, and this works for EXTERNAL and MANAGED tables alike.
You can even create both EXTERNAL and MANAGED tables on top of the same location; see this answer for more details and tests: https://stackoverflow.com/a/54038932/2700344
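A rough illustration of two tables sharing one location. The table names dep_managed and dep_external are hypothetical, and the sketch assumes a setup like the quickstart VM in the question; newer Hive releases may place extra restrictions on managed tables, so verify this on your version:

```sql
-- Hypothetical table names; both point at the directory from the question.
CREATE TABLE dep_managed (depId INT, depName STRING, location STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/dataDir/';

CREATE EXTERNAL TABLE dep_external (depId INT, depName STRING, location STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/dataDir/';

-- Both queries read the same files under /dataDir/.
SELECT * FROM dep_managed;
SELECT * FROM dep_external;

-- Dropping dep_external leaves /dataDir/ untouched.
-- Dropping dep_managed removes /dataDir/ and its files, after which
-- dep_external would have nothing left to read.
```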
If you specify a location, the data is stored in that location for both types of tables. If you do not specify a location, the data goes to the default location, /user/hive/warehouse/database_name.db/table_name, for both managed and external tables.
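Applied to the two cases in the question, this might look as follows. The table names dep_with_location and dep_default are hypothetical, chosen only so that both statements can coexist in one database:

```sql
-- Case 1: LOCATION is given. The table reads the files under /dataDir/,
-- so no directory appears under /user/hive/warehouse/hive2.db.
CREATE EXTERNAL TABLE dep_with_location (depId INT, depName STRING, location STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/dataDir/';

-- Case 2: no LOCATION. The table directory is created in the default
-- location under the database's warehouse directory.
CREATE EXTERNAL TABLE dep_default (depId INT, depName STRING, location STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- LOAD DATA INPATH moves the file from its HDFS source path into the
-- table's location, which is why it then shows up under the warehouse.
LOAD DATA INPATH '/dataDir/department_data.txt' INTO TABLE dep_default;
```

In both cases the table is external, so dropping it leaves the files where they are.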