Hive External Table vs Internal Table Commands


Problem description

Assuming I have these two tables:

External:

create external table emp_feedback (
  emp_id int,
  emp_name string
)
LOCATION '/user/hive/warehouse/mydb.db/contacts';

Internal:

create table emp_feedback (
  emp_id int,
  emp_name string
);
LOAD DATA INPATH 'file_location_of_csv' INTO TABLE emp_feedback;

  1. When I say LOCATION '/user/hive/warehouse/mydb.db/contacts'; for the external table, does that mean the data for that table is found in the directory '/user/hive/warehouse/mydb.db/contacts'? And does that directory have to exist in HDFS beforehand?
  2. Can I use LOAD DATA INPATH... for an external table, or is that only used for internal tables? And vice versa, can I use LOCATION... for an internal table?

Solution

  1. (a) Yes, you are right: it means the table's data is found in that location/directory.
     (b) No. The directory doesn't have to exist for you to create the schema; Hive will create it if it doesn't exist. There is little point in doing that right away, since the table will be empty and queries against it will return no rows, but you can move data into that location later and then use the table.

  2. (a) LOAD DATA INPATH can be used for both external and internal tables. It moves the data to the location specified in the schema (for external tables) or to /.../warehouse/... (for internal tables).
     (b) LOCATION can be specified for both internal and external tables. But when you drop an internal table, the data at that location is removed as well, whereas for an external table only the metadata is removed.
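
The points above can be tried out with a short HiveQL session. This is only a sketch: the table names emp_feedback_ext and emp_feedback_int, the '/data/...' and '/tmp/staging/...' paths, and the comma-delimited row format are hypothetical placeholders that do not come from the question. Behaviour can also differ by Hive version and configuration, so check the DESCRIBE FORMATTED output on your own cluster before relying on it.

```sql
-- Hypothetical names and paths throughout; adjust to your environment.

-- External table: Hive stores the schema and points at the directory.
-- Per the answer above, the directory does not need to exist beforehand.
CREATE EXTERNAL TABLE emp_feedback_ext (
  emp_id   INT,
  emp_name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/emp_feedback_ext';

-- Internal (managed) table with an explicit LOCATION, which is also allowed.
CREATE TABLE emp_feedback_int (
  emp_id   INT,
  emp_name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/emp_feedback_int';

-- LOAD DATA INPATH moves (rather than copies) the HDFS file into the
-- table's directory; it works for external and internal tables alike.
LOAD DATA INPATH '/tmp/staging/emp_feedback.csv'
INTO TABLE emp_feedback_ext;

-- Check the "Table Type" and "Location" fields for each table.
DESCRIBE FORMATTED emp_feedback_ext;
DESCRIBE FORMATTED emp_feedback_int;

-- Dropping the external table removes only its metadata; the files stay.
DROP TABLE emp_feedback_ext;

-- Dropping the internal table removes the metadata and the data
-- under its location.
DROP TABLE emp_feedback_int;
```

After the two DROP TABLE statements, listing the two directories in HDFS (for example with hdfs dfs -ls) should show the external table's files still present and the internal table's directory gone, which is the difference described in 2(b).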
