Difference between `load data inpath` and `location` in Hive?

Question

At my firm, I see these two commands used frequently, and I'd like to be aware of the differences, because their functionality seems the same to me:

1

create table <mytable> 
(name string,
number double);

load data inpath '/directory-path/file.csv' into table <mytable>;

2

create table <mytable>
(name string,
number double)
location '/directory-path/file.csv';

They both copy the data from the directory on HDFS into the directory for the table in Hive. Are there differences that one should be aware of when using them? Thank you.

Solution

Yes, they serve entirely different purposes.

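The load data inpath command loads data files into a Hive table. The LOCAL keyword signifies that the input file is on the local file system, in which case Hive copies it into the table's directory. If LOCAL is omitted, Hive looks for the file in HDFS and moves it into the table's directory.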

load data inpath '/directory-path/file.csv' into table <mytable>;
load data local inpath '/local-directory-path/file.csv' into table <mytable>;
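
To make the move-versus-copy behaviour concrete, here is a minimal HiveQL sketch. The table name `mytable`, the file `other.csv`, and the comma delimiter are assumptions for illustration; the other paths are the placeholders from the question.

```sql
-- Hypothetical managed table; the comma delimiter is an assumption about file.csv.
CREATE TABLE mytable (
  name STRING,
  number DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- HDFS source: the file is moved into the table's directory,
-- so it is no longer at /directory-path/file.csv afterwards.
LOAD DATA INPATH '/directory-path/file.csv' INTO TABLE mytable;

-- Local source: the file is copied; the local original stays in place.
LOAD DATA LOCAL INPATH '/local-directory-path/file.csv' INTO TABLE mytable;

-- OVERWRITE replaces the table's existing files instead of adding to them.
-- other.csv is a hypothetical second file.
LOAD DATA INPATH '/directory-path/other.csv' OVERWRITE INTO TABLE mytable;
```

Note that LOAD DATA does not parse or validate the file. It only places it under the table's directory, so a mismatch between the file and the declared schema shows up later as NULLs at query time.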

The LOCATION keyword lets a table point to any HDFS location for its storage, rather than being stored in the folder specified by the configuration property hive.metastore.warehouse.dir.

In other words, with LOCATION '/your-path/' specified, Hive does not use the default location for this table. This comes in handy if you already have data generated.
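
As a rough sketch of the "data already generated" case, the statement below defines a table over files that are already sitting in an HDFS directory. The table name `mytable_ext` and the comma delimiter are assumptions. LOCATION must name a directory rather than a single file, so the path here is the parent directory of the question's `file.csv`.

```sql
-- Hypothetical external table over files that already exist in HDFS.
-- Nothing is moved or copied; Hive only records the path in the metastore.
CREATE EXTERNAL TABLE mytable_ext (
  name STRING,
  number DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/directory-path/';

-- Queries read whatever files are currently in that directory,
-- including files added there after the table was created.
SELECT name, number FROM mytable_ext LIMIT 10;
```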

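Remember, LOCATION is most commonly used with EXTERNAL tables, where dropping the table leaves the data files in place. The DDL grammar also accepts LOCATION on regular (managed) tables, but dropping such a table deletes the data at that path. Without LOCATION, the default location is used.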
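
One way to check where a table actually stores its data, and whether it is managed or external, is DESCRIBE FORMATTED. The sketch below assumes a managed table `mytable` and an external table `mytable_ext` already exist; both names are hypothetical.

```sql
-- The output includes a "Location:" row with the table's HDFS path and a
-- "Table Type:" row (MANAGED_TABLE or EXTERNAL_TABLE).
DESCRIBE FORMATTED mytable;
DESCRIBE FORMATTED mytable_ext;

-- Dropping an external table removes only the metastore entry;
-- the files under its location stay in HDFS.
DROP TABLE mytable_ext;

-- Dropping a managed table removes the metastore entry and the table's
-- data files as well (they go to the HDFS trash if trash is enabled).
DROP TABLE mytable;
```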

To summarize, load data inpath tells Hive where to find input files to bring into a table, while the LOCATION keyword tells Hive where on HDFS the table's data files are stored.

References:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
