How to query data from a gz file on Amazon S3 using a Qubole Hive query?
Question
I need to get specific data from a .gz file. How do I write the SQL? Can I just query the file as if it were a table, like this:
Select * from gz_File_Name where key = 'keyname' limit 10;
But it always returns an error.
Recommended answer
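You need to create a Hive external table over the file's location (folder) to be able to query it with Hive. Hive recognizes gzip-compressed text files on its own (the codec is picked from the .gz extension), so no separate decompression step is needed. Like this: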
create external table hive_schema.your_table (
col_one string,
col_two string
)
stored as textfile --specify your file type, or use serde
LOCATION
's3://your_s3_path_to_the_folder_where_the_file_is_located'
;
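To show roughly what the external table does for a query like the one in the question, here is a local Python sketch. It is an illustration under assumptions, not how Hive is implemented: every file under the table location is read, files ending in .gz are decompressed transparently, each line is split on Hive's default TEXTFILE field delimiter (\x01, Ctrl-A, used when the DDL has no ROW FORMAT clause), and rows are filtered and limited. The folder, file name, and the functions `open_text`, `scan_table`, and `query` are hypothetical.

```python
import gzip
import os
import tempfile

# Hive's default field delimiter for STORED AS TEXTFILE without ROW FORMAT.
FIELD_DELIM = "\x01"


def open_text(path):
    """Open a file as text, gunzipping it if the name ends in .gz.

    A simplified stand-in for Hadoop choosing a codec by file extension.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def scan_table(location, columns):
    """Yield every line of every file under `location` as a column -> value dict."""
    for name in sorted(os.listdir(location)):
        path = os.path.join(location, name)
        if not os.path.isfile(path):
            continue
        with open_text(path) as f:
            for line in f:
                fields = line.rstrip("\n").split(FIELD_DELIM)
                # Missing trailing fields become None (Hive returns NULL);
                # extra fields beyond the declared columns are dropped.
                fields += [None] * (len(columns) - len(fields))
                yield dict(zip(columns, fields[: len(columns)]))


def query(location, columns, key_column, key_value, limit):
    """Rough equivalent of: SELECT * FROM t WHERE key_column = key_value LIMIT limit."""
    result = []
    for row in scan_table(location, columns):
        if row[key_column] == key_value:
            result.append(row)
            if len(result) >= limit:
                break
    return result


if __name__ == "__main__":
    # Hypothetical local folder standing in for the S3 table location.
    with tempfile.TemporaryDirectory() as folder:
        with gzip.open(os.path.join(folder, "part-0000.gz"), "wt", encoding="utf-8") as f:
            f.write("keyname\x01a\nother\x01b\nkeyname\x01c\n")
        print(query(folder, ["col_one", "col_two"], "col_one", "keyname", 10))
        # prints [{'col_one': 'keyname', 'col_two': 'a'}, {'col_one': 'keyname', 'col_two': 'c'}]
```

The delimiter is the usual gotcha: if the file is, say, tab-separated, the DDL needs ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; otherwise each whole line lands in col_one and col_two is NULL.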
See the Hive manual on creating tables here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableCreate/Drop/TruncateTable
To be precise, S3 does not actually store folders under the hood: object names (keys) are flat strings, and keys containing / are presented as a folder structure by various tools, such as Hive. See here: https://stackoverflow.com/a/42877381/2700344
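To make the flat-namespace point concrete, here is a small Python simulation with hypothetical keys and no AWS calls. It groups flat key strings by a prefix and a / delimiter, similar in spirit to the Prefix and Delimiter parameters of the S3 ListObjectsV2 API, whose CommonPrefixes are what tools display as folders. The function `list_with_delimiter` is an illustrative assumption, not an AWS SDK call.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Split flat keys into direct 'files' and 'folder' prefixes under `prefix`.

    Returns (contents, common_prefixes), loosely mirroring an S3 listing
    made with Prefix and Delimiter set.
    """
    contents = []
    common_prefixes = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        idx = rest.find(delimiter)
        if idx == -1:
            # No delimiter after the prefix: shown as a file "in" this folder.
            contents.append(key)
        else:
            # Everything up to and including the next delimiter looks like a subfolder.
            folder = prefix + rest[: idx + len(delimiter)]
            if folder not in common_prefixes:
                common_prefixes.append(folder)
    return contents, common_prefixes


if __name__ == "__main__":
    # Hypothetical bucket contents: just strings, no real folders exist.
    keys = [
        "data/2020/part-0000.gz",
        "data/2020/part-0001.gz",
        "data/2021/part-0000.gz",
        "data/readme.txt",
        "top.txt",
    ]
    files, folders = list_with_delimiter(keys, prefix="data/")
    print(files)    # ['data/readme.txt']
    print(folders)  # ['data/2020/', 'data/2021/']
```

The "folders" data/2020/ and data/2021/ exist only because some keys share those prefixes; that is why a table LOCATION pointing at such a path works even though S3 never stored a directory.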