Exporting a Hive Table to an S3 Bucket

Question

I created a Hive table through an Elastic MapReduce interactive session and populated it from a CSV file like this:

CREATE TABLE csvimport(id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

LOAD DATA LOCAL INPATH '/home/hadoop/file.csv' OVERWRITE INTO TABLE csvimport;

I now want to store the Hive table in an S3 bucket so that the table is preserved after I terminate the MapReduce instance.

Does anyone know how to do this?

Recommended answer

Yes. You have to export your data at the end of your Hive session and import it again at the start of the next one.

To do this, create a table that is mapped onto an S3 bucket and directory:

CREATE TABLE csvexport (
  id BIGINT, time STRING, log STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';

Insert the data into the S3-backed table. When the insert completes, the directory will contain a CSV file:

 INSERT OVERWRITE TABLE csvexport 
 select id, time, log
 from csvimport;
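
One way to confirm the export is to compare row counts and list the S3 directory from the Hive CLI. This is only a sketch, assuming the table names and the s3n://bucket/directory/ location used in this answer. The output files are typically named along the lines of 000000_0 rather than carrying a .csv extension, although their contents are comma-delimited text.

```sql
-- Row counts should match if the export copied every row.
SELECT COUNT(*) FROM csvimport;
SELECT COUNT(*) FROM csvexport;

-- List the files Hive wrote under the table location (Hive CLI command).
dfs -ls s3n://bucket/directory/;
```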

Your table is now preserved, and when you create a new Hive instance you can reimport your data.
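
As a sketch of the reimport step, the table definition can be recreated on the new cluster over the same S3 location. This assumes the previous cluster used its default local metastore, so the table metadata was lost at termination while the files in S3 remain. The EXTERNAL keyword is an assumption that is not part of the original answer; it keeps Hive from deleting the S3 files if the table is later dropped.

```sql
-- Recreate the table definition over the files already in S3.
-- EXTERNAL (an assumption, not in the original answer) means DROP TABLE
-- removes only the metadata and leaves the S3 files in place.
CREATE EXTERNAL TABLE csvexport (
  id BIGINT, time STRING, log STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';

-- The data is queryable right away; no LOAD DATA step is needed.
SELECT id, time, log FROM csvexport LIMIT 10;
```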

Your table can be stored in a few different formats, depending on where you want to use it.
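
For instance, if the data will only be read back by Hive or other Hadoop jobs, a binary format such as SequenceFile may be an option instead of delimited text. The sketch below is an illustration only; the table name csvexport_seq and the path s3n://bucket/directory_seq/ are hypothetical.

```sql
-- Hypothetical table name and S3 path, for illustration only.
CREATE TABLE csvexport_seq (
  id BIGINT, time STRING, log STRING
)
STORED AS SEQUENCEFILE
LOCATION 's3n://bucket/directory_seq/';

-- Copy the rows from the original table into the SequenceFile-backed table.
INSERT OVERWRITE TABLE csvexport_seq
SELECT id, time, log
FROM csvimport;
```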
