Setup Standalone Hive Metastore Service For Presto and AWS S3

Question

I'm working in an environment where an S3 service is used as a data lake, but AWS Athena is not available. I'm trying to set up Presto to query the data in S3, and I know I need to define the data structure as Hive tables through the Hive Metastore service. I'm deploying each component in Docker, so I'd like to keep the container size as small as possible. Which Hive components do I need in order to run just the Metastore service? I don't care about running Hive itself, only the Metastore. Can I trim down what's needed, or is there already a pre-configured package just for that? I haven't been able to find anything online that doesn't involve downloading all of Hadoop and Hive. Is what I'm trying to do possible?

Recommended answer

There is a workaround: you do not need Hive at all to run Presto. I haven't tried it with a distributed file system like S3, but the code suggests it should work (at least with HDFS). In my opinion it is worth trying, because you would not need any additional Docker image for Hive.

The idea is to use the built-in FileHiveMetastore. It is neither documented nor advised for production use, but you can experiment with it. Schema information is stored next to the data in the file system. Obviously, this has its pros and cons. I do not know the details of your use case, so I can't say whether it fits your needs.

Configuration:

connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=file:///tmp/hive_catalog
hive.metastore.user=cox
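
Since the question is about Docker, here is a minimal Python sketch that writes the properties above into a catalog file at image build or container start time. The property names and values are copied from the configuration above; the helper names (`render_properties`, `write_catalog`) and the target directory are hypothetical. The catalog is named `hive` only because the demo addresses it as `hive.default`.

```python
from pathlib import Path

# Values copied from the configuration above.
FILE_METASTORE_PROPS = {
    "connector.name": "hive-hadoop2",
    "hive.metastore": "file",
    "hive.metastore.catalog.dir": "file:///tmp/hive_catalog",
    "hive.metastore.user": "cox",
}


def render_properties(props):
    """Render a dict as key=value lines, one property per line."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def write_catalog(catalog_dir, name="hive", props=FILE_METASTORE_PROPS):
    """Write <catalog_dir>/<name>.properties and return its path.

    catalog_dir is an assumption: point it at your Presto etc/catalog directory.
    """
    target = Path(catalog_dir) / f"{name}.properties"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_properties(props))
    return target
```

Calling `write_catalog("etc/catalog")` would produce `etc/catalog/hive.properties`. Whether the same setup works with the catalog directory on S3 is untested, as noted above.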

Demo:

presto:tiny> create schema hive.default;
CREATE SCHEMA
presto:tiny> use hive.default;
USE
presto:default> create table t (t bigint);
CREATE TABLE
presto:default> show tables;
 Table
-------
 t
(1 row)

Query 20180223_202609_00009_iuchi, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [1 rows, 18B] [11 rows/s, 201B/s]

presto:default> insert into t (values 1);
INSERT: 1 row

Query 20180223_202616_00010_iuchi, FINISHED, 1 node
Splits: 51 total, 51 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

presto:default> select * from t;
 t
---
 1
(1 row)

After the above I was able to find the following on my machine:

/tmp/hive_catalog/
/tmp/hive_catalog/default
/tmp/hive_catalog/default/t
/tmp/hive_catalog/default/t/.prestoPermissions
/tmp/hive_catalog/default/t/.prestoPermissions/user_cox
/tmp/hive_catalog/default/t/.prestoPermissions/.user_cox.crc
/tmp/hive_catalog/default/t/.20180223_202616_00010_iuchi_79dee041-58a3-45ce-b86c-9f14e6260278.crc
/tmp/hive_catalog/default/t/.prestoSchema
/tmp/hive_catalog/default/t/20180223_202616_00010_iuchi_79dee041-58a3-45ce-b86c-9f14e6260278
/tmp/hive_catalog/default/t/..prestoSchema.crc
/tmp/hive_catalog/default/.prestoSchema
/tmp/hive_catalog/default/..prestoSchema.crc
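
The listing suggests that every schema directory and every table directory carries a `.prestoSchema` file. Under that assumption, which is inferred only from the listing and not from any documented format, a small Python sketch can enumerate what the file metastore currently holds. The function name `list_tables` is hypothetical.

```python
from pathlib import Path

# Marker file name taken from the listing above; treating it as the sign of a
# schema or table directory is an assumption, not a documented contract.
SCHEMA_MARKER = ".prestoSchema"


def list_tables(catalog_dir):
    """Return sorted (schema, table) pairs found under catalog_dir."""
    root = Path(catalog_dir)
    if not root.is_dir():
        return []
    found = []
    for schema_dir in root.iterdir():
        # A schema is a directory that contains the marker file.
        if not (schema_dir.is_dir() and (schema_dir / SCHEMA_MARKER).is_file()):
            continue
        for table_dir in schema_dir.iterdir():
            # A table is a subdirectory of a schema with its own marker file.
            if table_dir.is_dir() and (table_dir / SCHEMA_MARKER).is_file():
                found.append((schema_dir.name, table_dir.name))
    return sorted(found)
```

For the directory tree shown above, `list_tables("/tmp/hive_catalog")` should report the single pair `("default", "t")`; the `.prestoPermissions` directory is skipped because it has no marker file.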