Presto on Amazon S3

Problem description

I'm trying to use Presto on an Amazon S3 bucket, but I haven't found much related information on the Internet.

I've installed Presto on a micro instance, but I can't figure out how to connect it to S3. There is a bucket with files in it. I have a running Hive metastore server and I have configured it in Presto's hive.properties. But when I try to run the LOCATION command in Hive, it's not working.

It throws an error saying it cannot find the file scheme type s3.

I also don't know why we need to run Hadoop, but without Hadoop, Hive doesn't run. Is there any explanation for this?

This and this are the documents I followed while setting up.

Solution

Presto uses the Hive metastore to map database tables to their underlying files. These files can exist on S3, and can be stored in a number of formats - CSV, ORC, Parquet, Seq etc.

The Hive metastore is usually populated through HQL (Hive Query Language) by issuing DDL statements like CREATE EXTERNAL TABLE ... with a LOCATION ... clause referencing the underlying files that hold the data.
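
As a rough illustration, such a DDL statement might look like the sketch below. The table name, columns, and bucket path are hypothetical, and the URI scheme (s3://, s3a:// or s3n://) depends on which S3 filesystem your Hadoop installation has configured, which is also what the "cannot find the file scheme type s3" error points at.

    -- Hypothetical external table over CSV files stored in S3
    CREATE EXTERNAL TABLE my_events (
      event_id   STRING,
      event_time STRING,
      payload    STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 's3://my-example-bucket/events/';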

In order to get Presto to connect to a Hive metastore you will need to edit the hive.properties file (EMR puts this in /etc/presto/conf.dist/catalog/) and set the hive.metastore.uri parameter to the thrift service of an appropriate Hive metastore service.
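
As a rough sketch, that catalog file could look like the following. The host name is a placeholder for your own metastore endpoint (9083 is the usual Thrift port), and the connector name shown here is the one used by older Presto releases:

    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://my-metastore-host:9083

If Presto itself needs credentials to read from S3, the Hive connector also accepts properties such as hive.s3.aws-access-key and hive.s3.aws-secret-key, though on EMR the instance role normally covers this.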

The Amazon EMR cluster instances will automatically configure this for you if you select Hive and Presto, so it's a good place to start.

If you want to test this on a standalone EC2 instance, then I'd suggest that you first focus on getting a functional Hive service working with the Hadoop infrastructure. You should be able to define tables that reside locally on the HDFS file system. Presto complements Hive, but it does require a functioning Hive set-up; Presto's native DDL statements are not as feature-complete as Hive's, so you'll do most table creation from Hive directly.
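
Once a table has been created from Hive, you would query it through Presto's hive catalog, for example with the Presto CLI. The server address, schema, and table name below are placeholders:

    presto --server localhost:8080 --catalog hive --schema default
    presto:default> SELECT count(*) FROM my_events;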

Alternatively, you can define Presto connectors for a MySQL or PostgreSQL database, but that's just a JDBC pass-through, so I don't think you'll gain much.
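
For completeness, such a connector is just another catalog properties file (for example a hypothetical etc/catalog/mysql.properties); the connection details below are placeholders:

    connector.name=mysql
    connection-url=jdbc:mysql://example-host:3306
    connection-user=presto_user
    connection-password=secret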
