What are the steps to use Redshift Spectrum?


Question

I currently use Amazon Redshift and Amazon S3 to store data. I want to use Spectrum to improve performance, but I'm confused about how to use it properly.

If I am using SQL Workbench, can I create the external schema from there, or do I need to create it from the AWS console or Athena?

Do I need to have Athena in a specific region? Is it possible to use Spectrum without Athena?

When I try to create an external schema through SQL Workbench, it throws the error "CREATE EXTERNAL SCHEMA is not enabled". How can I enable this?

If anyone has used Spectrum, please help and let me know the detailed steps to use it.

Recommended answer

Redshift Spectrum requires an external data catalog that contains the table definitions. It is this data catalog, rather than the external table definition in Redshift, that holds the references to the files in S3. The catalog can be defined in Elastic MapReduce as a Hive catalog (good if you have an existing EMR deployment) or in Athena (good if you don't have EMR or don't want to get into managing Hadoop). The Athena route can be managed entirely from Redshift, if you wish.

It looks to me like your issue is one of four things:

  1. Your Redshift cluster is not in an AWS region that currently supports both Athena and Spectrum.
  2. Your Redshift cluster version doesn't support Spectrum yet (it requires version 1.0.1294 or later).
  3. Your IAM policies don't allow Redshift to control Athena.
  4. You're not using the CREATE EXTERNAL DATABASE IF NOT EXISTS parameter on your CREATE EXTERNAL SCHEMA statement.
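The second item can be checked from SQL Workbench by running `SELECT version();` and comparing the result with 1.0.1294. The sketch below shows that comparison in Python. It assumes the version text contains a fragment of the form `Redshift 1.0.NNNN`; the sample string and the helper names are illustrative, not taken from the original answer.

```python
import re

# Minimum cluster version for Spectrum, as stated in the list above.
MIN_SPECTRUM_VERSION = (1, 0, 1294)


def parse_redshift_version(version_string):
    """Extract (major, minor, patch) from the text returned by SELECT version()."""
    match = re.search(r"Redshift (\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("no Redshift version found in: " + version_string)
    return tuple(int(part) for part in match.groups())


def supports_spectrum(version_string):
    """Return True if the cluster version is at or above the minimum."""
    return parse_redshift_version(version_string) >= MIN_SPECTRUM_VERSION


# Hypothetical sample; paste the real output of SELECT version() here.
sample = "PostgreSQL 8.0.2 on i686-pc-linux-gnu, Redshift 1.0.1499"
print(supports_spectrum(sample))
```

Comparing tuples of integers avoids the trap of comparing version strings lexically, where "1.0.999" would sort after "1.0.1294".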

To allow Redshift to manage Athena, you'll need to attach an IAM role to your Redshift cluster whose policy gives it full control over Athena, as well as read access to the S3 bucket containing your data.
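As a rough illustration of what such permissions might look like, the Python sketch below builds two JSON documents: a trust policy that lets the Redshift service assume the role, and a permissions policy granting Athena access plus read-only S3 access. The bucket name `my-spectrum-data` and the function names are hypothetical, and the exact set of actions your setup needs is an assumption; check it against the current AWS documentation.

```python
import json


def spectrum_trust_policy():
    """Trust policy that lets the Redshift service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def spectrum_access_policy(bucket):
    """Permissions policy: full Athena control plus read-only S3 access."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Full control over Athena, as the answer describes.
            {"Effect": "Allow", "Action": "athena:*", "Resource": "*"},
            # Read access to the bucket (listing) and its objects (reading).
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::" + bucket,
                    "arn:aws:s3:::" + bucket + "/*",
                ],
            },
        ],
    }


# Hypothetical bucket name, used only for illustration.
print(json.dumps(spectrum_access_policy("my-spectrum-data"), indent=2))
```

The printed JSON can be pasted into the IAM console when creating the policy; the trust policy is what makes the role attachable to a Redshift cluster.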

Once that's in place, you can create your external schema as you have been doing, making sure that the CREATE EXTERNAL DATABASE IF NOT EXISTS argument is also passed. This ensures that the external database is created in Athena if you don't have a pre-existing configuration: http://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum-create-external-table.html
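To make the shape of that statement concrete, here is a small Python sketch that assembles it as a string, which you could then paste into SQL Workbench. The schema name, database name, and role ARN are placeholders, and the helper does no quoting or escaping, so it is only suitable for values you control.

```python
def create_external_schema_sql(schema, database, iam_role_arn):
    """Build a CREATE EXTERNAL SCHEMA statement backed by the data catalog."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{database}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        # This clause creates the external database if it is missing.
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


# Placeholder names and a made-up account ID, for illustration only.
print(
    create_external_schema_sql(
        schema="spectrum",
        database="spectrumdb",
        iam_role_arn="arn:aws:iam::123456789012:role/mySpectrumRole",
    )
)
```

The role ARN here would be the ARN of the IAM role attached to the cluster.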

Finally, run your CREATE EXTERNAL TABLE statement, which will transparently create the table metadata in the Athena data catalog: http://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html
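For illustration, the sketch below builds a CREATE EXTERNAL TABLE statement for tab-delimited text files. The table name, columns, and S3 path are hypothetical, and the file format is an assumption; adjust the ROW FORMAT and STORED AS clauses to match how your data is actually stored.

```python
def create_external_table_sql(schema, table, columns, s3_location):
    """Build a CREATE EXTERNAL TABLE statement for tab-delimited text files.

    columns is a list of (name, sql_type) pairs.
    """
    column_lines = ",\n".join(
        f"    {name} {sql_type}" for name, sql_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY '\\t'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}';"
    )


# Hypothetical table layout and bucket path, for illustration only.
print(
    create_external_table_sql(
        schema="spectrum",
        table="sales",
        columns=[("salesid", "INTEGER"), ("pricepaid", "DECIMAL(8,2)")],
        s3_location="s3://my-spectrum-data/sales/",
    )
)
```

If SQL Workbench reports that the statement cannot run inside a transaction block, turning on autocommit for the connection is the usual remedy.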
