Copying data from S3 to Redshift hangs

Problem Description

I've been trying to load data into Redshift for the last couple of days with no success. I have provided the correct IAM role to the cluster, I have granted access to S3, and I am using the COPY command with either AWS credentials or the IAM role, but so far no success. What could be the reason for this? It has come to the point that I don't have many options left.

The code is pretty basic, nothing fancy there. See below:

copy test_schema.test from 's3://company.test/tmp/append.csv.gz'
iam_role 'arn:aws:iam::<rolenumber>:role/RedshiftCopyUnload'
delimiter ',' gzip;

I didn't include any error messages because there are none. The code simply hangs, and I have left it running for well over 40 minutes with no results. If I go into the Queries section of the Redshift console, I don't see anything abnormal. I am using Aginity and SQL Workbench to run the queries.
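While the statement is stuck, the hang is usually visible from Redshift's system tables. A minimal diagnostic sketch, run from a second session (stv_recents and stl_load_errors are standard Redshift system tables):

-- Is the COPY still registered as a running query?
select pid, user_name, starttime, duration, query
from stv_recents
where status = 'Running';

-- Any load errors from recent COPY attempts?
-- (This stays empty if the COPY never manages to reach S3.)
select starttime, filename, line_number, err_reason
from stl_load_errors
order by starttime desc
limit 10;

If stv_recents shows the COPY running the whole time with nothing in stl_load_errors, the statement is waiting on something outside the database, which points at networking rather than the data or the SQL.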

I also tried manually inserting rows in Redshift, and that seems to work. COPY and UNLOAD do not work, and even though I have created roles with access to S3 and associated them with the cluster, I still have this problem.
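Since UNLOAD fails too, a minimal round trip can confirm that the problem is the cluster-to-S3 path itself rather than the specific COPY. A sketch, reusing the bucket and role placeholders from the question:

-- If this trivial UNLOAD also hangs, the network path between the
-- cluster and S3 is the problem, not the COPY statement.
unload ('select 1')
to 's3://company.test/tmp/unload_test_'
iam_role 'arn:aws:iam::<rolenumber>:role/RedshiftCopyUnload';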

Any ideas?

A solution has been found. Basically, it was a connectivity problem within our VPC: a VPC endpoint had to be created and associated with the subnet used by Redshift.
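For anyone hitting the same symptom: this typically arises when the cluster has Enhanced VPC Routing enabled, which forces COPY and UNLOAD traffic through the VPC instead of the public AWS network. The gateway endpoint for S3 is created outside of Redshift, for example with the AWS CLI. A sketch, assuming your own VPC ID, route table ID, and region (the IDs below are placeholders):

# Create an S3 gateway endpoint and attach it to the route table
# used by the cluster's subnet.
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc123 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0def456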

Recommended Answer

I agree with JohnRotenstein that more information is needed to provide an answer. I would suggest you start with simple data points and a simple table. Here is a step-by-step solution; I hope that by following it you will be able to resolve your issue.

Assume this is your table structure.

Here I'm using most of the common data types to prove my point.

create table sales(
    salesid integer,
    commission decimal(8,2),
    saledate date,
    description varchar(255),
    created_at timestamp default sysdate,
    updated_at timestamp
);

Just to keep it simple, here is the data file that resides in S3. The data rows are pipe-delimited, which is the Redshift COPY default, so no delimiter option is needed below; the comma-separated header row is skipped by IGNOREHEADER.
Content of sales-example.txt:

salesid,commission,saledate,description,created_at,updated_at
1|3.55|2018-12-10|Test description|2018-05-17 23:54:51|2018-05-17 23:54:51
2|6.55|2018-01-01|Test description|2018-05-17 23:54:51|2018-05-17 23:54:51
4|7.55|2018-02-10|Test description|2018-05-17 23:54:51|2018-05-17 23:54:51
5|3.55||Test description|2018-05-17 23:54:51|2018-05-17 23:54:51
7|3.50|2018-10-10|Test description|2018-05-17 23:54:51|2018-05-17 23:54:51

Run the following two commands using the psql terminal or any SQL connector. Make sure to run the second command as well, so the load is actually committed.

copy sales(salesid, commission, saledate, description, created_at, updated_at)
from 's3://example-bucket/foo/bar/sales-example.txt'
credentials 'aws_access_key_id=************;aws_secret_access_key=***********'
IGNOREHEADER 1;

commit;
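If you would rather use role-based authentication, as in the original question, the same load can be written with iam_role instead of an access-key pair (the account ID and role name here are placeholders):

-- Same load, authenticating through an IAM role attached to the cluster.
copy sales(salesid, commission, saledate, description, created_at, updated_at)
from 's3://example-bucket/foo/bar/sales-example.txt'
iam_role 'arn:aws:iam::<account-id>:role/RedshiftCopyUnload'
IGNOREHEADER 1;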

I hope this helps you debug your issue.
