Cannot copy data from S3 to Redshift cluster in a private subnet

Problem description

I have set up a Redshift cluster in a private subnet. I can successfully connect to my Redshift cluster and run basic SQL queries through DBeaver.

I also need to upload some files from S3 to Redshift, so I set up an S3 gateway endpoint in my private subnet and updated the private subnet's route table to add the required route, as follows:

Destination                                       Target                  Status    Propagated
192.168.0.0/16                                    local                   active    No
pl-7ba54012 (com.amazonaws.us-east-2.s3,
  52.219.80.0/20, 3.5.128.0/21,
  52.219.96.0/20, 52.92.76.0/22)                  vpce-04eed78f4db84ae49  active    No
0.0.0.0/0                                         nat-0a73ba7659e887232   active    No

However, I cannot run the COPY query from my S3 bucket:

copy venue
from 's3://*****/tickit/venue_pipe.txt'
iam_role 'arn:aws:iam::******:role/global-dev-rdt-role-S3ReadonlyAccess'
region 'us-east-2';
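One thing worth verifying before debugging the network is that the IAM role named in the COPY is actually associated with the cluster. A minimal sketch with the AWS CLI, assuming a hypothetical cluster identifier and a placeholder account ID (the real values are redacted in the question):

```shell
# List the IAM roles currently associated with the cluster
# (cluster identifier is a placeholder).
aws redshift describe-clusters \
    --cluster-identifier my-redshift-cluster \
    --query 'Clusters[0].IamRoles'

# Attach the S3 read-only role if it is not already associated
# (the account ID in the ARN is a placeholder).
aws redshift modify-cluster-iam-roles \
    --cluster-identifier my-redshift-cluster \
    --add-iam-roles arn:aws:iam::123456789012:role/global-dev-rdt-role-S3ReadonlyAccess
```

If the role is missing from the `IamRoles` list, the COPY fails regardless of routing, so this check separates IAM problems from VPC problems.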

There are no restrictive policies on my bucket or on the public and private subnet security groups, and I can already run SQL queries on my Redshift cluster in the private subnet.

Update: the security group for the Redshift cluster allows all connections to port 5439:

Type         Protocol      Port Range     Source      Description
Redshift     TCP           5439           0.0.0.0/0
Redshift     TCP           5439           ::/0
SSH          TCP           22             sg-0f933e18d6c1967b8

Answer

To reproduce your situation, I did the following:

  • Created a new VPC with a public subnet and a private subnet (no NAT Gateway)
  • Launched a 1-node Amazon Redshift cluster in the private subnet
    • Enhanced VPC Routing = No
    • Publicly accessible = No
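The test setup above can be sketched with the AWS CLI; everything below (cluster identifier, node type, credentials, subnet group name) is a placeholder, not a value from the answer:

```shell
# Launch a single-node Redshift cluster in a private subnet, with
# Enhanced VPC Routing and public accessibility both disabled.
# All identifiers and the password are placeholders.
aws redshift create-cluster \
    --cluster-identifier test-cluster \
    --cluster-type single-node \
    --node-type dc2.large \
    --master-username awsuser \
    --master-user-password 'ChangeMe123' \
    --cluster-subnet-group-name private-subnet-group \
    --no-enhanced-vpc-routing \
    --no-publicly-accessible
```

The cluster subnet group must already exist and contain only the private subnet for the cluster to land there.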

The COPY command worked successfully, with the following message:

    INFO:  Load into table 'foo' completed, 4 record(s) loaded successfully.
    

Therefore, a VPC Endpoint/NAT Gateway is not required to perform a COPY command from Redshift. The Redshift cluster seems to have its own special way of connecting to S3, via a Redshift 'backend'.

If the data is being loaded from Amazon S3 in the same region, the traffic stays wholly within the AWS network. If the data comes from a different region, it is still encrypted, because communication with Amazon S3 goes over HTTPS.

Second test: with Enhanced VPC Routing

To mirror your situation, I launched another Redshift cluster with Enhanced VPC Routing enabled.

When I ran the COPY command, it predictably hung, because I had not configured a way for the Redshift cluster to access Amazon S3 via the VPC.

I then created a VPC Endpoint for Amazon S3 and attached it to the private subnet with a "Full Access" policy.
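Creating the gateway endpoint and associating it with the private subnet's route table can be done with the AWS CLI; the VPC and route-table IDs below are placeholders, and the default endpoint policy is the "Full Access" policy mentioned here:

```shell
# Create an S3 gateway endpoint in the VPC and attach it to the
# private subnet's route table. With no --policy-document given,
# the endpoint gets the default "Full Access" policy.
# The vpc-id and route-table-id values are placeholders.
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0123456789abcdef0 \
    --vpc-endpoint-type Gateway \
    --service-name com.amazonaws.us-east-2.s3 \
    --route-table-ids rtb-0123456789abcdef0
```

Associating the route table is what adds the `pl-…` prefix-list route shown in the question, so traffic to S3 is steered through the endpoint instead of the NAT gateway.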

When I then re-ran the COPY command, it successfully loaded the data from Amazon S3.

Bottom line: it worked for me. You might want to compare your configuration against the steps I took above.
