AWS Glue: access/crawl a DynamoDB table from another AWS account (cross-account access)

Question

I have written a Glue job that exports a DynamoDB table and stores it on S3 in CSV format. The Glue job and the table are in the same AWS account, but the S3 bucket is in a different AWS account. I was able to access the cross-account S3 bucket from the Glue job by attaching the following bucket policy to the bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "tempS3Access",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<AWS-ACCOUNT-ID>:role/<ROLE-PATH>"
            },
            "Action": [
                "s3:Get*",
                "s3:Put*",
                "s3:List*",
                "s3:DeleteObject*"
            ],
            "Resource": [
                "arn:aws:s3:::<BUCKET-NAME>",
                "arn:aws:s3:::<BUCKET-NAME>/*"
            ]
        }
    ]
}

Now I also want to read a DynamoDB table that lives in another AWS account. Is it possible to access a cross-account DynamoDB table using a Crawler? What do I need to do to achieve this?

Thanks

Recommended answer

Short answer: You can't. The crawler can only crawl DynamoDB tables in your own account.

Long answer:
You can use my workaround.

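  1. Create a trust policy in account A (the account that owns the table). The one you already made will do.
  2. Create a Glue job in your account B. Import boto3 and create a session in the first account. Then use the DynamoDB resource to scan the table. Check out my code: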

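import boto3

# Assume the role in account A (the account that owns the table).
sts_client = boto3.client('sts', region_name='your-region')
assumed_role_object = sts_client.assume_role(
    RoleArn="arn:aws:iam::accountAid:role/the-role-you-created",
    RoleSessionName="AssumeRoleSession1"
)
credentials = assumed_role_object['Credentials']

# Build a DynamoDB resource from the temporary credentials.
dynamodb = boto3.resource(
    'dynamodb',
    aws_access_key_id=credentials['AccessKeyId'],
    aws_secret_access_key=credentials['SecretAccessKey'],
    aws_session_token=credentials['SessionToken'],
    region_name='your-region'
)

table = dynamodb.Table('table-to-crawl')

# scan() returns at most 1 MB per call, so keep following LastEvaluatedKey.
response = table.scan()
data = response['Items']
while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])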
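For step 1, the role in account A needs two documents: a trust policy naming who may assume it, and a permissions policy that allows reading the table. The following is only a sketch of what those might look like. The account IDs, the role name `glue-job-role`, and the table ARN are hypothetical placeholders and do not come from the original post.

```python
import json

# Hypothetical placeholders, not values from the original post.
ACCOUNT_B_ID = "222222222222"      # account that runs the Glue job
GLUE_JOB_ROLE = "glue-job-role"    # role the Glue job runs as
TABLE_ARN = "arn:aws:dynamodb:your-region:111111111111:table/table-to-crawl"


def build_trust_policy(account_id: str, role_name: str) -> dict:
    """Trust policy letting one role in another account call sts:AssumeRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:role/{role_name}"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_read_policy(table_arn: str) -> dict:
    """Permissions policy granting scan access to a single table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:DescribeTable", "dynamodb:Scan"],
                "Resource": table_arn,
            }
        ],
    }


trust_policy = build_trust_policy(ACCOUNT_B_ID, GLUE_JOB_ROLE)
read_policy = build_read_policy(TABLE_ARN)
print(json.dumps(trust_policy, indent=4))
print(json.dumps(read_policy, indent=4))
```

The Glue job's own role in account B also needs permission to call `sts:AssumeRole` on the role ARN in account A, otherwise the `assume_role` call is denied.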

Now `data` holds all the table items, and you can do a bunch of things with it. You can create a DynamicFrame if you wish to manipulate the data in some way:

dataF = glueContext.create_dynamic_frame.from_rdd(spark.sparkContext.parallelize(data), 'data')

Or a DataFrame if that's what you need.
I hope this helps. If you have any questions, feel free to ask.
