s3 bucket policy for instance to read from two different accounts
Problem Description
I have an instance which needs to read data from S3 in two different accounts:
- Bucket in DataAccount with bucket name "dataaccountlogs"
- Bucket in UserAccount with bucket name "userlogs"
I have console access to both accounts, so now I need to configure bucket policies that allow my instance, which runs in UserAccount, to read S3 data from the buckets dataaccountlogs and userlogs.
I need to access these two buckets both from the command line and from a Spark job.
Recommended Answer
You will need a role in UserAccount that will be used to access the buckets, say RoleA. The role should have permissions for the required S3 operations.
Then you will be able to configure a bucket policy for each bucket (in the ARNs below, UserAccount stands for the 12-digit ID of the account that owns RoleA):
For DataAccount:
{
    "Version": "2012-10-17",
    "Id": "Policy1",
    "Statement": [
        {
            "Sid": "test1",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::UserAccount:role/RoleA"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::dataaccountlogs",
                "arn:aws:s3:::dataaccountlogs/*"
            ]
        }
    ]
}
For UserAccount:
{
    "Version": "2012-10-17",
    "Id": "Policy1",
    "Statement": [
        {
            "Sid": "test1",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::UserAccount:role/RoleA"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::userlogs",
                "arn:aws:s3:::userlogs/*"
            ]
        }
    ]
}
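As a sketch, a policy like the one above could be saved and attached with the AWS CLI; run the command with administrator credentials for the account that owns the bucket. The file path is arbitrary, and "UserAccount" is still a placeholder for the 12-digit account ID:

```shell
# Save the userlogs policy to a file (the dataaccountlogs policy is
# attached the same way, but using DataAccount credentials).
cat > /tmp/userlogs-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Id": "Policy1",
  "Statement": [
    {
      "Sid": "test1",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::UserAccount:role/RoleA" },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::userlogs",
        "arn:aws:s3:::userlogs/*"
      ]
    }
  ]
}
EOF

# Attaching it requires real credentials, so the call is shown commented out:
#   aws s3api put-bucket-policy --bucket userlogs \
#       --policy file:///tmp/userlogs-policy.json
```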
To access them from the command line:
You will need to set up the AWS CLI tool first: https://docs.aws.amazon.com/polly/latest/dg/setup-aws-cli.html
Then you will need to configure a profile for using your role. First, make a profile for your user to log in with:
aws configure --profile YourProfileAlias
and follow the prompts to set up your credentials.
Then you will need to edit the config file ~/.aws/config and add a profile for the role.
Add a block at the end:
[profile YourRoleProfileName]
role_arn = arn:aws:iam::UserAccount:role/RoleA
source_profile = YourProfileAlias
After that you will be able to use aws s3api ... --profile YourRoleProfileName to access both buckets on behalf of the created role.
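Putting the CLI steps together, the profile setup can be sketched as writing the config file directly instead of answering the interactive prompts. This writes to a scratch location via AWS_CONFIG_FILE so nothing real is touched; the region and profile names are placeholders:

```shell
# Point the CLI at a throwaway config file for this demonstration.
export AWS_CONFIG_FILE=/tmp/demo-aws-config

# Base profile (credentials would normally come from `aws configure`)
# plus the role profile that assumes RoleA on top of it.
cat > "$AWS_CONFIG_FILE" <<'EOF'
[profile YourProfileAlias]
region = us-east-1

[profile YourRoleProfileName]
role_arn = arn:aws:iam::UserAccount:role/RoleA
source_profile = YourProfileAlias
EOF

# With the role profile in place, both buckets can be read on its behalf:
#   aws s3api list-objects --bucket dataaccountlogs --profile YourRoleProfileName
#   aws s3api list-objects --bucket userlogs        --profile YourRoleProfileName
```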
To access them via Spark:
- If you run your cluster on EMR, you should use a SecurityConfiguration and fill in the S3 role configuration section. A different role can be specified for each particular bucket. You should use the "Prefix" constraint and list all destination prefixes after it, e.g. "s3://dataaccountlogs/,s3://userlogs". Note: you should strictly use the s3 protocol for this, not s3a. There are also a number of limitations, which you can find here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-s3-optimized-committer.html
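As a rough sketch of that EMR approach, an EMRFS role-mapping security configuration might look like the following JSON. The exact schema should be checked against the EMR docs, and the account ID, role name, and file path are placeholders:

```shell
# Hypothetical EMRFS role mapping: RoleA is assumed for both bucket prefixes.
cat > /tmp/emr-security-config.json <<'EOF'
{
  "AuthorizationConfiguration": {
    "EmrFsConfiguration": {
      "RoleMappings": [
        {
          "Role": "arn:aws:iam::UserAccount:role/RoleA",
          "IdentifierType": "Prefix",
          "Identifiers": ["s3://dataaccountlogs/", "s3://userlogs/"]
        }
      ]
    }
  }
}
EOF

# The configuration would then be registered and attached to the cluster,
# e.g. (left commented out, since it needs real credentials):
#   aws emr create-security-configuration --name RoleMappingConfig \
#       --security-configuration file:///tmp/emr-security-config.json
```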
- Another way with Spark is to configure Hadoop to assume your role. Put
spark.hadoop.fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider
and configure the role to use:
spark.hadoop.fs.s3a.assumed.role.arn = arn:aws:iam::UserAccount:role/RoleA
This way is more general, since the EMR committer has various limitations. You can find more information on configuring this in the Hadoop docs: https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/assumed_roles.html
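These two settings can be placed in spark-defaults.conf (or passed as --conf flags to spark-submit). A minimal sketch, written to a scratch path so it is safe to run; the account ID remains a placeholder:

```shell
# Sketch of the assumed-role settings as a spark-defaults.conf fragment.
# In a real cluster this would go to $SPARK_HOME/conf/spark-defaults.conf.
cat > /tmp/spark-defaults.conf <<'EOF'
spark.hadoop.fs.s3a.aws.credentials.provider  org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider
spark.hadoop.fs.s3a.assumed.role.arn          arn:aws:iam::UserAccount:role/RoleA
EOF

# Equivalently, per job (job jar and arguments are placeholders):
#   spark-submit \
#     --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider \
#     --conf spark.hadoop.fs.s3a.assumed.role.arn=arn:aws:iam::UserAccount:role/RoleA \
#     your-job.jar s3a://dataaccountlogs/ s3a://userlogs/
```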