s3 bucket policy for instance to read from two different accounts


Question

I have an instance which needs to read data from S3 buckets in two different accounts:

  1. Bucket in DataAccount with bucket name "dataaccountlogs"
  2. Bucket in UserAccount with bucket name "userlogs"

I have console access to both accounts, so now I need to configure bucket policies to allow instances to read S3 data from the buckets dataaccountlogs and userlogs, while my instance is running in UserAccount.

I need to access these two buckets both from the command line and from a Spark job.

Answer

You will need an IAM role that will be used to access the mentioned buckets, say RoleA. The role should have permissions for the required S3 actions.
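For illustration, the identity policy attached to RoleA could look like the following sketch. This is an assumption, not part of the original answer: the actions are narrowed to read operations, and you should adjust them to whatever your workload actually needs.

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::dataaccountlogs",
                "arn:aws:s3:::dataaccountlogs/*",
                "arn:aws:s3:::userlogs",
                "arn:aws:s3:::userlogs/*"
            ]
        }
    ]
}
```

Note that `s3:ListBucket` applies to the bucket ARN itself, while `s3:GetObject` applies to the `/*` object ARNs, which is why both resource forms are listed.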

Then you will be able to configure a bucket policy for each bucket:

  1. For DataAccount:

{
    "Version": "2012-10-17",
    "Id": "Policy1",
    "Statement": [
        {
            "Sid": "test1",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::DataAccount:role/RoleA"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::dataaccountlogs",
                "arn:aws:s3:::dataaccountlogs/*"
            ]
        }
    ]
}

  2. For UserAccount:

{
    "Version": "2012-10-17",
    "Id": "Policy1",
    "Statement": [
        {
            "Sid": "test1",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::DataAccount:role/RoleA"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::userlogs",
                "arn:aws:s3:::userlogs/*"
            ]
        }
    ]
}
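Before attaching either document (for example with `aws s3api put-bucket-policy`), it can be worth sanity-checking it locally. A minimal sketch in Python, reusing the answer's placeholder names; this only validates JSON shape, not IAM semantics:

```python
import json

# The userlogs bucket policy from above, expressed as a Python dict.
policy = {
    "Version": "2012-10-17",
    "Id": "Policy1",
    "Statement": [
        {
            "Sid": "test1",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::DataAccount:role/RoleA"},
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::userlogs",
                "arn:aws:s3:::userlogs/*",
            ],
        }
    ],
}

# Serialize to the string you would pass as --policy, then round-trip it
# to confirm it is well-formed JSON with the fields a bucket policy needs.
document = json.dumps(policy)
parsed = json.loads(document)
assert parsed["Version"] == "2012-10-17"
assert all("Principal" in s for s in parsed["Statement"])
```

A malformed document fails here immediately instead of producing an opaque `MalformedPolicy` error from the S3 API.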
    

  • To access them from the command line:

    You will need to set up the AWS CLI tool first: https://docs.aws.amazon.com/polly/latest/dg/setup-aws-cli.html

    Then you will need to configure a profile to use your role. First, create a profile for your user to log in:

    aws configure --profile YourProfileAlias

    and follow the instructions to set up your credentials.

    Then you will need to edit the config file ~/.aws/config and add a profile for the role. At the end of the file, add a block:

    [profile YourRoleProfileName]
    role_arn = arn:aws:iam::DataAccount:role/RoleA
    source_profile = YourProfileAlias
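The way this block works is that the CLI first resolves credentials from `source_profile` and then uses them to assume `role_arn`. That chaining can be illustrated by parsing the snippet with Python's `configparser` (the values are the placeholders from above, not real identifiers):

```python
import configparser

# The ~/.aws/config block from above, as a string for demonstration.
config_text = """
[profile YourRoleProfileName]
role_arn = arn:aws:iam::DataAccount:role/RoleA
source_profile = YourProfileAlias
"""

config = configparser.ConfigParser()
config.read_string(config_text)

# The role profile points at the base profile whose static credentials
# are used for the sts:AssumeRole call.
role = config["profile YourRoleProfileName"]
assert role["source_profile"] == "YourProfileAlias"
assert role["role_arn"].endswith("role/RoleA")
```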
    

    After that you will be able to use aws s3api ... --profile YourRoleProfileName to access both buckets on behalf of the created role.

    To access them via Spark:

    1. If you run your cluster on EMR, you should use a SecurityConfiguration and fill in its S3 role configuration section. A different role can be specified for each specific bucket. You should use the "Prefix" constraint and list all destination prefixes after it, like "s3://dataaccountlogs/,s3://userlogs".

    Note: you should strictly use the s3 protocol for this, not s3a. There are also a number of limitations, which you can find here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-s3-optimized-committer.html
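A role mapping of that kind inside an EMR SecurityConfiguration has roughly the following shape. This is a hedged sketch rather than a verbatim template: the field names follow the EMRFS authorization configuration, the role ARN is the answer's placeholder, and you should double-check the exact schema against the current EMR documentation.

```json
{
    "AuthorizationConfiguration": {
        "EmrFsConfiguration": {
            "RoleMappings": [
                {
                    "Role": "arn:aws:iam::DataAccount:role/RoleA",
                    "IdentifierType": "Prefix",
                    "Identifiers": ["s3://dataaccountlogs/", "s3://userlogs/"]
                }
            ]
        }
    }
}
```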

    2. Another way with Spark is to configure Hadoop to assume your role. Put

    spark.hadoop.fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider

    and configure the role to use:

    spark.hadoop.fs.s3a.assumed.role.arn = arn:aws:iam::DataAccount:role/RoleA
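The two settings above can also be collected programmatically, for example to feed into `SparkSession.builder.config(key, value)` pairs. A minimal sketch (pyspark itself is assumed and not imported here; the ARN is the placeholder from the answer):

```python
# s3a assumed-role settings as plain spark.hadoop.* configuration keys.
# Each entry would be applied with SparkSession.builder.config(k, v).
s3a_assume_role_conf = {
    "spark.hadoop.fs.s3a.aws.credentials.provider":
        "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider",
    "spark.hadoop.fs.s3a.assumed.role.arn":
        "arn:aws:iam::DataAccount:role/RoleA",
}

# All keys use the spark.hadoop. prefix so Spark forwards them to Hadoop.
for key in s3a_assume_role_conf:
    assert key.startswith("spark.hadoop.fs.s3a.")
```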

    This way is more general, since the EMR committer has various limitations. You can find more information on configuring this in the Hadoop docs: https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/assumed_roles.html

