s3 bucket policy for instance to read from two different accounts
Problem Description
I have an instance which needs to read data from S3 in two different accounts:
- Bucket in DataAccount with bucket name "dataaccountlogs"
- Bucket in UserAccount with bucket name "userlogs"
I have console access to both accounts, so now I need to configure bucket policies that allow my instance, which runs in UserAccount, to read S3 data from the buckets dataaccountlogs and userlogs.
I need to access these two buckets both from the command line and from a Spark job.
Recommended Answer
You will need a role in UserAccount that will be used to access the buckets, say RoleA. The role should have permissions for the required S3 operations.
Then you will be able to configure a bucket policy for each bucket (in the ARNs below, UserAccount stands for the 12-digit ID of the account that owns RoleA):
For DataAccount:
{
    "Version": "2012-10-17",
    "Id": "Policy1",
    "Statement": [
        {
            "Sid": "test1",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::UserAccount:role/RoleA"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::dataaccountlogs",
                "arn:aws:s3:::dataaccountlogs/*"
            ]
        }
    ]
}
For UserAccount:
{
    "Version": "2012-10-17",
    "Id": "Policy1",
    "Statement": [
        {
            "Sid": "test1",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::UserAccount:role/RoleA"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::userlogs",
                "arn:aws:s3:::userlogs/*"
            ]
        }
    ]
}
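As a sketch, a policy like the one above could be saved and attached with the AWS CLI; run the command with administrator credentials for the account that owns the bucket. The file path is arbitrary, and "UserAccount" is still a placeholder for the 12-digit account ID:

```shell
# Save the userlogs policy to a file (the dataaccountlogs policy is
# attached the same way, but using DataAccount credentials).
cat > /tmp/userlogs-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Id": "Policy1",
  "Statement": [
    {
      "Sid": "test1",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::UserAccount:role/RoleA" },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::userlogs",
        "arn:aws:s3:::userlogs/*"
      ]
    }
  ]
}
EOF

# Attaching it requires real credentials, so the call is shown commented out:
#   aws s3api put-bucket-policy --bucket userlogs \
#       --policy file:///tmp/userlogs-policy.json
```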
To access them from the command line:
You will need to set up the AWS CLI tool first: https://docs.aws.amazon.com/polly/latest/dg/setup-aws-cli.html
Then you will need to configure a profile for using your role. First, make a profile for your user to log in with:
aws configure --profile YourProfileAlias
and follow the prompts to set up your credentials.
Then you will need to edit the config file ~/.aws/config and add a profile for the role.
Add a block at the end:
[profile YourRoleProfileName]
role_arn = arn:aws:iam::UserAccount:role/RoleA
source_profile = YourProfileAlias
After that you will be able to use aws s3api ... --profile YourRoleProfileName to access both buckets on behalf of the created role.
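Putting the CLI steps together, the profile setup can be sketched as writing the config file directly instead of answering the interactive prompts. This writes to a scratch location via AWS_CONFIG_FILE so nothing real is touched; the region and profile names are placeholders:

```shell
# Point the CLI at a throwaway config file for this demonstration.
export AWS_CONFIG_FILE=/tmp/demo-aws-config

# Base profile (credentials would normally come from `aws configure`)
# plus the role profile that assumes RoleA on top of it.
cat > "$AWS_CONFIG_FILE" <<'EOF'
[profile YourProfileAlias]
region = us-east-1

[profile YourRoleProfileName]
role_arn = arn:aws:iam::UserAccount:role/RoleA
source_profile = YourProfileAlias
EOF

# With the role profile in place, both buckets can be read on its behalf:
#   aws s3api list-objects --bucket dataaccountlogs --profile YourRoleProfileName
#   aws s3api list-objects --bucket userlogs        --profile YourRoleProfileName
```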
To access them via Spark:
- If you run your cluster on EMR, you should use a SecurityConfiguration and fill in the S3 role configuration section. A different role can be specified for each particular bucket. You should use the "Prefix" constraint and list all destination prefixes after it, e.g. "s3://dataaccountlogs/,s3://userlogs". Note: you should strictly use the s3 protocol for this, not s3a. There are also a number of limitations, which you can find here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-s3-optimized-committer.html
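As a rough sketch of that EMR approach, an EMRFS role-mapping security configuration might look like the following JSON. The exact schema should be checked against the EMR docs, and the account ID, role name, and file path are placeholders:

```shell
# Hypothetical EMRFS role mapping: RoleA is assumed for both bucket prefixes.
cat > /tmp/emr-security-config.json <<'EOF'
{
  "AuthorizationConfiguration": {
    "EmrFsConfiguration": {
      "RoleMappings": [
        {
          "Role": "arn:aws:iam::UserAccount:role/RoleA",
          "IdentifierType": "Prefix",
          "Identifiers": ["s3://dataaccountlogs/", "s3://userlogs/"]
        }
      ]
    }
  }
}
EOF

# The configuration would then be registered and attached to the cluster,
# e.g. (left commented out, since it needs real credentials):
#   aws emr create-security-configuration --name RoleMappingConfig \
#       --security-configuration file:///tmp/emr-security-config.json
```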
- Another way with Spark is to configure Hadoop to assume your role. Put
spark.hadoop.fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider
and configure the role to use:
spark.hadoop.fs.s3a.assumed.role.arn = arn:aws:iam::UserAccount:role/RoleA
This way is more general, since the EMR committer has various limitations. You can find more information on configuring this in the Hadoop docs: https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/assumed_roles.html
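These two settings can be placed in spark-defaults.conf (or passed as --conf flags to spark-submit). A minimal sketch, written to a scratch path so it is safe to run; the account ID remains a placeholder:

```shell
# Sketch of the assumed-role settings as a spark-defaults.conf fragment.
# In a real cluster this would go to $SPARK_HOME/conf/spark-defaults.conf.
cat > /tmp/spark-defaults.conf <<'EOF'
spark.hadoop.fs.s3a.aws.credentials.provider  org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider
spark.hadoop.fs.s3a.assumed.role.arn          arn:aws:iam::UserAccount:role/RoleA
EOF

# Equivalently, per job (job jar and arguments are placeholders):
#   spark-submit \
#     --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider \
#     --conf spark.hadoop.fs.s3a.assumed.role.arn=arn:aws:iam::UserAccount:role/RoleA \
#     your-job.jar s3a://dataaccountlogs/ s3a://userlogs/
```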