Google cloud: Using gsutil to download data from AWS S3 to GCS


Problem Description

One of our collaborators has made some data available on AWS and I was trying to get it into our google cloud bucket using gsutil (only some of the files are of use to us, so I don't want to use the GUI provided on GCS). The collaborators have provided us with the AWS bucket ID, the aws access key id, and aws secret access key id.
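
For reference, the end goal is a selective transfer along these lines (a sketch only; the bucket names, path, and wildcard are placeholders, not the real IDs):

# Copy just the objects we need from the collaborator's S3 bucket into our GCS bucket
gsutil cp s3://collaborator-bucket/path/subset_* gs://our-gcs-bucket/path/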

I looked through the documentation on GCE and edited the ~/.boto file so that the access keys are incorporated. I restarted my terminal and tried to do an 'ls', but got the following error:

gsutil ls s3://cccc-ffff-03210/
AccessDeniedException: 403 AccessDenied
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied

Do I need to configure/run something else too?

Thanks!

Thank you for the response!

I installed the Cloud SDK and I can access and run all gsutil commands on my google cloud storage project. My problem is in trying to access (e.g. 'ls' command) the amazon S3 that is being shared with me.

  1. I uncommented two lines in the ~/.boto file and added the access keys:

# To add HMAC aws credentials for "s3://" URIs, edit and uncomment the
# following two lines:
aws_access_key_id = my_access_key
aws_secret_access_key = my_secret_access_key


  1. "gsutil版本-l"的输出:

  2. Output of 'gsutil version -l':

=> gsutil version -l

my_gc_id
gsutil version: 4.27
checksum: 5224e55e2df3a2d37eefde57 (OK)
boto version: 2.47.0
python version: 2.7.10 (default, Oct 23 2015, 19:19:21) [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)]
OS: Darwin 15.4.0
multiprocessing available: True
using cloud sdk: True
pass cloud sdk credentials to gsutil: True
config path(s): /Users/pc/.boto, /Users/pc/.config/gcloud/legacy_credentials/pc@gmail.com/.boto
gsutil path: /Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gsutil
compiled crcmod: True
installed via package manager: False
editable install: False


  3. The output with the -DD option is:

=> gsutil -DD ls s3://my_amazon_bucket_id

multiprocessing available: True
using cloud sdk: True
pass cloud sdk credentials to gsutil: True
config path(s): /Users/pc/.boto, /Users/pc/.config/gcloud/legacy_credentials/pc@gmail.com/.boto
gsutil path: /Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gsutil
compiled crcmod: True
installed via package manager: False
editable install: False
Command being run: /Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=my_gc_id -DD ls s3://my_amazon_bucket_id
config_file_list: ['/Users/pc/.boto', '/Users/pc/.config/gcloud/legacy_credentials/pc@gmail.com/.boto']
config: [('debug', '0'), ('working_dir', '/mnt/pyami'), ('https_validate_certificates', 'True'), ('debug', '0'), ('working_dir', '/mnt/pyami'), ('content_language', 'en'), ('default_api_version', '2'), ('default_project_id', 'my_gc_id')]
DEBUG 1103 08:42:34.664643 provider.py] Using access key found in shared credential file.
DEBUG 1103 08:42:34.664919 provider.py] Using secret key found in shared credential file.
DEBUG 1103 08:42:34.665841 connection.py] path=/
DEBUG 1103 08:42:34.665967 connection.py] auth_path=/my_amazon_bucket_id/
DEBUG 1103 08:42:34.666115 connection.py] path=/?delimiter=/
DEBUG 1103 08:42:34.666200 connection.py] auth_path=/my_amazon_bucket_id/?delimiter=/
DEBUG 1103 08:42:34.666504 connection.py] Method: GET
DEBUG 1103 08:42:34.666589 connection.py] Path: /?delimiter=/
DEBUG 1103 08:42:34.666668 connection.py] Data: 
DEBUG 1103 08:42:34.666724 connection.py] Headers: {}
DEBUG 1103 08:42:34.666776 connection.py] Host: my_amazon_bucket_id.s3.amazonaws.com
DEBUG 1103 08:42:34.666831 connection.py] Port: 443
DEBUG 1103 08:42:34.666882 connection.py] Params: {}
DEBUG 1103 08:42:34.666975 connection.py] establishing HTTPS connection: host=my_amazon_bucket_id.s3.amazonaws.com, kwargs={'port': 443, 'timeout': 70}
DEBUG 1103 08:42:34.667128 connection.py] Token: None
DEBUG 1103 08:42:34.667476 auth.py] StringToSign:
GET


Fri, 03 Nov 2017 12:42:34 GMT
/my_amazon_bucket_id/
DEBUG 1103 08:42:34.667600 auth.py] Signature:
AWS RN8=
DEBUG 1103 08:42:34.667705 connection.py] Final headers: {'Date': 'Fri, 03 Nov 2017 12:42:34 GMT', 'Content-Length': '0', 'Authorization': u'AWS AK6GJQ:EFVB8F7rtGN8=', 'User-Agent': 'Boto/2.47.0 Python/2.7.10 Darwin/15.4.0 gsutil/4.27 (darwin) google-cloud-sdk/164.0.0'}
DEBUG 1103 08:42:35.179369 https_connection.py] wrapping ssl socket; CA certificate file=/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/third_party/boto/boto/cacerts/cacerts.txt
DEBUG 1103 08:42:35.247599 https_connection.py] validating server certificate: hostname=my_amazon_bucket_id.s3.amazonaws.com, certificate hosts=['*.s3.amazonaws.com', 's3.amazonaws.com']
send: u'GET /?delimiter=/ HTTP/1.1\r\nHost: my_amazon_bucket_id.s3.amazonaws.com\r\nAccept-Encoding: identity\r\nDate: Fri, 03 Nov 2017 12:42:34 GMT\r\nContent-Length: 0\r\nAuthorization: AWS AN8=\r\nUser-Agent: Boto/2.47.0 Python/2.7.10 Darwin/15.4.0 gsutil/4.27 (darwin) google-cloud-sdk/164.0.0\r\n\r\n'
reply: 'HTTP/1.1 403 Forbidden\r\n'
header: x-amz-bucket-region: us-east-1
header: x-amz-request-id: 60A164AAB3971508
header: x-amz-id-2: +iPxKzrW8MiqDkWZ0E=
header: Content-Type: application/xml
header: Transfer-Encoding: chunked
header: Date: Fri, 03 Nov 2017 12:42:34 GMT
header: Server: AmazonS3
DEBUG 1103 08:42:35.326652 connection.py] Response headers: [('date', 'Fri, 03 Nov 2017 12:42:34 GMT'), ('x-amz-id-2', '+iPxKz1dPdgDxpnWZ0E='), ('server', 'AmazonS3'), ('transfer-encoding', 'chunked'), ('x-amz-request-id', '60A164AAB3971508'), ('x-amz-bucket-region', 'us-east-1'), ('content-type', 'application/xml')]
DEBUG 1103 08:42:35.327029 bucket.py] <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>6097164508</RequestId><HostId>+iPxKzrWWZ0E=</HostId></Error>
DEBUG: Exception stack trace:
Traceback (most recent call last):
  File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 577, in _RunNamedCommandAndHandleExceptions
    collect_analytics=True)
  File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 317, in RunNamedCommand
    return_code = command_inst.RunCommand()
  File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/commands/ls.py", line 548, in RunCommand
    exp_dirs, exp_objs, exp_bytes = ls_helper.ExpandUrlAndPrint(storage_url)
  File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/ls_helper.py", line 180, in ExpandUrlAndPrint
    print_initial_newline=False)
  File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/ls_helper.py", line 252, in _RecurseExpandUrlAndPrint
    bucket_listing_fields=self.bucket_listing_fields):
  File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 476, in IterAll
    expand_top_level_buckets=expand_top_level_buckets):
  File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 157, in __iter__
    fields=bucket_listing_fields):
  File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 413, in ListObjects
    self._TranslateExceptionAndRaise(e, bucket_name=bucket_name)
  File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 1471, in _TranslateExceptionAndRaise
    raise translated_exception
AccessDeniedException: AccessDeniedException: 403 AccessDenied


AccessDeniedException: 403 AccessDenied

Recommended Answer

I'll assume that you are able to set up gcloud credentials using gcloud init and gcloud auth login or gcloud auth activate-service-account, and can list/write objects to GCS successfully.
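
A quick sanity check for that prerequisite could look like this (gs://my-gcs-bucket is a placeholder):

# Authenticate, then confirm plain GCS access works before involving S3
gcloud auth login
gsutil ls gs://my-gcs-bucket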

From there, you need two things: a properly configured AWS IAM role applied to the AWS user you're using, and a properly configured ~/.boto file.

A policy like this must be applied, either by a role granted to your user or an inline policy attached to the user:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::some-s3-bucket/*",
                "arn:aws:s3:::some-s3-bucket"
            ]
        }
    ]
}
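
For illustration, if whoever administers the AWS account uses the AWS CLI, a policy like the one above could be attached inline with something along these lines (the user name, policy name, and file path are placeholders):

# Attach the policy above as an inline policy on the IAM user
aws iam put-user-policy \
    --user-name collaborator-user \
    --policy-name gsutil-read-access \
    --policy-document file://s3-read-policy.json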

The important part is that you have the ListBucket and GetObject actions, and that the resource scope for these includes at least the bucket (or prefix thereof) that you wish to read from.

Interoperation between service providers is always a bit tricky. At the time of this writing, in order to support AWS Signature V4 (the only signature version supported universally by all AWS regions), you have to add a couple of extra properties to your ~/.boto file beyond just credentials, in an [s3] group:

[Credentials]
aws_access_key_id = [YOUR AKID]
aws_secret_access_key = [YOUR SECRET AK]
[s3]
use-sigv4=True
host=s3.us-east-2.amazonaws.com

The use-sigv4 property cues Boto, via gsutil, to use AWS Signature V4 for requests. Currently, this unfortunately requires that the host be specified in the configuration. It is pretty easy to figure out the host name, as it follows the pattern s3.[BUCKET REGION].amazonaws.com.
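
If you don't know the bucket's region, the -DD output above already reveals it (the x-amz-bucket-region header), or anyone with AWS CLI access can look it up (the bucket name is a placeholder):

# Look up the bucket's region; note us-east-1 is reported as a null LocationConstraint
aws s3api get-bucket-location --bucket some-s3-bucket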

If your rsync/cp work spans multiple S3 regions, you can handle it a few ways. You can set an environment variable like BOTO_CONFIG before running the command to switch between multiple config files. Or, you can override the setting on each run using a top-level argument, like:

gsutil -o s3:host=s3.us-east-2.amazonaws.com ls s3://some-s3-bucket
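
The same override works for the actual transfer; for example, a sketch with placeholder bucket names:

# Copy recursively from an S3 bucket in us-east-2 into GCS, overriding the host per run
gsutil -o s3:host=s3.us-east-2.amazonaws.com cp -r s3://some-s3-bucket/some-prefix gs://my-gcs-bucket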

Just want to add... another cool way to do this job is rclone.
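
A minimal sketch of that route, assuming you have already defined a remote for each provider via rclone config (the remote and bucket names here are placeholders):

# Set up the two remotes interactively, then copy between them
rclone config
rclone copy s3remote:some-s3-bucket/some-prefix gcsremote:my-gcs-bucket/some-prefix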
