Compute Engine: using gsutil to download a tgz file fails with a crcmod error


Problem description

I find that if you create a Compute Engine machine (CentOS or Debian) and use gsutil to download (cp) a tgz file, the command fails with a crcmod error:

$ gsutil cp gs://mybucket/data.tgz .
Copying gs://mybucket/data.tgz...
CommandException:
Downloading this composite object requires integrity checking with CRC32c, but
your crcmod installation isn't using the module's C extension, so the the hash
computation will likely throttle download performance. For help installing the
extension, please see:
  $ gsutil help crcmod
To download regardless of crcmod performance or to skip slow integrity checks,
see the "check_hashes" option in your boto config file.

Currently I set "check_hashes = never" in the boto config file to bypass the check:

$ vi /etc/boto.cfg
[GSUtil]
default_project_id = 429100748693
default_api_version = 2
check_hashes = never
...

But what is the root cause? And is there a good solution to the problem?

Recommended answer

The object you're trying to download is a composite object, which basically means it was uploaded in parallel chunks. gsutil does this automatically when uploading objects larger than 150M (a configurable threshold), to provide better performance.

Composite objects only have a crc32c checksum (no MD5), so to validate data integrity when downloading a composite object, gsutil needs to compute a crc32c checksum. Unfortunately, the libraries distributed with Python don't include a compiled crc32c implementation, so unless you install a compiled one, gsutil falls back to a non-compiled Python implementation of crc32c that's quite slow. The message is printed to let you know there's a way to fix that performance problem. Please run:

gsutil help crcmod

and follow the instructions there for installing a compiled crc32c. It's pretty easy to do, and worth the effort.
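After installing, it can be useful to confirm that the compiled extension is actually being picked up. The following is a minimal sketch, not part of gsutil: the helper name `crcmod_uses_c_extension` is made up for illustration, and it relies on the private flag `crcmod.crcmod._usingExtension`, which is assumed to be present in the installed crcmod. It must be run with the same Python interpreter that gsutil uses, otherwise the result says nothing about gsutil.

```python
import importlib


def crcmod_uses_c_extension():
    """Report whether crcmod loaded its compiled C extension.

    Returns True (compiled), False (pure-Python fallback),
    or None when crcmod is not installed for this interpreter.
    """
    try:
        core = importlib.import_module("crcmod.crcmod")
    except ImportError:
        return None
    # Assumption: crcmod exposes the private flag `_usingExtension`.
    # If the flag is missing, this conservatively reports False.
    return bool(getattr(core, "_usingExtension", False))


if __name__ == "__main__":
    status = crcmod_uses_c_extension()
    if status is None:
        print("crcmod is not installed for this interpreter")
    elif status:
        print("crcmod is using the compiled C extension")
    else:
        print("crcmod is using the slow pure-Python implementation")
```

If this reports the pure-Python implementation after an install, the likely explanation is that crcmod was built without a compiler or Python headers available, or that it was installed for a different interpreter.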

One other note: I strongly recommend against setting check_hashes = never in your boto config file. That disables integrity checking, which means your download could get corrupted without you knowing it. You want data integrity checking enabled to ensure you're working with correct data.
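To make concrete what the integrity check buys you, here is a small illustrative CRC32C implementation in pure Python. It is a sketch for explanation only and is not gsutil's code. The function names are hypothetical, and the base64 helper assumes the checksum is encoded as four bytes in big-endian order. The per-byte Python loop is also the reason a non-compiled implementation is slow on large files.

```python
import base64
import struct

_POLY = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def _make_table():
    """Precompute the 256-entry lookup table for byte-at-a-time CRC32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data, crc=0):
    """Return the CRC32C of `data`; pass a previous result as `crc` to continue."""
    crc ^= 0xFFFFFFFF
    for b in data:  # one Python-level iteration per byte: this is the slow part
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_base64(data):
    """Encode the checksum as base64 (assumed big-endian byte order)."""
    return base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")


if __name__ == "__main__":
    original = b"contents of data.tgz"
    corrupted = b"contents of data.tgx"  # a single byte changed in transit
    expected = crc32c(original)
    print("original ok: ", crc32c(original) == expected)
    print("corrupted ok:", crc32c(corrupted) == expected)
```

With check_hashes = never, no comparison of this kind happens at all, so a download corrupted like the second buffer would be accepted silently.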
