Why is gsutil rsync re-downloading all our files?


Question

We've been using gsutil -m rsync -r to keep dev and deploy boxes in sync with a GCS bucket for nearly 2 years without any problem. There are about 85k objects in the bucket.

Until recently, this worked perfectly: we'd run a deploy-box -> GCS rsync every 15 minutes or so, to keep all newly uploaded resources backed up, and then a GCS -> dev box rsync whenever we wanted to refresh the local dev data (running on OSX El Capitan).

Within the last couple of months, though, the GCS->dev rsync has started to bloat, downloading more and more images.

Initially I just thought "great, we're getting more resources uploaded", but it's been growing way faster than the data, until today when it seems to be downloading the whole 85k images.

I've double-checked I'm in the right place, the command is correct, the paths are correct, etc. For all that the gsutil output is scrolling by with reams and reams of "Copying..." and "Downloading..." messages, making good parallel use of our 100 Mbps connection, when I go to another terminal and run find . -type f | wc -l on the destination directory every 10 seconds, it shows that barely 2 or 3 new files are being added per minute. I look at modification times on files that gsutil says it's downloading right now, and the large majority are old; plenty haven't changed in a year or more. Meaning: it's downloading all the data, using tons of time and bandwidth, all for the sake of a few hundred files.

Has something changed in recent OSX gsutil versions? Is there possibly a bug? How would I even start to go about tracking this down? Or reporting it? The newsgroups gsutil-discuss and gs-discussion have been archived, and the talk in gce-discussion is all about using gsutil from GCE instances.

Thanks!

Recommended answer

gsutil 4.20 (released 2016-07-20) modified the change detection algorithm for rsync. Instead of comparing only the size of the local file with that of its cloud counterpart, it now compares both the size and the file modification time. The file modification time is stored in the custom user metadata for the object when it is uploaded with rsync. If that doesn't exist, the object creation time is used.
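
To make the rule described above concrete, here is a minimal Python sketch of the comparison. It is not gsutil's actual code: the LocalFile and CloudObject classes, their field names, and the needs_copy function are hypothetical and exist only for illustration, and the fallback to creation time simply follows the description in this answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalFile:
    # Hypothetical stand-in for a file on the dev box.
    size: int
    mtime: int  # modification time, seconds since the epoch


@dataclass
class CloudObject:
    # Hypothetical stand-in for an object in the bucket.
    size: int
    created: int  # object creation time, seconds since the epoch
    mtime: Optional[int] = None  # mtime from custom metadata, if it was stored


def effective_mtime(obj: CloudObject) -> int:
    # Assumption taken from the answer: with no stored mtime,
    # the object creation time is used instead.
    return obj.mtime if obj.mtime is not None else obj.created


def needs_copy(local: Optional[LocalFile], obj: CloudObject) -> bool:
    # A missing destination file always has to be copied.
    if local is None:
        return True
    # The older rule: a size mismatch means the file changed.
    if local.size != obj.size:
        return True
    # The newer rule: same size, but the timestamps must also agree.
    return local.mtime != effective_mtime(obj)


# An object uploaded before mtime metadata was stored: same size as the
# local copy, but its creation time differs from the local file's mtime.
old_object = CloudObject(size=2048, created=1_400_000_000)
local_copy = LocalFile(size=2048, mtime=1_400_000_500)
print(needs_copy(local_copy, old_object))
```

Under these assumptions, an object that was uploaded without a stored modification time can be flagged as changed even though its size matches the local copy, which would be consistent with old, unmodified files being downloaded again.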
