Zip an entire directory on S3


Question

If I have a directory with ~5000 small files on S3, is there a way to easily zip up the entire directory and leave the resulting zip file on S3? I need to do this without having to manually access each file myself.

Thanks!

Answer

No, there is no magic bullet.

(As an aside, you have to realize that there is no such thing as a "directory" in S3. There are only objects with paths. You can get directory-like listings, but the '/' character isn't magic - you can get prefixes by any character you want.)

As someone pointed out, "pre-zipping" them can help both download speed and append speed. (At the expense of duplicate storage.)

If downloading is the bottleneck, it sounds like you are downloading serially. S3 can support 1000's of simultaneous connections to the same object without breaking a sweat. You'll need to run benchmarks to see how many connections are best, since too many connections from one box might get throttled by S3. And you may need to do some TCP tuning when doing 1000's of connections per second.
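A minimal sketch of the parallel-download pattern. The `fetch` function here is a local stand-in (so the example runs without AWS credentials); in a real client it would be something like boto3's `s3.get_object(Bucket=..., Key=key)["Body"].read()`, and the bucket, key names, and worker count are all assumptions to tune via benchmarking:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(key):
    # Stand-in for an S3 GET, e.g. with boto3:
    #   s3.get_object(Bucket="my-bucket", Key=key)["Body"].read()
    # Replaced by a local stub so this sketch runs without AWS.
    return f"contents of {key}".encode()

# Hypothetical key names; with ~5000 small files you would list
# the prefix first, then fan the GETs out across a thread pool.
keys = [f"dir/file-{i:04d}.txt" for i in range(50)]

with ThreadPoolExecutor(max_workers=16) as pool:
    blobs = list(pool.map(fetch, keys))
```

Threads work well here because each request is I/O-bound; the right `max_workers` depends on object size and per-connection throughput, which is exactly what the benchmarking above is for.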

The "solution" depends heavily on your data access patterns. Try re-arranging the problem. If your single-file downloads are infrequent, it might make more sense to group them 100 at a time into S3, then break them apart when requested. If they are small files, it might make sense to cache them on the filesystem.
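The grouping idea can be sketched with the standard `zipfile` module. This builds the batch archives locally; uploading each archive (e.g. with boto3's `put_object`) is left out, and the batch size and file names are illustrative:

```python
import io
import zipfile

def batch_zip(files, batch_size=100):
    """Group (name, payload) pairs into zip archives of at most
    batch_size members each. Each archive would then be stored as
    a single S3 object."""
    archives = []
    for i in range(0, len(files), batch_size):
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
            for name, payload in files[i:i + batch_size]:
                zf.writestr(name, payload)
        archives.append(buf.getvalue())
    return archives

# 250 tiny files grouped 100 at a time -> 3 archives (100 + 100 + 50)
files = [(f"file-{i}.txt", b"small payload") for i in range(250)]
archives = batch_zip(files)
```

Serving a single file then means fetching its archive and extracting the one member, which trades a little extra download for far fewer S3 objects.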

Or it might make sense to store all 5000 files as one big zip file in S3, and use a "smart client" that can download specific ranges of the zip file in order to serve the individual files. (S3 supports byte ranges, as I recall.)
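S3 does support `Range` requests, so the "smart client" is workable. Here is a local sketch of the core trick: given a member's offset from the zip's central directory, read just that member's bytes and inflate them. The byte slices below stand in for what would be two S3 Range GETs (one for the central directory, one for the member) in a real client:

```python
import io
import struct
import zipfile
import zlib

# Build a sample "big zip" in memory (stands in for the object on S3).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("hello.txt", b"hello world" * 100)
data = buf.getvalue()

# Step 1: read the central directory (a real client would fetch just
# the tail of the object with a Range GET) to find the member's offset.
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    info = zf.getinfo("hello.txt")

# Step 2: parse the 30-byte local file header at that offset to skip
# the variable-length name and extra fields, then slice out the
# compressed bytes (in a real client: Range "bytes=start-end").
header = data[info.header_offset:info.header_offset + 30]
name_len, extra_len = struct.unpack("<HH", header[26:30])
start = info.header_offset + 30 + name_len + extra_len
raw = data[start:start + info.compress_size]

# Step 3: inflate the raw deflate stream (wbits=-15 means no zlib header).
payload = zlib.decompress(raw, -15)
```

The attraction is that serving one file costs two small ranged reads instead of downloading the whole archive.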

