Zip an entire directory on S3

Problem Description

If I have a directory with ~5000 small files on S3, is there a way to easily zip up the entire directory and leave the resulting zip file on S3? I need to do this without having to manually access each file myself.

Thanks!

Recommended Answer

No, there is no magic bullet.

(As an aside, you have to realize that there is no such thing as a "directory" in S3. There are only objects with paths. You can get directory-like listings, but the '/' character isn't magic - you can use any character you want as the prefix delimiter.)

As someone pointed out, "pre-zipping" them can help both download speed and append speed. (At the expense of duplicate storage.)
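
If the goal is simply to end up with one zip per "directory" prefix on S3, the brute-force version looks something like this - a minimal sketch assuming boto3 and Python's standard zipfile module, with hypothetical bucket and prefix names. Note that it still reads every object once; S3 has no server-side zip operation:

```python
import io
import zipfile
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"      # hypothetical
prefix = "my-directory/"  # hypothetical "directory" prefix

# List every key under the prefix (paginated, since ~5000 keys exceeds
# the 1000-key page limit), skipping any zero-byte folder markers.
paginator = s3.get_paginator("list_objects_v2")
keys = [obj["Key"]
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
        for obj in page.get("Contents", [])
        if not obj["Key"].endswith("/")]

# Build the zip in memory - fine for ~5000 small files; a bigger data
# set would call for a temp file or a streaming zip library instead.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for key in keys:
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        zf.writestr(key[len(prefix):], body)

buf.seek(0)
s3.put_object(Bucket=bucket, Key=prefix.rstrip("/") + ".zip", Body=buf)
```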

If downloading is the bottleneck, it sounds like you are downloading serially. S3 can support thousands of simultaneous connections to the same object without breaking a sweat. You'll need to run benchmarks to see how many connections work best, since too many connections from one box might get throttled by S3. And you may need to do some TCP tuning when making thousands of connections per second.
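
A thread pool gets you most of the way there - a minimal sketch, again assuming boto3 (whose clients can be shared across threads), with hypothetical bucket and key names; benchmark the worker count rather than trusting the number here:

```python
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")  # boto3 clients can be shared across threads
bucket = "my-bucket"     # hypothetical

def fetch(key):
    """Download one object and return (key, bytes)."""
    return key, s3.get_object(Bucket=bucket, Key=key)["Body"].read()

keys = [f"my-directory/file-{i:04d}" for i in range(5000)]  # hypothetical

# 64 workers is a starting point, not a recommendation - too many
# connections from one box may get throttled, as noted above.
with ThreadPoolExecutor(max_workers=64) as pool:
    for key, body in pool.map(fetch, keys):
        pass  # write to disk, feed a zip writer, etc.
```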

解决方案"在很大程度上取决于您的数据访问模式.尝试重新安排问题.如果您的单个文件下载不频繁,则将它们一次 100 个分组到 S3 中,然后在需要时将它们分开可能更有意义.如果它们是小文件,将它们缓存在文件系统上可能是有意义的.

The "solution" depends heavily on your data access patterns. Try re-arranging the problem. If your single-file downloads are infrequent, it might make more sense to group them 100 at a time into S3, then break them apart when requested. If they are small files, it might make sense to cache them on the filesystem.

Or it might make sense to store all 5000 files as one big zip file in S3, and use a "smart client" that can download specific ranges of the zip file in order to serve the individual files. (S3 supports byte ranges, as I recall.)
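
It does - ranged GETs work via the HTTP Range header on GetObject. A minimal sketch, assuming boto3, with hypothetical bucket and key names: fetching just the tail of the archive, where the zip's end-of-central-directory record lives, lets a smart client list the entries without downloading the whole file:

```python
import boto3

s3 = boto3.client("s3")

# The end-of-central-directory record is 22 bytes plus an optional
# comment of up to 65535 bytes, so the last 65557 bytes always contain it.
resp = s3.get_object(
    Bucket="my-bucket",    # hypothetical
    Key="all-files.zip",   # hypothetical
    Range="bytes=-65557",  # suffix range: just the final bytes
)
tail = resp["Body"].read()
# A smart client would parse the central directory out of `tail`,
# then issue further ranged GETs for the individual members.
```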
