Technically what is the difference between s3n, s3a and s3?

Question

I'm aware of the existence of https://wiki.apache.org/hadoop/AmazonS3 and the following words:

S3 Native FileSystem (URI scheme: s3n) A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop. The disadvantage is the 5GB limit on file size imposed by S3.

S3A (URI scheme: s3a) A successor to the S3 Native, s3n fs, the S3a: system uses Amazon's libraries to interact with S3. This allows S3a to support larger files (no more 5GB limit), higher performance operations and more. The filesystem is intended to be a replacement for/successor to S3 Native: all objects accessible from s3n:// URLs should also be accessible from s3a simply by replacing the URL schema.

S3 Block FileSystem (URI scheme: s3) A block-based filesystem backed by S3. Files are stored as blocks, just like they are in HDFS. This permits efficient implementation of renames. This filesystem requires you to dedicate a bucket for the filesystem - you should not use an existing bucket containing files, or write other files to the same bucket. The files stored by this filesystem can be larger than 5GB, but they are not interoperable with other S3 tools.

Why would a one-letter change in the URI make such a difference? For example:

val data = sc.textFile("s3n://bucket-name/key")

val data = sc.textFile("s3a://bucket-name/key")

What is the technical difference underlying this change? Are there any good articles that I can read on this?

Recommended answer

The letter change on the URI scheme makes a big difference because it causes different software to be used to interface to S3. Somewhat like the difference between http and https - it's only a one-letter change, but it triggers a big difference in behavior.
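
To make the "different software" point concrete, here is a minimal sketch of how a URI scheme can select a filesystem implementation. The class names are the ones shipped with Hadoop 2.x as far as I know; the object S3SchemeDemo, the map defaultImpls, and the helper implFor are hypothetical names used only for illustration. Real Hadoop resolves the class through its fs.<scheme>.impl configuration properties, and a given distribution (EMR, for example) may map the schemes differently.

```scala
import java.net.URI

object S3SchemeDemo {
  // Scheme -> FileSystem class name. Assumed Hadoop 2.x defaults; a cluster
  // can override them through the fs.<scheme>.impl configuration properties.
  val defaultImpls: Map[String, String] = Map(
    "s3"  -> "org.apache.hadoop.fs.s3.S3FileSystem",             // block-based overlay
    "s3n" -> "org.apache.hadoop.fs.s3native.NativeS3FileSystem", // object-based
    "s3a" -> "org.apache.hadoop.fs.s3a.S3AFileSystem"            // object-based successor to s3n
  )

  // Returns the implementation class name selected by the path's scheme, if any.
  def implFor(path: String): Option[String] =
    Option(new URI(path).getScheme).flatMap(defaultImpls.get)

  def main(args: Array[String]): Unit = {
    Seq("s3n://bucket-name/key", "s3a://bucket-name/key").foreach { p =>
      println(s"$p -> ${implFor(p).getOrElse("no mapping")}")
    }
  }
}
```

The two paths differ by one letter, yet they resolve to two unrelated classes, which is why the behavior can differ so much.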

The difference between s3 and s3n/s3a is that s3 is a block-based overlay on top of Amazon S3, while s3n/s3a are not (they are object-based).

The difference between s3n and s3a is that s3n supports objects up to 5GB in size, while s3a supports objects up to 5TB and has higher performance. Both advantages come from s3a's use of multi-part upload. s3a is the successor to s3n.
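
The 5GB and 5TB figures follow from S3's upload limits, and the arithmetic can be sketched as below. This assumes S3's published limits as I understand them: a single PUT is capped at 5 GB, and a multi-part upload may have at most 10,000 parts. The object MultipartMath and its helpers partsNeeded and minPartSize are hypothetical names for illustration, not part of any Hadoop or AWS API.

```scala
object MultipartMath {
  val MiB: Long = 1024L * 1024L
  val GiB: Long = 1024L * MiB
  val TiB: Long = 1024L * GiB

  // Assumed S3 limits: at most 10,000 parts per multi-part upload,
  // and at most 5 GB for an object uploaded in a single PUT.
  val MaxParts: Long = 10000L
  val MaxSinglePut: Long = 5L * GiB

  // Number of parts needed to upload objectSize bytes (ceiling division).
  def partsNeeded(objectSize: Long, partSize: Long): Long =
    (objectSize + partSize - 1) / partSize

  // Smallest part size (in bytes) that keeps the upload within MaxParts.
  def minPartSize(objectSize: Long): Long =
    (objectSize + MaxParts - 1) / MaxParts

  def main(args: Array[String]): Unit = {
    val big = 5L * TiB
    println(s"Fits in a single PUT: ${big <= MaxSinglePut}")
    println(s"Parts at 64 MiB each: ${partsNeeded(big, 64L * MiB)}")
    println(s"Minimum part size in bytes: ${minPartSize(big)}")
  }
}
```

A filesystem that only issues single PUTs stops at the single-PUT cap, while one that splits the object into parts can go far beyond it, provided the part size is large enough to stay under the part-count limit.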

If you're here because you want to understand which S3 file system you should use with Amazon EMR, then read this article from Amazon (only available on the Wayback Machine). The short version: use s3://, because s3:// and s3n:// are functionally interchangeable in the context of EMR, while s3a:// is not compatible with EMR.

For additional advice, read "Work with Storage and File Systems".
