Technically what is the difference between s3n, s3a and s3?


Question


I'm aware of https://wiki.apache.org/hadoop/AmazonS3 and the following descriptions from it:

S3 Native FileSystem (URI scheme: s3n) A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop. The disadvantage is the 5GB limit on file size imposed by S3.

S3A (URI scheme: s3a) A successor to the S3 Native, s3n fs, the S3a: system uses Amazon's libraries to interact with S3. This allows S3a to support larger files (no more 5GB limit), higher performance operations and more. The filesystem is intended to be a replacement for/successor to S3 Native: all objects accessible from s3n:// URLs should also be accessible from s3a simply by replacing the URL schema.

S3 Block FileSystem (URI scheme: s3) A block-based filesystem backed by S3. Files are stored as blocks, just like they are in HDFS. This permits efficient implementation of renames. This filesystem requires you to dedicate a bucket for the filesystem - you should not use an existing bucket containing files, or write other files to the same bucket. The files stored by this filesystem can be larger than 5GB, but they are not interoperable with other S3 tools.

Why would a one-letter change in the URI make such a difference? For example, changing

val data = sc.textFile("s3n://bucket-name/key")

to

val data = sc.textFile("s3a://bucket-name/key")

What is the technical difference underlying this change? Are there any good articles that I can read on this?

Solution

The letter change on the URI scheme makes a big difference because it causes different software to be used to interface to S3. Somewhat like the difference between http and https - it's only a one-letter change, but it triggers a big difference in behavior.
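To make the "different software" point concrete, here is a minimal sketch of scheme-based dispatch. It assumes the Hadoop 2.x-era connector class names, and the `SchemeDispatch` object and `connectorFor` helper are hypothetical names used only for illustration. Real Hadoop resolves the class through the `fs.<scheme>.impl` configuration property rather than a hard-coded map.

```scala
import java.net.URI

object SchemeDispatch {
  // Assumed mapping: Hadoop 2.x-era connector classes, keyed by URI scheme.
  // Hadoop itself looks these up via the fs.<scheme>.impl property.
  val connectors: Map[String, String] = Map(
    "s3"  -> "org.apache.hadoop.fs.s3.S3FileSystem",             // block-based
    "s3n" -> "org.apache.hadoop.fs.s3native.NativeS3FileSystem", // object-based
    "s3a" -> "org.apache.hadoop.fs.s3a.S3AFileSystem"            // object-based successor
  )

  // Hypothetical helper: extract the scheme and look up its connector class.
  def connectorFor(path: String): Option[String] =
    Option(new URI(path).getScheme).flatMap(connectors.get)

  def main(args: Array[String]): Unit = {
    Seq("s3n://bucket-name/key", "s3a://bucket-name/key").foreach { p =>
      println(s"$p -> ${connectorFor(p).getOrElse("no connector registered")}")
    }
  }
}
```

The two paths differ by one letter, yet each resolves to an entirely separate class with its own upload logic and limits.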

The difference between s3 and s3n/s3a is that s3 is a block-based overlay on top of Amazon S3, while s3n/s3a are not (they are object-based).

The difference between s3n and s3a is that s3n supports objects up to 5GB in size, while s3a supports objects up to 5TB and has higher performance (both because it uses multipart upload). s3a is the successor to s3n.
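The 5GB and 5TB figures follow from S3's upload limits, and the arithmetic can be sketched as below. This assumes the limits are binary units (5 GiB per single PUT, 5 TiB per object) and that a multipart upload allows at most 10,000 parts. `MultipartMath` and its helpers are hypothetical names, not part of any Hadoop or AWS API.

```scala
object MultipartMath {
  val MiB: Long = 1024L * 1024L
  val GiB: Long = 1024L * MiB
  val TiB: Long = 1024L * GiB

  // Assumed S3 limits (binary units).
  val MaxSinglePut: Long = 5L * GiB // largest object a single PUT can create
  val MaxParts: Long     = 10000L   // most parts in one multipart upload
  val MaxObject: Long    = 5L * TiB // largest object overall

  // Number of parts needed for an object at a given part size (ceiling division).
  def partsNeeded(objectBytes: Long, partBytes: Long): Long =
    (objectBytes + partBytes - 1) / partBytes

  // Smallest part size that keeps an object within the part-count limit.
  def minPartSize(objectBytes: Long): Long =
    (objectBytes + MaxParts - 1) / MaxParts

  def main(args: Array[String]): Unit = {
    val obj = 100L * GiB
    println(s"Fits in a single PUT: ${obj <= MaxSinglePut}")
    println(s"Parts at 64 MiB each: ${partsNeeded(obj, 64L * MiB)}")
    println(s"Min part size for a 5 TiB object: ${minPartSize(MaxObject)} bytes")
  }
}
```

A 100 GiB object is too large for a single PUT but splits into 1600 parts of 64 MiB, which is why a connector that uploads in parts can go well past the single-request ceiling.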

If you're here because you want to understand which S3 file system you should use with Amazon EMR, then read this article from Amazon (the gist is: use s3://, because s3:// and s3n:// are functionally interchangeable in the context of EMR, while s3a:// is not compatible with EMR).
