S3 bucket global uniqueness

Question

I have been trying to work out why an S3 bucket name has to be globally unique. I came across a Stack Overflow answer that says the bucket name has to be unique so that the Host header can be resolved. However, couldn't AWS direct s3-region.amazonaws.com to a region-specific web server that serves the bucket's objects from that region? That way a name would only need to be unique within a region, meaning the same bucket name could be created in a different region. Please let me know if my understanding of how name resolution works is wrong.

Recommended answer

There is not, strictly speaking, a technical reason why the bucket namespace absolutely had to be global. In fact, it technically isn't quite as global as most people might assume, because S3 has three distinct partitions that are completely isolated from each other and do not share the same global bucket namespace across partition boundaries -- the partitions are aws (the global collection of regions most people know as "AWS"), aws-us-gov (US GovCloud), and aws-cn (the Beijing and Ningxia isolated regions).

So things could have been designed differently, with each region independent, but that is irrelevant now, because the global namespace is entrenched.

Why?

The specific reasons for the global namespace aren't publicly stated, but almost certainly have to do with the evolution of the service, backwards compatibility, and ease of adoption of new regions.

S3 is one of the oldest of the AWS services, older than even EC2. They almost certainly did not foresee how large it would become.

Originally, the namespace was global of necessity because there weren't multiple regions. S3 had a single logical region (called "US Standard" for a long time) that was in fact comprised of at least two physical regions, in or near us-east-1 and us-west-2. You didn't know or care which physical region each upload went to, because they replicated back and forth, transparently, and latency-based DNS resolution automatically gave you the endpoint with the lowest latency. Many users never knew this detail.

You could even explicitly override the automatic geo-routing of DNS and upload to the east using the s3-external-1.amazonaws.com endpoint, or to the west using the s3-external-2.amazonaws.com endpoint, but your object would shortly be accessible from either endpoint.

Up until this point, S3 did not offer immediate read-after-write consistency on new objects since that would be impractical in the primary/primary, circular replication environment that existed in earlier days.

Eventually, S3 launched in other AWS regions as they came online, but they designed it so that a bucket in any region could be accessed as ${bucket}.s3.amazonaws.com. This used DNS to route the request to the correct region, based on the bucket name in the hostname, and S3 maintained the DNS mappings. *.s3.amazonaws.com was (and still is) a wildcard record that pointed everything to "S3 US Standard" but S3 would create a CNAME for your bucket that overrode the wildcard and pointed to the correct region, automatically, a few minutes after bucket creation. Until then, S3 would return a temporary HTTP redirect. This, obviously enough, requires a global bucket namespace. It still works for all but the newest regions.
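The lookup behaviour described above can be pictured with a toy model. This is only a sketch of the idea, not real DNS data: the `BucketDns` class, the `"us-standard"` label, and the region labels are hypothetical names chosen for illustration.

```python
# Toy model of the wildcard-plus-override scheme described above.
# All names here are illustrative; nothing below queries real DNS.
S3_SUFFIX = ".s3.amazonaws.com"
WILDCARD_TARGET = "us-standard"  # stand-in for the *.s3.amazonaws.com wildcard


class BucketDns:
    def __init__(self):
        # bucket name -> region label (the per-bucket CNAME override)
        self._overrides = {}

    def register(self, bucket, region):
        # The hostname carries only the bucket name, so one name can
        # map to only one region: the namespace has to be global.
        if bucket in self._overrides:
            raise ValueError(f"bucket name already taken: {bucket}")
        self._overrides[bucket] = region

    def resolve(self, hostname):
        if not hostname.endswith(S3_SUFFIX):
            raise ValueError(f"not an S3 global hostname: {hostname}")
        bucket = hostname[: -len(S3_SUFFIX)]
        # A per-bucket record wins; otherwise fall back to the wildcard.
        return self._overrides.get(bucket, WILDCARD_TARGET)


dns = BucketDns()
dns.register("photos", "eu-west-1")
print(dns.resolve("photos.s3.amazonaws.com"))   # eu-west-1
print(dns.resolve("unknown.s3.amazonaws.com"))  # us-standard
```

Because `resolve` sees nothing but the bucket name, a second `register("photos", ...)` for another region has nowhere to go, which is the constraint the paragraph above points out.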

But why did they do it that way? After all, at around the same time S3 also introduced endpoints in the style ${bucket}.s3-${region}.amazonaws.com ¹ that are actually wildcard DNS records: *.s3-${region}.amazonaws.com routes directly to the regional S3 endpoint for each S3 region, and is a responsive (but unusable) endpoint, even for nonexistent buckets. If you create a bucket in us-east-2 and send a request for that bucket to the eu-west-1 endpoint, S3 in eu-west-1 will throw an error, telling you that you need to send the request to us-east-2.
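For reference, the hostname styles mentioned in this answer can be written out side by side. This is a minimal sketch that only formats strings; the function name `virtual_hosted_hostnames` is made up for illustration, the dot-separated form is the one footnote ¹ refers to, and nothing here checks whether a given form actually resolves for a given region.

```python
def virtual_hosted_hostnames(bucket: str, region: str) -> dict:
    """Return the hostname styles discussed above for one bucket.

    Purely string formatting; no network or DNS lookups are made.
    """
    return {
        # Global form: the region is not visible in the name at all.
        "global": f"{bucket}.s3.amazonaws.com",
        # Older regional form, with a dash after "s3".
        "regional_dash": f"{bucket}.s3-{region}.amazonaws.com",
        # Later regional form, with a dot after "s3" (see footnote 1).
        "regional_dot": f"{bucket}.s3.{region}.amazonaws.com",
    }


for style, host in virtual_hosted_hostnames("photos", "eu-west-1").items():
    print(f"{style}: {host}")
```

Only the first form requires the bucket name alone to identify the region; the other two carry the region in the hostname itself.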

Also, around this time, they quietly dropped the whole east/west replication thing, and later renamed US Standard to what it really was at that point -- us-east-1. (Buttressing the "backwards compatibility" argument, s3-external-1 and s3-external-2 are still valid endpoints, but they both point to precisely the same place, in us-east-1.)

So why did the bucket namespace remain global? The only truly correct answer an outsider can give is "because that's what they decided to do."

But perhaps one factor was that AWS wanted to preserve compatibility with existing software that used ${bucket}.s3.amazonaws.com, so that customers could deploy buckets in other regions without code changes. In the old days of Signature Version 2 (and earlier), the code that signed requests did not need to know the API endpoint region. Signature Version 4 requires knowledge of the endpoint region in order to generate a valid signature, because the signing key is derived from the date, region, and service... but previously it wasn't like that, so you could just drop in a bucket name, and client code needed no regional awareness (or even awareness that S3 had regions at all) in order to work with a bucket in any region.
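To make the region dependency concrete, here is a minimal sketch of the Signature Version 4 signing-key derivation using only the Python standard library. The secret key and date below are placeholders, and a real request also needs a canonical request and string-to-sign, which are omitted here.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str,
                       region: str, service: str = "s3") -> bytes:
    """Derive a SigV4 signing key: a chain of HMACs over the date
    (YYYYMMDD), region, service, and the literal "aws4_request"."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Placeholder values for illustration only; not real credentials.
PLACEHOLDER_SECRET = "EXAMPLE-SECRET-KEY-NOT-REAL"
PLACEHOLDER_DATE = "20240101"

key_virginia = derive_signing_key(PLACEHOLDER_SECRET, PLACEHOLDER_DATE, "us-east-1")
key_ireland = derive_signing_key(PLACEHOLDER_SECRET, PLACEHOLDER_DATE, "eu-west-1")

# Same secret and date, different region: the keys differ, so a client
# that guesses the wrong region cannot produce a valid signature.
print(key_virginia != key_ireland)  # True
```

Since the region is an input to the key, a client has to know where the bucket lives before it can sign anything, which was not the case for the earlier signature versions mentioned above.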

AWS is well-known for its practice of preserving backwards compatibility. They do this so consistently that occasionally some embarrassing design errors creep in and remain unfixed because to fix them would break running code.²

Another issue is virtual hosting of buckets. Back before HTTPS was accepted as non-optional, it was common to host static content by pointing a CNAME at the S3 endpoint. If you pointed www.example.com to S3, it would serve the content from a bucket with the exact name www.example.com. You can still do this, but it isn't useful any more since it doesn't support HTTPS. To host static S3 content with HTTPS, you use CloudFront in front of the bucket. Since CloudFront rewrites the Host header, the bucket name can be anything. You might be asking why you couldn't just point the www.example.com CNAME to the endpoint hostname of your bucket, but HTTP and DNS operate at very different layers and it simply doesn't work that way. (If you doubt this assertion, try pointing a CNAME from a domain that you control to www.google.com. You will not find that your domain serves the Google home page; instead, you'll be greeted with an error, because the Google server will only see that it has received a request for www.example.com and will be oblivious to the fact that there was an intermediate CNAME pointing to it.) Virtual hosting of buckets requires either a global bucket namespace (so the Host header exactly matches the bucket) or an entirely separate mapping database of hostnames to bucket names... and why do that when you already have an established global namespace of buckets?
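The Host-header rule described above can be sketched as a small function. This is a simplification for illustration, not S3's actual implementation: `bucket_from_host` is a hypothetical name, only the global `.s3.amazonaws.com` suffix is handled, and path-style requests are ignored.

```python
S3_SUFFIX = ".s3.amazonaws.com"


def bucket_from_host(host_header: str) -> str:
    """Map an HTTP Host header to a bucket name, as described above.

    The server never sees the CNAME chain, only the Host header, so a
    custom domain can only be matched to a bucket of the same name.
    """
    # Drop an optional port and a trailing dot, and normalise case.
    host = host_header.split(":", 1)[0].lower().rstrip(".")
    if host.endswith(S3_SUFFIX):
        # Virtual-hosted style: the bucket is the leading label(s).
        return host[: -len(S3_SUFFIX)]
    # Custom domain via CNAME: the whole hostname is the bucket name.
    return host


print(bucket_from_host("photos.s3.amazonaws.com"))  # photos
print(bucket_from_host("www.example.com"))          # www.example.com
```

With a per-region namespace, the second case would be ambiguous: the function would have no way to tell which region's www.example.com bucket was meant.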

¹ Note that the "-" after s3 in these endpoints was eventually replaced by a much more logical ".", but the old endpoints still work.

² Two examples that come to mind: (1) S3's incorrect omission of the Vary: Origin response header when a non-CORS request arrives at a CORS-enabled bucket (I have argued, to no avail, that this can be fixed without breaking anything); (2) S3's blatantly incorrect handling of the symbol + in an object key, on the API, where the service interprets + as meaning %20 (space), so if you want a browser to download from a link to /foo+bar you have to upload it as /foo{space}bar.
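The two readings of + in footnote ² correspond to two common decoding conventions, which Python's standard library happens to expose side by side. This sketch only illustrates the conventions; it makes no claim about how S3 is implemented.

```python
from urllib.parse import quote, unquote, unquote_plus

link_path = "/foo+bar"

# Plain percent-decoding leaves "+" alone, as a URL path normally expects.
print(unquote(link_path))       # /foo+bar

# Form-style decoding treats "+" as a space, which matches the
# behaviour the footnote describes.
print(unquote_plus(link_path))  # /foo bar

# Percent-encoding the plus sign removes the ambiguity.
print(quote(link_path))         # /foo%2Bbar
```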
