Why is CloudFront evicting objects from cache within mere hours?

Question

CloudFront is configured to cache the images from our app. I found that the images were evicted from the cache really quickly. Since the images are generated dynamically on the fly, this is pretty intense for our server. In order to solve the issue I set up a test case.

The image is served from our origin server with correct Last-Modified and Expires headers.
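
A quick way to confirm what the viewer actually receives through CloudFront is to inspect the response headers. A minimal sketch in Python, assuming the requests library is available and using a placeholder image URL:

import requests

# Placeholder URL; substitute one of the dynamically generated images.
resp = requests.get("https://www.example.com/images/example.jpg")
print(resp.status_code)
for name in ("Last-Modified", "Expires", "Cache-Control", "Age", "X-Cache"):
    print(name + ":", resp.headers.get(name))
# CloudFront adds "X-Cache: Hit from cloudfront" or "Miss from cloudfront",
# which tells you whether the edge that answered served the object from its cache.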

Since the site is HTTPS only, I set the Viewer Protocol Policy to HTTPS Only. Forward Headers is set to None and Object Caching to Use Origin Cache Headers.
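
For reference, these behavior settings can be read back through the CloudFront API. A rough boto3 sketch, with a placeholder distribution ID and assuming the legacy ForwardedValues-style configuration described here:

import boto3

cloudfront = boto3.client("cloudfront")
# "EDFDVBD6EXAMPLE" is a placeholder distribution ID.
config = cloudfront.get_distribution_config(Id="EDFDVBD6EXAMPLE")["DistributionConfig"]
behavior = config["DefaultCacheBehavior"]

print("ViewerProtocolPolicy:", behavior.get("ViewerProtocolPolicy"))   # e.g. "https-only"
print("Forwarded headers:", behavior.get("ForwardedValues", {}).get("Headers"))
# "Use Origin Cache Headers" is reflected in the behavior's TTL settings:
print("MinTTL/DefaultTTL/MaxTTL:",
      behavior.get("MinTTL"), behavior.get("DefaultTTL"), behavior.get("MaxTTL"))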

I requested an image at 11:25:11. This returned the following status and headers:


  • Code: 200 (OK)
  • Cached: No

Expires: Thu, 29 Sep 2016 09:24:31 GMT

A reload a little while later (11:25:43) returned the image with:


  • Code: 304 (Not Modified)
  • Cached: Yes

Expires: Thu, 29 Sep 2016 09:24:31 GMT

Nearly three hours later (at 14:16:11) I went to the same page and the image loaded with:


  • Code: 200 (OK)
  • Cached: Yes

Expires: Thu, 29 Sep 2016 09:24:31 GMT

Since the image was still cached by the browser, it loaded quickly. But I cannot understand why CloudFront did not return the cached image. Therefore the app had to generate the image again.

I read that Cloudfront evicts files from its cache after a few days of being inactive. This is not the case as demonstrated above. How could this be?

Answer


I read that Cloudfront evicts files from its cache after a few days of being inactive.

Do you have an official source for that?

Here is the official answer:


If an object in an edge location isn't frequently requested, CloudFront might evict the object—remove the object before its expiration date—to make room for objects that have been requested more recently.

http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html

There is no guaranteed retention time for cached objects, and objects with low demand are more likely to be evicted... but that isn't the only factor you may not have considered. Eviction may not be the issue, or the only issue.

Objects cached by CloudFront are like Schrödinger's cat. It's a loose analogy, but I'm running with it: whether an object is "in the cloudfront cache" at any given instant is not a yes-or-no question.

CloudFront has somewhere around 53 edge locations (where your browser connects and the content is physically stored) in 37 cities. Some major cities have 2 or 3. Each request that hits CloudFront is routed (via DNS) to the most theoretically optimal location -- for simplicity, we'll call it the "closest" edge to where you are.

The internal workings of Cloudfront are not public information, but the general consensus based on observations and presumably authoritative sources is that these edge locations are all independent. They don't share caches.

If, for example, you are in Texas (US) and your request was routed through and cached in Dallas/Fort Worth, TX, and if the odds are equal that any request from you could hit either of the Dallas edge locations, then until you get two misses of the same object, the odds are about 50/50 that your next request will be a miss. If I request that same object from my location, which I know from experience tends to route through South Bend, IN, then the odds of my first request being a miss are 100%, even though it's cached in Dallas.

So an object is neither "in the cache" nor "not in the cache," because there is no "the" (single, global) cache.

CloudFront's determination of the edge closest to your browser can also change over time.

CloudFront's mechanism for determining the closest edge appears to be dynamic and adaptive. Changes in the topology of the Internet at large can shift which edge location will tend to receive requests sent from a given IP address, so it is possible that over the course of a few hours, the edge you are connecting to will change. Maintenance or outages or other issues impacting a particular edge could also cause requests from a given source IP address to be sent to a different edge than the typical one, and this could also give you the impression of objects being evicted, since the new edge's cache would be different from the old.

Looking at the response headers, it isn't possible to determine which edge location handled each request. However, this information is provided in the CloudFront access logs.

I have a fetch-and-resize image service that handles around 750,000 images per day. It's behind CloudFront, and my hit/miss ratio is about 50/50. That is certainly not all CloudFront's fault, since my pool of images exceeds 8 million, the viewers are all over the world, and my max-age directive is shorter than yours. It has been quite some time since I last analyzed the logs to determine which and how many "misses" seemed unexpected (though when I did, there definitely were some, but their number was not unreasonable), but that is done easily enough, since the logs tell you whether each response was a hit or a miss, as well as identifying the edge location... so you could analyze that to see if there's really a pattern here.
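
A rough sketch of that kind of analysis in Python, assuming the gzipped standard access logs have already been downloaded from the logging bucket into a local ./logs directory:

import collections
import glob
import gzip

counts = collections.Counter()
for path in glob.glob("./logs/*.gz"):
    with gzip.open(path, "rt") as log:
        fields = []
        for line in log:
            if line.startswith("#Fields:"):
                fields = line.split()[1:]        # column names from the log's own header
                continue
            if line.startswith("#") or not fields:
                continue
            row = dict(zip(fields, line.rstrip("\n").split("\t")))
            counts[(row.get("x-edge-location"), row.get("x-edge-result-type"))] += 1

# Hits, misses, and errors broken down by edge location.
for (edge, result), count in sorted(counts.items()):
    print(edge, result, count)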

My service stores all of its output content in S3, and when a new request comes in, it first sends a quick request to the S3 bucket to see if there is work that can be avoided. If a result is returned by S3, then that result is returned to CloudFront instead of doing all the fetching and resizing work, again. Mind you, I did not implement that capability because of the number of CloudFront misses... I designed that in from the beginning, before I ever even tested it behind CloudFront, because -- after all -- CloudFront is a cache, and the contents of a cache are pretty much volatile and ephemeral, by definition.
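
A minimal sketch of that kind of check, assuming boto3 and a bucket name and key scheme of my own invention:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "resized-images-example"     # hypothetical bucket name

def get_or_generate(key, generate):
    # Serve a previously stored result from S3 if it exists; otherwise do the work.
    try:
        stored = s3.get_object(Bucket=BUCKET, Key=key)
        return stored["Body"].read()              # work avoided
    except ClientError as err:
        if err.response["Error"]["Code"] not in ("NoSuchKey", "404"):
            raise                                  # a real error, not just "not there yet"
    data = generate()                              # the expensive fetch-and-resize step
    s3.put_object(Bucket=BUCKET, Key=key, Body=data, ContentType="image/jpeg")
    return data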

Update: I stated above that it does not appear possible to identify the edge location forwarding a particular request by examining the request headers from CloudFront... however, it appears that it is possible with some degree of accuracy by examining the source IP address of the incoming request.

For example, a test request sent to one of my origin servers through CloudFront arrives from 54.240.144.13 if I hit my site from home, or 205.251.252.153 when I hit the site from my office -- the locations are only a few miles apart, but on opposite sides of a state boundary and using two different ISPs. A reverse DNS lookup of these addresses shows these hostnames:

server-54-240-144-13.iad12.r.cloudfront.net.
server-205-251-252-153.ind6.r.cloudfront.net.

CloudFront edge locations are named after the nearest major airport, plus an arbitrarily chosen number. For iad12 ... "IAD" is the International Air Transport Association (IATA) code for Washington, DC Dulles airport, so this is likely to be one of the edge locations in Ashburn, VA (which has three, presumably with different numerical codes at the end, but I can't confirm that from just this data). For ind6, "IND" matches the airport at Indianapolis, Indiana, so this strongly suggests that this request comes through the South Bend, IN, edge location. The reliability of this test would depend on the consistency with which CloudFront maintains its reverse DNS entries. It is not documented how many independent caches might be at any given edge location; the assumption is that there's only one, but there might be more than one, having the effect of increasing the miss ratio for very small numbers of requests, but disappearing into the mix for large numbers of requests.
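
A small sketch of that lookup in Python (the address is one of the examples above; the regular expression just pulls the airport-code prefix out of the hostname, and relies on CloudFront keeping its PTR records in this format):

import re
import socket

def edge_from_source_ip(ip_address):
    # e.g. "server-54-240-144-13.iad12.r.cloudfront.net"
    hostname = socket.gethostbyaddr(ip_address)[0]
    match = re.search(r"\.([a-z]{3})\d+\.r\.cloudfront\.net", hostname)
    return match.group(1).upper() if match else None

print(edge_from_source_ip("54.240.144.13"))   # expected to print "IAD" if the PTR record still matches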
