How to get started with web caching, CDNs, and proxy servers?


Question

I'm a newbie programmer building a startup that I (naturally) hope will create a large amount of traffic. I am hosting my Django project on dotcloud, which runs on Amazon EC2. I have some streaming media (HTTP though, not RTMP), so the dotcloud guys recommended I go with a CDN. I am also using Amazon S3 for storage and so decided to go with Amazon CloudFront as my CDN.

The time has come for me to turn my attention to caching, and I am lost and confused. I am completely new to the concept. The entire extent of my knowledge comes from a tutorial I just read (http://www.mnot.net/cache_docs/) and a confusing weekend spent consulting Google. Most troubling of all is that I am not even sure what I need to do for my site.

  1. What is the difference between a CDN and a proxy server?

  2. Is it possible I might want to use a caching service (e.g. memcached, redis), a CDN (CloudFront), AND a proxy server (squid)?

  3. Our site is DB driven and produces dynamically generated lists specific to user locations. Can such a site be cached? (The lists themselves are filterable via AJAX, so the URL might remain the same while producing largely different results. For instance, example.com/some_url/ might generate a list of 40 objects, with only 10 appearing on the page. By clicking on a filter, the user could end up with 10 different objects while still at /some_url/)

  4. What are the best practices for a high traffic, rich content site?

  5. How can I learn about this? Everywhere I look seems to take for granted some basics that I just don't have as a part of my own foundation yet.

I'm not certain I'm asking the right questions. Just feeling very lost. I've now built 95% of my entire site and thought I was just ironing out the details but caching seems like another major undertaking. Any guidance/advice/encouragement would be much appreciated!

Solution

Right then, let's start with caching...

Caching is about storing something on a temporary basis so that you don't have to perform a more expensive operation to retrieve it every time.

HTTP caching is about saving round-trips to servers. If you just use the default behaviour, a browser that already holds a copy will typically ask the server to "send me a copy of this resource if you have a more recent version" (a conditional request).

If you set an Expires header (or Cache-Control: max-age) to a future time, then the browser doesn't ask this question until that time passes, as it knows it can use the copy of the resource it's got.

Caching at this level improves the end user's experience and saves you bandwidth.
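To make the Expires idea concrete, here is a minimal sketch in Python of the headers a server might attach to a static file. The function name `far_future_headers` and its parameters are made up for illustration; in a real Django site you would more likely configure this on the web server or the CDN than in your own code.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def far_future_headers(max_age_seconds, now=None):
    """Return headers that let a browser reuse its copy without asking again.

    Hypothetical helper for illustration only; not part of Django.
    """
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        # Freshness lifetime in seconds, understood by HTTP/1.1 caches.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # The older absolute-date form, formatted as an HTTP date in GMT.
        "Expires": format_datetime(expires, usegmt=True),
    }
```

Calling `far_future_headers(86400)` yields headers that keep the file fresh for one day. Long lifetimes tend to work best for files whose URL changes whenever their content changes.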

From your brief description, HTTP caching could help with the smaller static files (have a read of chapter 3 of bookofspeed.com).

Caches such as memcached (and Redis) are about reducing the load on databases, for example by saving the result of an operation and then serving it from the cache rather than repeating the database operation.

In your situation you would cache at the data retrieval layer based on the request parameters (and perhaps ensure the HTTP responses to the client aren't cached).
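As a rough sketch of caching at the data retrieval layer, the Python below keys each cached result on the request parameters, so two different filter selections on the same URL get separate entries. Everything here is an assumption for illustration: `TTLCache` is a tiny in-memory stand-in for memcached or Redis, and `cache_key`, `get_listings`, and `query_db` are hypothetical names, not anything from Django.

```python
import time


class TTLCache:
    """In-memory stand-in for memcached/Redis; entries expire after ttl seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # stale entry: drop it so the caller recomputes
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def cache_key(location, filters):
    """Build a key from the request parameters, independent of dict ordering."""
    parts = [f"{name}={filters[name]}" for name in sorted(filters)]
    return f"listings:{location}:" + "&".join(parts)


def get_listings(cache, query_db, location, filters):
    """Return listings for these parameters, querying the database only on a miss."""
    key = cache_key(location, filters)
    result = cache.get(key)
    if result is None:
        result = query_db(location, filters)  # the expensive operation
        cache.set(key, result)
    return result
```

The time-to-live is the main design choice: a short one keeps the lists fresh, a longer one spares the database more work.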

CDNs vs Proxy Servers...

These are really different beasts. CDNs are about keeping content close to your visitors, which reduces latency. If you're serving large files, a CDN also puts them on a network optimised for that instead of on your servers, but there's a £££ price attached to doing that. Some CDNs, e.g. CloudFront, have proxy-like behaviour where they go back to your origin server if they don't have the file the visitor wants.
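The proxy-like "go back to the origin on a miss" behaviour can be modelled in a few lines of Python. This is a toy sketch, not how CloudFront is implemented: `EdgeCache` and `fetch_from_origin` are made-up names, and a real CDN also honours expiry headers and evicts files, which this model ignores.

```python
class EdgeCache:
    """Toy model of one CDN edge node that pulls from the origin on a miss."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin is any callable taking a path and returning the body.
        self.fetch_from_origin = fetch_from_origin
        self._files = {}
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self._files:
            self.hits += 1  # served from the edge, close to the visitor
            return self._files[path]
        self.misses += 1
        body = self.fetch_from_origin(path)  # one trip back to your server
        self._files[path] = body
        return body
```

Only the first visitor to request a file through a given edge pays for the trip to the origin; later visitors are served from the stored copy.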

Proxy servers are literally servers that sit between your server and the end visitor. They might be part of your server farm (a reverse proxy), the ISP's network, or the visitor's network.

A reverse proxy essentially offloads the work of communicating with the end visitor from your servers, e.g. a visitor on a slow connection would otherwise tie up one of your servers for longer while it delivers the page. Reverse proxies can also sit in front of multiple servers, either all doing the same thing or different things, and the proxy presents a single address to the outside world. Squid is one proxy you might use, but Varnish is very popular at the moment too.

Normal proxies just act as caches for those visitors who come through them, e.g. a company may have a caching proxy server at its internet gateway so that the first person visiting an external site retrieves a file from that site and subsequent visitors get it from the proxy. They get a faster experience and the company reduces its bandwidth consumption.

I'm guessing you don't have a high traffic site at the moment, so your challenge is to understand where to spend your effort, i.e. what needs optimising and when.

My first recommendation would be to get some real user monitoring (RUM) in, even if it's building your own using Boomerang.js or Pion. Also look at monitoring tools such as Cacti/Munin/CollectD so you can understand the load on your servers.

Understanding your users' experience is key to working out where you need to optimise.
