Use Amazon S3 and CloudFront for intelligently caching webpages


Problem Description

I have a website (running within Tomcat on Elastic Beanstalk) that generates artist discographies (a single page per artist). This can be resource-intensive, so as artist pages don't change over a month period I put a CloudFront distribution in front of it.

I thought this would mean no artist request would ever have to be served more than once by my server, but it is not quite as good as that. This post explains that every edge location (Europe, US, etc.) will get a miss the first time it looks up a resource, and that there is a limit to how many resources are kept in the CloudFront cache, so they can be dropped.

So to counter this I have changed my server code to store a copy of the webpage in a bucket within S3 AND to check this first when a request comes in, so if the artist page already exists in S3 the server retrieves it and returns its contents as the webpage. This greatly reduces the processing because the webpage for a particular artist is constructed only once.

However:

  1. The request still has to go to the server to check whether the artist page exists.
  2. If the artist page exists, the webpage (which can sometimes be large, up to 20 MB) is first downloaded to the server, and only then does the server return the page.

So I wanted to know if I could improve this - I know you can configure an S3 bucket as a redirect to another website. Is there a per-page way I could get the artist request to go to the S3 bucket and have it return the page if it exists, or call the server if it does not?

Alternatively, could I get the server to check whether the page exists and then redirect to the S3 page, rather than downloading the page to the server first?
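
For reference, the check-S3-first flow described above might look roughly like the following. This is only a sketch assuming the AWS SDK for Java v1; the bucket name and the buildPage() helper are placeholders, not the actual application code.

// Rough sketch of the current flow: check S3 for a cached copy of the artist page,
// download and return it if present (this is the up-to-20 MB transfer through the
// server), otherwise generate the page and cache it in S3 for next time.
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.util.IOUtils;

import java.io.IOException;

public class CurrentArtistPageFlow {
    private static final String BUCKET = "artist-pages-bucket";   // placeholder name
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    public String getArtistPage(String artistId) throws IOException {
        String key = "artists/" + artistId + ".html";

        if (s3.doesObjectExist(BUCKET, key)) {
            // Page was generated before: pull the whole object into the server and
            // return it - the double transfer the question wants to avoid.
            try (S3Object object = s3.getObject(BUCKET, key)) {
                return IOUtils.toString(object.getObjectContent());
            }
        }

        String html = buildPage(artistId);   // the expensive discography generation
        s3.putObject(BUCKET, key, html);     // cache it so it is only built once
        return html;
    }

    private String buildPage(String artistId) {
        return "<html><body>Discography for " + artistId + "</body></html>";
    }
}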

Solution

The OP says:

  they can sometimes be large, up to 20 MB

Since the volume of data you serve can be pretty large, I think it is feasible to do this in 2 requests instead of one, decoupling content generation from content serving. The reason is to minimize the time and resources the server spends fetching data from S3 and serving it.

AWS supports pre-signed URLs, which can be made valid for only a short amount of time; we can use them here to avoid issues around security etc.
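
As a minimal sketch of generating such a short-lived pre-signed URL (assuming the AWS SDK for Java v1; the bucket and key names below are placeholders):

// Create a pre-signed GET URL for an artist page in S3 that is only valid for
// a few minutes.
import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

import java.net.URL;
import java.util.Date;

public class PresignedUrlExample {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // URL expires five minutes from now.
        Date expiration = new Date(System.currentTimeMillis() + 5 * 60 * 1000);

        URL url = s3.generatePresignedUrl(
                "artist-pages-bucket",        // placeholder bucket name
                "artists/some-artist.html",   // placeholder object key
                expiration,
                HttpMethod.GET);

        System.out.println("Pre-signed URL: " + url);
    }
}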

Currently, your architecture looks something like the following: the client initiates a request, you check whether the requested data exists on S3, fetch and serve it if it does, and otherwise generate the content and save it to S3:

                           if exists on S3
client --------> server --------------------> fetch from S3 and serve
                    |
                    | else
                    |------> generate content -------> save to S3 and serve

In terms of network resources, you always consume 2X the bandwidth and time here. If the data exists, you first have to pull it from S3 to the server and then serve it to the client (so it is 2X). If the data doesn't exist, you send it both to the client and to S3 (so again it is 2X).


Instead, you can try the two approaches below. Both assume that you have some base template and that the remaining data can be fetched via AJAX calls, and both bring down that 2X factor in the overall architecture.

1. Serve the content from S3 only. This calls for changes to the way your product is designed, and hence may not be that easy to integrate.

   Basically, for every incoming request, return the S3 URL if the data already exists; otherwise create a task for it in SQS, generate the data, and push it to S3. Based on your usage patterns for different artists, you should have an estimate of how long it takes on average to pull the data together, so return a URL that will become valid after the estimated time for completion (T) of the task.

   The client waits for time T and then makes the request to the URL returned earlier. It makes up to, say, 3 attempts to fetch this data in case of failure. In fact, data that already exists on S3 can be thought of as the base case where T = 0.

   In this case, you make 2-4 network requests from the client, but only the first of those requests goes to your server. You transmit the data to S3 only when it doesn't already exist, and the client always pulls it from S3.

                              if exists on S3, return URL
   client --------> server --------------------------------> S3
                       |
                       | else: SQS task
                       |---------------> generate content -------> save to S3
                        return pre-computed URL


              wait for time `T`
   client -------------------------> S3
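
   As a rough sketch of what the server side of this approach could look like (again assuming the AWS SDK for Java v1; the bucket name, queue URL, estimated time and response holder class are illustrative, not taken from the answer):

   // Hypothetical sketch of approach 1: the server only hands out pre-signed URLs;
   // content generation happens asynchronously via an SQS worker.
   import com.amazonaws.HttpMethod;
   import com.amazonaws.services.s3.AmazonS3;
   import com.amazonaws.services.s3.AmazonS3ClientBuilder;
   import com.amazonaws.services.sqs.AmazonSQS;
   import com.amazonaws.services.sqs.AmazonSQSClientBuilder;

   import java.net.URL;
   import java.util.Date;

   public class ArtistPageUrlService {
       private static final String BUCKET = "artist-pages-bucket";              // placeholder
       private static final String QUEUE_URL =
               "https://sqs.us-east-1.amazonaws.com/123456789012/generate-pages"; // placeholder
       private static final long ESTIMATED_GENERATION_MS = 30_000;              // the estimate "T"

       private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
       private final AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();

       /** Returns the pre-signed URL plus how long the client should wait before using it. */
       public UrlWithDelay getPageUrl(String artistId) {
           String key = "artists/" + artistId + ".html";
           long waitMs = 0;

           if (!s3.doesObjectExist(BUCKET, key)) {
               // Not generated yet: queue an asynchronous generation task and tell the
               // client to come back after the estimated completion time T.
               sqs.sendMessage(QUEUE_URL, artistId);
               waitMs = ESTIMATED_GENERATION_MS;
           }

           // Pre-compute a URL that stays valid for a while beyond the expected completion.
           Date expiration = new Date(System.currentTimeMillis() + waitMs + 10 * 60 * 1000);
           URL url = s3.generatePresignedUrl(BUCKET, key, expiration, HttpMethod.GET);
           return new UrlWithDelay(url.toString(), waitMs);
       }

       public static class UrlWithDelay {
           public final String url;
           public final long waitMillis;

           public UrlWithDelay(String url, long waitMillis) {
               this.url = url;
               this.waitMillis = waitMillis;
           }
       }
   }

   The client would then wait roughly waitMillis, fetch the returned URL directly from S3, and retry (up to the 3 attempts mentioned above) if the object is not there yet.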
     


2. Check whether the data already exists, and make the second network call accordingly.

   This is similar to what you currently do when serving data from the server in case it doesn't already exist. Again, we make 2 requests here; however, this time we try to serve the data synchronously from the server when it doesn't exist.

   So, on the first hit, we check whether the content has ever been generated previously; if it has, we get back a URL, otherwise an error message. On success, the next hit goes to S3.

   If the data doesn't exist on S3, we make a fresh request (to a different POST URL), on which the server computes the data and serves it, while adding an asynchronous task to push it to S3.

                              if exists on S3, return URL
   client --------> server --------------------------------> S3

   client --------> server ---------> generate content -------> serve it
                                          |
                                          |---> add SQS task to push to S3
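
   A rough sketch of approach 2, once more assuming the AWS SDK for Java v1; the endpoint split, bucket/queue names and the renderDiscography() helper are illustrative only:

   // Hypothetical sketch of approach 2: the first call returns the S3 URL if the page
   // already exists; otherwise the client POSTs to a second endpoint where the page is
   // generated synchronously, served, and pushed to S3 in the background.
   import com.amazonaws.services.s3.AmazonS3;
   import com.amazonaws.services.s3.AmazonS3ClientBuilder;
   import com.amazonaws.services.sqs.AmazonSQS;
   import com.amazonaws.services.sqs.AmazonSQSClientBuilder;

   public class ArtistPageController {
       private static final String BUCKET = "artist-pages-bucket";            // placeholder
       private static final String QUEUE_URL =
               "https://sqs.us-east-1.amazonaws.com/123456789012/push-pages";  // placeholder

       private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
       private final AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();

       /** First hit: return where to fetch the page from, or null if it does not exist yet. */
       public String lookup(String artistId) {
           String key = "artists/" + artistId + ".html";
           if (s3.doesObjectExist(BUCKET, key)) {
               return s3.getUrl(BUCKET, key).toString();   // or a pre-signed URL as above
           }
           return null;   // the client treats this as "POST to the generate endpoint instead"
       }

       /** Second hit (a different POST URL): generate, serve, and push to S3 asynchronously. */
       public String generateAndServe(String artistId) {
           String html = renderDiscography(artistId);   // the existing, expensive page build
           sqs.sendMessage(QUEUE_URL, artistId);        // background task: push the page to S3
           return html;                                 // served directly just this once
       }

       private String renderDiscography(String artistId) {
           // Placeholder for the real discography generation logic.
           return "<html><body>Discography for " + artistId + "</body></html>";
       }
   }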
     

