Nginx proxy Amazon S3 resources


Problem description



I'm performing some web performance optimization (WPO) tasks, and PageSpeed suggested that I leverage browser caching. I have done this successfully for the static files served by my Nginx server, but the image files stored in Amazon S3 are still not covered.

I have read about an approach that updates each file in S3 to include caching headers (Expires and Cache-Control). I don't think this is a good approach: I have thousands of files, so it is not feasible for me.

I think the most convenient approach is to configure my Nginx 1.6.0 server to proxy the S3 files. I have read about this, but I'm not at all skilled at server configuration, so I took a couple of examples from these sites: https://gist.github.com/benjaminbarbe/1961db5ffbaad57eff12

I added this location block inside the server block of my Nginx config file:

# inside server block
location /mybucket.s3.amazonaws.com/ {
        proxy_http_version     1.1;
        proxy_set_header       Host mybucket.s3.amazonaws.com;
        proxy_set_header       Authorization '';
        proxy_hide_header      x-amz-id-2;
        proxy_hide_header      x-amz-request-id;
        proxy_hide_header      Set-Cookie;
        proxy_ignore_headers   "Set-Cookie";
        proxy_buffering        off;
        proxy_intercept_errors on;
        proxy_pass             http://mybucket.s3.amazonaws.com;
}

This is not working for me: no caching header is included in the responses, so my first guess is that the requests are not matching the location. These are the response headers I get:

Accept-Ranges:bytes
Content-Length:90810
Content-Type:image/jpeg
Date:Fri, 23 Jun 2017 04:53:56 GMT
ETag:"4fd0be549fbcaf9b47c18a15146cdf16"
Last-Modified:Tue, 09 Jun 2015 09:47:13 GMT
Server:AmazonS3
x-amz-id-2:cKsq1qRra74DqVsTewh3P3sgzVUoPR8aAT2NFCuwA+JjCdDZfk7/7x/C0WPjBa51GEb4C8LyAIc=
x-amz-request-id:94EADB4EDD3DE1C1

Solution

Your approach of proxying S3 files via Nginx makes a lot of sense. It solves a number of problems and comes with extra benefits such as masking URLs, proxy caching, and faster transfers by offloading SSL/TLS. You have it almost right; let me show what is left to make it perfect.

For the sample queries I use the S3 bucket and the image URL mentioned in the public comment on the original question.

We start by inspecting the headers of the file on Amazon S3:

curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg

HTTP/1.1 200 OK
Date: Sun, 25 Jun 2017 17:49:10 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Content-Length: 378843
Server: AmazonS3

We can see that Cache-Control is missing, but the conditional GET validators (ETag and Last-Modified) are already in place. When we send the ETag or Last-Modified value back (that is how a browser's client-side cache works), we get HTTP 304 with no Content-Length and no body. In other words, the client (curl in our case) requests the resource while stating that no data transfer is required unless the file has been modified on the server:

curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg \
--header "If-None-Match: 37a907fc5dd7cfd0c428af78f09e95a9"

HTTP/1.1 304 Not Modified
Date: Sun, 25 Jun 2017 17:53:33 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Server: AmazonS3

curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg \
--header "If-Modified-Since: Wed, 21 Jun 2017 07:42:31 GMT"

HTTP/1.1 304 Not Modified
Date: Sun, 25 Jun 2017 18:17:34 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Server: AmazonS3
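
The 304 responses above follow the conditional GET rules of HTTP. Here is a minimal Python sketch of that decision, assuming a simplified server that compares the validators exactly. The function name `conditional_status` is hypothetical, and real servers also handle weak ETags, ETag lists and `*`, which this sketch ignores:

```python
from email.utils import parsedate_to_datetime


def conditional_status(etag, last_modified, request_headers):
    """Return 304 if the client's cached copy is still valid, else 200.

    Simplified: If-None-Match takes precedence over If-Modified-Since,
    and ETags are compared after stripping the surrounding quotes.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        same = if_none_match.strip().strip('"') == etag.strip('"')
        return 304 if same else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        modified = parsedate_to_datetime(last_modified)
        since = parsedate_to_datetime(if_modified_since)
        return 304 if modified <= since else 200

    # No validators sent: always deliver the full resource.
    return 200


# Validator values taken from the S3 response shown above.
ETAG = '"37a907fc5dd7cfd0c428af78f09e95a9"'
LAST_MODIFIED = "Wed, 21 Jun 2017 07:42:31 GMT"

print(conditional_status(ETAG, LAST_MODIFIED,
                         {"If-None-Match": "37a907fc5dd7cfd0c428af78f09e95a9"}))  # 304
print(conditional_status(ETAG, LAST_MODIFIED,
                         {"If-Modified-Since": LAST_MODIFIED}))                   # 304
print(conditional_status(ETAG, LAST_MODIFIED, {}))                                # 200
```

A browser does the same thing automatically: it stores the validators with the cached image and sends them back on the next request.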

"PageSpeed suggested to leverage browser caching" means that Cache-Control is missing. Using Nginx as a proxy for the S3 files not only solves the problem of the missing header but also saves traffic thanks to the Nginx proxy cache.
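
For reference, the `max-age=31536000` value used in the Nginx configuration further down is simply one 365-day year expressed in seconds. A small Python sketch of that arithmetic (the helper name `cache_control_for_days` is made up for illustration):

```python
def cache_control_for_days(days):
    """Build a Cache-Control value that lets browsers cache for `days` days."""
    seconds = days * 24 * 60 * 60
    return f"max-age={seconds}"


print(cache_control_for_days(365))  # max-age=31536000
```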

I use macOS, but the Nginx configuration works exactly the same way on Linux without modifications. Step by step:

1. Install Nginx

brew update && brew install nginx

2. Set up Nginx to proxy the S3 bucket, see the configuration below

3. Request the file via Nginx. Note the Server header: we now see Nginx rather than Amazon S3:

curl -I http://localhost:8080/s3/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg

HTTP/1.1 200 OK
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:30:26 GMT
Content-Type: binary/octet-stream
Content-Length: 378843
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Accept-Ranges: bytes
Cache-Control: max-age=31536000
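
The request path `/s3/img/...` reaches S3 as `/img/...` because `proxy_pass` is given with a URI (the trailing `/`), so Nginx replaces the part of the request URI that matched the `location` prefix. Here is a rough Python sketch of that rewrite, assuming a plain prefix location and ignoring URI normalization. The function `upstream_uri` and the short path `photo.jpg` are hypothetical and only serve the illustration:

```python
def upstream_uri(request_uri, location_prefix, proxy_pass_uri):
    """Mimic how a prefix location plus `proxy_pass http://host<uri>` maps a URI.

    Returns None when the request does not match the location at all.
    """
    if not request_uri.startswith(location_prefix):
        return None
    # The matched prefix is replaced by the URI part of proxy_pass.
    return proxy_pass_uri + request_uri[len(location_prefix):]


# Location from this answer: `location /s3/` with `proxy_pass http://.../;`
print(upstream_uri("/s3/img/blog/photo.jpg", "/s3/", "/"))
# /img/blog/photo.jpg

# Location from the question: an ordinary image path never matches it.
print(upstream_uri("/img/blog/photo.jpg", "/mybucket.s3.amazonaws.com/", "/"))
# None
```

The second call shows why the location in the question had no effect: unless the page URLs actually start with `/mybucket.s3.amazonaws.com/`, no request ever enters that block.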

4. Request the file via the Nginx proxy with a conditional GET:

curl -I http://localhost:8080/s3/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg \
--header "If-None-Match: 37a907fc5dd7cfd0c428af78f09e95a9"

HTTP/1.1 304 Not Modified
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:32:16 GMT
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Cache-Control: max-age=31536000

5. Request the file via the Nginx proxy cache. Note the X-Cache-Status header: its value is MISS until the cache has been warmed up by the first request:

curl -I http://localhost:8080/s3_cached/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg

HTTP/1.1 200 OK
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:40:45 GMT
Content-Type: binary/octet-stream
Content-Length: 378843
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Cache-Control: max-age=31536000
X-Cache-Status: HIT
Accept-Ranges: bytes
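
To make the MISS and HIT values more concrete, here is a toy Python model of a cache with a fixed validity period. It is only a sketch under simplifying assumptions: the class `TinyCache` and the path `photo.jpg` are hypothetical, and a real Nginx cache can report further values in `$upstream_cache_status` (for example REVALIDATED when `proxy_cache_revalidate` is on):

```python
class TinyCache:
    """Toy model of a proxy cache keyed by URI with a fixed validity period."""

    def __init__(self, valid_seconds):
        self.valid_seconds = valid_seconds
        self._stored_at = {}

    def status(self, uri, now):
        """Return the cache status for a request arriving at time `now` (seconds)."""
        stored = self._stored_at.get(uri)
        if stored is None:
            self._stored_at[uri] = now   # fetched from the origin and stored
            return "MISS"
        if now - stored < self.valid_seconds:
            return "HIT"                 # served from the cache, origin untouched
        self._stored_at[uri] = now       # entry too old, refreshed from the origin
        return "EXPIRED"


cache = TinyCache(valid_seconds=60 * 60)  # mirrors the 60m of proxy_cache_valid
print(cache.status("/s3_cached/img/photo.jpg", now=0))     # MISS
print(cache.status("/s3_cached/img/photo.jpg", now=10))    # HIT
print(cache.status("/s3_cached/img/photo.jpg", now=4000))  # EXPIRED
```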

Based on the official Nginx documentation, I provide an Nginx configuration for S3 with optimised caching settings that uses the following options:

  • proxy_cache_revalidate instructs NGINX to use conditional GET requests when refreshing content from the origin servers
  • the updating parameter to the proxy_cache_use_stale directive instructs NGINX to deliver stale content when clients request an item while an update to it is being downloaded from the origin server, instead of forwarding repeated requests to the server
  • with proxy_cache_lock enabled, if multiple clients request a file that is not current in the cache (a MISS), only the first of those requests is allowed through to the origin server
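
The idea behind `proxy_cache_lock` can be sketched in Python with a per-key lock: several concurrent requests for the same uncached file result in a single origin fetch. This is an illustration of the concept, not how Nginx implements it. The class `LockedCache`, the function `slow_origin` and the path `/img/photo.jpg` are all hypothetical:

```python
import threading
import time


class LockedCache:
    """Cache in which only one caller per key is allowed to hit the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects the two dictionaries

    def get(self, key):
        with self._guard:
            if key in self._store:
                return self._store[key]
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:
            # Re-check: another thread may have filled the cache while we waited.
            with self._guard:
                if key in self._store:
                    return self._store[key]
            value = self._fetch(key)
            with self._guard:
                self._store[key] = value
            return value


origin_calls = []


def slow_origin(key):
    """Stand-in for the origin server: records the call and responds slowly."""
    origin_calls.append(key)
    time.sleep(0.05)
    return f"body of {key}"


cache = LockedCache(slow_origin)
threads = [threading.Thread(target=cache.get, args=("/img/photo.jpg",))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(origin_calls))  # 1
```

Without the per-key lock, all eight threads would see an empty cache and each would contact the origin.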

Nginx configuration:

worker_processes  1;
daemon off;

error_log  /dev/stdout info;
pid        /usr/local/var/nginx/nginx.pid;


events {
  worker_connections  1024;
}


http {
  default_type       text/html;
  access_log         /dev/stdout;
  sendfile           on;
  keepalive_timeout  65;

  proxy_cache_path   /tmp/ levels=1:2 keys_zone=s3_cache:10m max_size=500m
                     inactive=60m use_temp_path=off;

  server {
    listen 8080;

    location /s3/ {
      proxy_http_version     1.1;
      proxy_set_header       Connection "";
      proxy_set_header       Authorization '';
      proxy_set_header       Host yanpy.dev.s3.amazonaws.com;
      proxy_hide_header      x-amz-id-2;
      proxy_hide_header      x-amz-request-id;
      proxy_hide_header      x-amz-meta-server-side-encryption;
      proxy_hide_header      x-amz-server-side-encryption;
      proxy_hide_header      Set-Cookie;
      proxy_ignore_headers   Set-Cookie;
      proxy_intercept_errors on;
      add_header             Cache-Control max-age=31536000;
      proxy_pass             http://yanpy.dev.s3.amazonaws.com/;
    }

    location /s3_cached/ {
      proxy_cache            s3_cache;
      proxy_http_version     1.1;
      proxy_set_header       Connection "";
      proxy_set_header       Authorization '';
      proxy_set_header       Host yanpy.dev.s3.amazonaws.com;
      proxy_hide_header      x-amz-id-2;
      proxy_hide_header      x-amz-request-id;
      proxy_hide_header      x-amz-meta-server-side-encryption;
      proxy_hide_header      x-amz-server-side-encryption;
      proxy_hide_header      Set-Cookie;
      proxy_ignore_headers   Set-Cookie;
      proxy_cache_revalidate on;
      proxy_intercept_errors on;
      proxy_cache_use_stale  error timeout updating http_500 http_502 http_503 http_504;
      proxy_cache_lock       on;
      proxy_cache_valid      200 304 60m;
      add_header             Cache-Control max-age=31536000;
      add_header             X-Cache-Status $upstream_cache_status;
      proxy_pass             http://yanpy.dev.s3.amazonaws.com/;
    }

  }
}
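
To inspect what ends up under `/tmp/`: Nginx names each cache file after the MD5 hash of the cache key, and `levels=1:2` places it in a one-character directory followed by a two-character directory, both taken from the end of the hash. The Python sketch below reproduces that layout. The function `cache_file_path` is hypothetical, and the example key is an assumption, since the exact key depends on `proxy_cache_key`:

```python
import hashlib


def cache_file_path(cache_dir, cache_key, levels=(1, 2)):
    """Compute where a cache entry would live for a given `levels` setting."""
    digest = hashlib.md5(cache_key.encode()).hexdigest()
    parts = []
    end = len(digest)
    for width in levels:
        # Each level takes `width` characters, working backwards from the end.
        parts.append(digest[end - width:end])
        end -= width
    return "/".join([cache_dir.rstrip("/")] + parts + [digest])


# Assumed key, for illustration only.
example_key = "http://yanpy.dev.s3.amazonaws.com/img/photo.jpg"
print(cache_file_path("/tmp/", example_key))
```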
