AngularJS SEO for static webpages (S3 CDN)


Question

I've been looking into ways to improve SEO for AngularJS apps that are hosted on a CDN like Amazon S3 (i.e. simple storage with no backend). Most of the solutions out there (PhantomJS, prerender.io, seo.js etc.) rely on a backend to recognise the ?_escaped_fragment_ URL that the crawler generates and then fetch the relevant page from elsewhere. Even grunt-html-snapshot ultimately needs you to do this, even though you generate the snapshot pages ahead of time.

This solution is basically relying on using Cloudflare as a reverse proxy, which seems a bit of a waste given that most of the security apparatus etc. that their service provides is totally redundant for a static site. Setting up a reverse proxy myself as suggested here also seems problematic, given that it would require either i) routing all AngularJS apps I need static HTML for through one proxy server, which would potentially hamper performance, or ii) setting up a separate proxy server for each app, at which point I may as well set up a backend, which isn't affordable at the scale I am working at.

Is there any way of doing this, or are statically hosted AngularJS apps with great SEO basically impossible until Google updates their crawlers?

Reposted on webmasters following John Conde's comments.

Recommended answer

Here is a full overview of how to make your app SEO-friendly on a storage service such as S3, with nice URLs (no #), all driven by grunt with one simple command to run after the build:

grunt seo

It's still a puzzle of workarounds, but it's working and it's the best you can do. Thanks to @ericluwj, whose blog post inspired me.

Goal & URL structure

The goal is to create 1 HTML file per state in your Angular app. The only major assumption is that you remove the '#' from your URLs by using html5history (which you should do!) and that all your paths are absolute or use Angular states. There are plenty of posts explaining how to do so.

URLs end with a trailing slash, like this: http://yourdomain.com/page1/

Personally I made sure that http://yourdomain.com/page1 (no trailing slash) also reaches its destination, but that's off topic here. I also made sure that every language has a different state and a different url.
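
As a rough illustration of this URL structure, the sketch below maps a URL path to the object key a static host would look up. It assumes each state is stored as an index.html inside its own folder, which is how S3 static website hosting resolves folder URLs when an index document is configured. The function name urlToObjectKey is hypothetical, and treating /page1 like /page1/ is only this helper's simplification, not a description of how the author handled the missing slash.

```javascript
// Hypothetical helper: maps a public URL path to the object key that would
// serve it, assuming one index.html per state folder.
function urlToObjectKey(pathname) {
  // Force a trailing slash so "/page1" and "/page1/" give the same key.
  var hasSlash = pathname.charAt(pathname.length - 1) === '/';
  var normalized = hasSlash ? pathname : pathname + '/';
  // Drop the leading slash: object keys do not start with "/".
  return normalized.slice(1) + 'index.html';
}
```

For example, urlToObjectKey('/projects/en/') gives the key projects/en/index.html, and the root path '/' gives index.html.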

SEO logic

Our goal is that when someone reaches your website through an http request:

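  • If it is a search engine crawler: keep it on the page containing the required HTML. The page also contains the Angular logic (e.g. to start your app), but the crawler cannot read it, so it is intentionally stuck with the HTML you served and will index that.
  • For normal humans and intelligent machines: make sure Angular gets activated, erase the generated HTML and start your app normally.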
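
The two cases above can be sketched as a small decision helper. It reuses the user-agent pattern from the inline scripts shown later in this answer; the names isSearchBot and handlingFor, and the two return labels, are hypothetical and only there for illustration. In the actual setup the check runs client-side in the browser, since there is no backend to run it.

```javascript
// Same user-agent pattern as the inline scripts used later in this answer.
var BOT_PATTERN = /bot|googlebot|crawler|spider|robot|crawling/i;

// True when the user agent string looks like a crawler.
function isSearchBot(userAgent) {
  return BOT_PATTERN.test(userAgent || '');
}

// Hypothetical helper mirroring the two bullets above.
function handlingFor(userAgent) {
  return isSearchBot(userAgent) ? 'serve-static-html' : 'boot-angular';
}
```

User-agent sniffing is only a heuristic: any client whose user agent contains one of these words is treated as a crawler.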

Grunt tasks

Here are the grunt tasks:

  //grunt plugins you will need:
  grunt.loadNpmTasks('grunt-prerender');
  grunt.loadNpmTasks('grunt-replace');
  grunt.loadNpmTasks('grunt-wait');
  grunt.loadNpmTasks('grunt-aws-s3');

  //The grunt tasks in the right order
  grunt.registerTask('seo', 'First launch server, then prerender and replace', function (target) {
    grunt.task.run([
      'concurrent:seo' //Step 1: in parallel, launch the server and run the so-called seotasks
    ]);
  });

  grunt.registerTask('seotasks', [
    'http', //This is an API call to get all pages on my website. Skipping this step in this tutorial.
    'wait', // wait 1.5 sec to make sure that server is launched
    'prerender', //Step 2: create a snapshot of your website
    'replace', //Step 3: clean the mess
    'sitemap', //Create a sitemap of your production environment
    'aws_s3:dev' //Step 4: upload
  ]);

Step 1: Launch local server with concurrent:seo

We first need to launch a local server (like grunt serve) so that we can take snapshots of our website.

//grunt config
concurrent: {
  seo: [
    'connect:dist:keepalive', //Launching a server and keeping it alive
    'seotasks' //now that we have a running server we can launch the SEO tasks
  ]
}

Step 2: Create a snapshot of your website with grunt prerender

The grunt-prerender plugin allows you to take a snapshot of any website using PhantomJS. In our case we want to take a snapshot of all pages of the localhost website we just launched.

//grunt config
prerender: {
  options: {
    sitePath: 'http://localhost:9001', //points to the url of the server you just launched. You can also make it point to your production website.
    //As you can see the source urls allow for multiple languages provided you have different states for different languages (see note below for that)
    urls: ['/', '/projects/', '/portal/','/en/', '/projects/en/', '/portal/en/','/fr/', '/projects/fr/', '/portal/fr/'],//this var can be dynamically updated, which is done in my case in the callback of the http task
    hashed: true,
    dest: 'dist/SEO/',//where your static html files will be stored
    timeout:5000,
    interval:5000, //taking a snapshot of how the page looks like after 5 seconds.
    phantomScript:'basic',
    limit:7 //# pages processed simultaneously 
  }
}
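
The comment on urls mentions that the list can be updated dynamically. A minimal sketch of how such a list could be generated from page paths and language codes follows; buildUrls is a hypothetical helper, not part of grunt-prerender, and it assumes the URL scheme shown above, where the language is the last path segment.

```javascript
// Hypothetical helper: builds the `urls` array from a list of page paths
// and language codes. An empty string stands for the default language.
function buildUrls(pages, languages) {
  var urls = [];
  languages.forEach(function (lang) {
    pages.forEach(function (page) {
      // Pages already end with "/", so the language is appended as a
      // trailing segment, e.g. "/projects/" + "en/" -> "/projects/en/".
      urls.push(lang ? page + lang + '/' : page);
    });
  });
  return urls;
}
```

Calling buildUrls(['/', '/projects/', '/portal/'], ['', 'en', 'fr']) reproduces the nine URLs listed in the config. One way to feed the result into the task, assuming it runs before prerender, would be grunt.config.set('prerender.options.urls', ...).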

Step 3: Clean the mess with grunt replace

If you open the pre-rendered files, they will work for crawlers, but not for humans. For humans using Chrome, your directives will load twice. Therefore you need to redirect intelligent browsers to your home page before Angular gets activated (i.e., right after head).

//Add the script tag to redirect if we're not a search bot
replace: {
  dist: {
    options: {
      patterns: [
        {
          match: '<head>',
          //redirect to a clean page if not a bot (to your index.html at the root basically).
          replacement: '<head><script>if(!/bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent)) { document.location = "/#" + window.location.pathname; }</script>'
          //note: your hashbang (#) will still work.
        }
      ],
      usePrefix: false
    },
    files: [
      {expand: true, flatten: false, src: ['dist/SEO/*/**/*.html'], dest: ''} 
    ]
  }
}
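
To make the effect of this pattern concrete, the sketch below applies the same substitution to an HTML string in plain JavaScript. It is only a stand-in for what the replace task does to each snapshot file: injectRedirect is a hypothetical name, and it relies on the snapshot containing a bare <head> tag with no attributes, as the string match in the config does.

```javascript
// The snippet that gets placed right after <head>.
var REDIRECT_SNIPPET =
  '<script>if(!/bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent)) ' +
  '{ document.location = "/#" + window.location.pathname; }</script>';

// Hypothetical stand-in for the replace pattern: a plain string match on
// "<head>" (first occurrence only), with the snippet appended to it.
function injectRedirect(html) {
  // A function replacement avoids "$" sequences being treated specially.
  return html.replace('<head>', function (match) {
    return match + REDIRECT_SNIPPET;
  });
}
```

A snapshot without a matching <head> tag is returned unchanged, which is worth checking for if your template writes the tag with attributes.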

Also make sure you have this code in your index.html on your ui-view element, which clears all the generated html directives BEFORE angular starts.

<div ui-view autoscroll="true" id="ui-view"></div>

<!-- this script is needed to clear ui-view BEFORE angular starts to remove the static html that has been generated for search engines who cannot read angular -->
<script> 
  if(!/bot|googlebot|crawler|spider|robot|crawling/i.test( navigator.userAgent)) { document.getElementById('ui-view').innerHTML = ""; }
</script>

Step 4: Upload to aws

You first upload your dist folder which contains your build. Then you overwrite it with the files you prerendered and updated.

aws_s3: {
  options: {
    accessKeyId: "<%= aws.accessKeyId %>", // Use the variables
    secretAccessKey: "<%= aws.secret %>", // You can also use env variables
    region: 'eu-west-1',
    uploadConcurrency: 5, // 5 simultaneous uploads
  },
  dev: {
    options: {
      bucket: 'xxxxxxxx'
    },
    files: [
      {expand: true, cwd: 'dist/', src: ['**'], exclude: 'SEO/**', dest: '', differential: true},
      {expand: true, cwd: 'dist/SEO/', src: ['**'], dest: '', differential: true},
    ]
  }
}
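
The intent of the two files entries can be sketched as a simple overlay: the build goes up first, then the snapshots replace any file that has the same key. The helper below only models that intent on lists of relative paths; planUpload is a hypothetical name and it does not call grunt-aws-s3 or touch S3.

```javascript
// Hypothetical helper: computes which local file ends up at each bucket key
// when the build is uploaded first and the snapshots second.
function planUpload(buildFiles, seoFiles) {
  var plan = {};
  buildFiles.forEach(function (file) {
    // Mirror the exclude on SEO/** from the first files entry.
    if (file.indexOf('SEO/') === 0) { return; }
    plan[file] = 'dist/' + file;
  });
  seoFiles.forEach(function (file) {
    // Later entries win, so a snapshot replaces the build file with the same key.
    plan[file] = 'dist/SEO/' + file;
  });
  return plan;
}
```

With a build containing index.html and scripts/app.js and snapshots for index.html and page1/index.html, the bucket would end up serving the snapshot index.html, the original scripts/app.js and the snapshot page1/index.html.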

That's it, you have your solution! Both humans and bots can read your web app.
