Google bot crawling on AngularJS site with HTML5 Mode routes

Question

We have an AngularJS site using HTML5 routes. I just did some test "Fetch as Google" runs. The results are a bit confusing:

However, we are already prepared for Google not being able to crawl our site, so we have already added <meta name="fragment" content="!">, so that the Google bot revisits our page with "?_escaped_fragment_=". We followed https://developers.google.com/webmasters/ajax-crawling/docs/getting-started (section "3. Handle pages without hash fragments"). In our Nginx config we have something like this:

if ($args ~ "_escaped_fragment_=") {
    # serve the static HTML snapshots
}

And indeed this works fine if we pass _escaped_fragment_= ourselves. However, the Google bot never tried to crawl our site with this param, so it never crawled the snapshot. Are we missing something? Should we also add user-agent detection for the Google bot to our Nginx conf? Something like this?

if ($http_user_agent ~* "googlebot|yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver|facebookexternalhit|twitterbot|developers\.google\.com") {
    # serve from snapshots
}

It would be great if we can understand this better, thank you so much in advance!

UPDATE:
I just read this, http://scotch.io/tutorials/javascript/angularjs-seo-with-prerender-io?_escaped_fragment_=tag#caveats. So, it seems that when using the manual tools (Fetch as Google), we should pass ourselves either #! or ?_escaped_fragment_= in the right place. Indeed, if I pass ?_escaped_fragment_= in our case, I do see the HTML snapshot that we have created.

Is that true? Is this how it works indeed?

UPDATE 2: At the bottom of this thread, a Google employee confirms that for Google Webmasters "Fetch as Google" you need to pass the _escaped_fragment_= param manually yourself: https://productforums.google.com/forum/#!msg/webmasters/fZjdyjq0n98/PZ-nlq_2RjcJ

Cheers,
Iraklis

Recommended answer

I will try to answer your questions based on our experiences in the last month of developing a SPA with HTML5 mode.

This is actually quite simple but easy to overlook. There are two different ways to get Googlebot to try the _escaped_fragment_ URL. The first is to run your site in non-HTML5 (hashbang) mode, which means that your URLs will be of the form:

http://my.domain.com/base/#!some/path/on/website

Googlebot recognizes the #! and makes a second call to your server with an altered URL:

http://my.domain.com/base/?_escaped_fragment_=some/path/on/website
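The rewrite Googlebot performs on a #! URL can be sketched in a few lines of Python. This is only an illustration of the mapping shown above, not Google's implementation: the function name escaped_fragment_url is made up for this example, and the percent-encoding is deliberately conservative (it may escape more characters than the AJAX crawling scheme strictly requires).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escaped_fragment_url(url: str) -> str:
    """Return the _escaped_fragment_ form of a #! URL (hypothetical helper)."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        # Not a hashbang URL, so there is nothing to rewrite.
        return url
    fragment = parts.fragment[1:]  # drop the leading "!"
    # Assumption: keep "/" and "=" readable and percent-encode everything
    # else that is not an unreserved character (for example "&" or "#").
    param = "_escaped_fragment_=" + quote(fragment, safe="/=")
    query = parts.query + "&" + param if parts.query else param
    # The fragment is moved into the query string, so the result has none.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(escaped_fragment_url("http://my.domain.com/base/#!some/path/on/website"))
```

Running it on the example URL prints the altered URL shown above.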

You can then handle that request however you wish. The second way to get Googlebot to try _escaped_fragment_ mode is to include the following meta tag on the index page you supply to the bot:

<meta name="fragment" content="!">

This makes Googlebot request the alternate version of a page whenever it sees the tag. You can use both of these techniques together, or you can do what we ended up doing: run in HTML5 mode with the meta tag. In that case the URLs Googlebot requests will look like this:

http://my.domain.com/base/some/path/on/website?_escaped_fragment_=

Note that the bot leaves the value of _escaped_fragment_ empty. Depending on what web server you are running, you can easily map such requests to your alternate bot page with a pattern that matches the "_escaped_fragment_" text. For more information on the escaped fragment, see Google's AJAX crawling documentation.
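As a rough sketch of that server-side mapping, the Python below decides whether a request should get the bot page and where a snapshot for it might live. The names wants_snapshot and snapshot_path, and the snapshots/ directory layout, are assumptions made for illustration; in practice this logic would usually sit in the web server config, as in the question.

```python
from urllib.parse import parse_qs, urlsplit


def wants_snapshot(url: str) -> bool:
    """True if the request carries the _escaped_fragment_ parameter."""
    query = urlsplit(url).query
    # keep_blank_values=True matters: in HTML5 mode the bot sends the
    # parameter with an empty value, which parse_qs drops by default.
    return "_escaped_fragment_" in parse_qs(query, keep_blank_values=True)


def snapshot_path(url: str, root: str = "snapshots") -> str:
    """Map a request path to a snapshot file (hypothetical layout)."""
    # NOTE: a sketch only; a real server must sanitise the path first.
    path = urlsplit(url).path.strip("/") or "index"
    return f"{root}/{path}.html"


url = "http://my.domain.com/base/some/path/on/website?_escaped_fragment_="
if wants_snapshot(url):
    print(snapshot_path(url))
```

The empty-value case is the one that is easy to get wrong, which is why the check looks for the presence of the parameter rather than for a non-empty value.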

Google's bots have actually been able to interpret JavaScript to a limited extent since early 2014. For more information, read the official blog entry on Google Webmasters. However, as the blog entry makes clear, this comes with a lot of caveats. For instance:

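  1. Googlebot is not guaranteed to execute all JavaScript code.
  2. Googlebot will try to find links to follow in the JavaScript and use them to help discover more pages.
  3. Googlebot will render the preview in Webmaster Tools by executing as much JavaScript as it can (hence the absence of {{}} in the rendered version).
  4. Googlebot will not necessarily use the rendered version to build the meta information about your website for its index.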

As of 18/12/2014, we are still unsure whether Googlebot can actually extract any information from an SPA in rendered mode for its index, beyond finding links to follow in the JavaScript. In our experience, Googlebot includes the raw {{}} in its index listing, so when you use {{}} to fill meta information (description, keywords, title, etc.), your site looks like this in Google Search results:

{{meta.siteTitle}}
http://my.domain.com/base/some/path/on/website
{{meta.description}}

rather than what you would expect, which might look like this:

Domain
http://my.domain.com/base/some/path/on/website
This is a random page on my domain. An excellent example page to be sure!
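One way to catch this before the bot does is to scan each generated snapshot for leftover template bindings. The Python below is a minimal sketch under that assumption; the function name unrendered_bindings is invented for this example, and the regex simply looks for anything between {{ and }}.

```python
import re
from typing import List

# Anything between "{{" and "}}" is treated as an unrendered binding.
UNRENDERED = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def unrendered_bindings(html: str) -> List[str]:
    """Return every {{...}} expression still present in the HTML."""
    return UNRENDERED.findall(html)


snapshot = (
    "<title>{{meta.siteTitle}}</title>"
    '<meta name="description" content="{{meta.description}}">'
)
for binding in unrendered_bindings(snapshot):
    print("unrendered:", binding)
```

A snapshot that returns an empty list here has at least had its meta information filled in before being served to the bot.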
