Google bot crawling on AngularJS site with HTML5 Mode routes

Question

We have an AngularJS site using HTML5 routes. I just did some test "Fetch as Google" runs. The results are a bit confusing:


  • On the fetching tab, I see our site as it looks on view source, with all the front end bindings {{ }}, and not all the HTML rendered
  • On the rendering tab, our site looks perfectly fine, no {{ }} variables, it seems like Google bot fetched and rendered the site fine, which is maybe in line with this, http://googlewebmastercentral.blogspot.ae/2014/05/rendering-pages-with-fetch-as-google.html.

However, we are already prepared for Google not being able to crawl our site, so we have already added the <meta name="fragment" content="!"> tag so that the Google bot revisits our page with "?_escaped_fragment_=". We followed https://developers.google.com/webmasters/ajax-crawling/docs/getting-started (section "3. Handle pages without hash fragments"). In our Nginx config we have something like this:

if ($args ~ "_escaped_fragment_=") {
    # serve the static HTML snapshots
}

And indeed it works fine if we pass the _escaped_fragment_= ourselves. However, the Google bot never tried to crawl our site with this param, so it never crawled the snapshot. Are we missing something? Should we also add agent detection for the Google bot in our Nginx conf? Something like this?

if ($http_user_agent ~* "googlebot|yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver|facebookexternalhit|twitterbot|developers\.google\.com") {
    # serve from snapshots
}
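
For reference, a fuller version of what we have in mind might look like the sketch below; this is illustrative only, and the /snapshots path is just a placeholder for wherever the static HTML actually lives:

# Sketch only: flag requests that come from a known crawler or that carry
# _escaped_fragment_, then rewrite them to the pre-rendered snapshot files.
set $prerender 0;
if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|facebookexternalhit|twitterbot") {
    set $prerender 1;
}
if ($args ~ "_escaped_fragment_=") {
    set $prerender 1;
}
if ($prerender = 1) {
    rewrite ^(.*)$ /snapshots$1.html last;
}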

It would be great if we can understand this better, thank you so much in advance!

UPDATE:
I just read this: http://scotch.io/tutorials/javascript/angularjs-seo-with-prerender-io?_escaped_fragment_=tag#caveats. So it seems that when using the manual tools (Fetch as Google), we should pass either #! or ?_escaped_fragment_= ourselves in the right place. Indeed, if I pass ?_escaped_fragment_= in our case, I do see the HTML snapshot that we have created.

Is that true? Is this indeed how it works?

UPDATE 2:
At the bottom of this thread, a Google employee confirms that for Fetch as Google in Google Webmaster Tools you need to manually pass the _escaped_fragment_= param yourself: https://productforums.google.com/forum/#!msg/webmasters/fZjdyjq0n98/PZ-nlq_2RjcJ

Cheers,
Iraklis

Answer

I will try to answer your questions based on our experiences in the last month of developing a SPA with HTML5 mode.

This is actually quite simple but easy to overlook. In fact, there are two different ways to get Googlebot to try the escaped_fragment. The first method is to run your site in non-html5 mode. This means that your URLs will be of the form:

http://my.domain.com/base/#!some/path/on/website

Googlebot recognizes the #! and makes a second call to your server with an altered URL:

http://my.domain.com/base/?_escaped_fragment_=some/path/on/website

Which you can then handle as you wish. The second way to get Googlebot to try _escaped_fragment_ mode is to include the following meta tag on the index page you supply to the bot:

<meta name="fragment" content="!">

This will make Googlebot check the other version of the webpage every time it sees the tag. Interestingly, you can use both these techniques together, or you can do what we ended up doing, which is running in HTML5 mode with the meta tag. This means that your URLs will be escaped as follows:

http://my.domain.com/base/some/path/on/website?_escaped_fragment_=

Interestingly, the bot will not put anything at the end of the fragment. But depending on what webserver you are running, you can easily map this with a pattern matching the "_escaped_fragment_" text to your alternate bot page. For more information on the escaped fragment, see https://developers.google.com/webmasters/ajax-crawling/docs/specification.
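
For illustration, one way such a mapping might look in Nginx is sketched below; the /snapshots directory, the .html file naming, and the /var/www/my-app root are assumptions for the example, not a prescribed setup:

location / {
    # If the crawler appended ?_escaped_fragment_=, serve the pre-rendered
    # snapshot for the requested path instead of the Angular app shell.
    if ($args ~ "_escaped_fragment_=") {
        rewrite ^(.*)$ /snapshots$1.html last;
    }
    # Normal HTML5-mode fallback for real users.
    try_files $uri $uri/ /index.html;
}

location /snapshots {
    # Static HTML snapshots generated ahead of time, one file per route.
    root /var/www/my-app;
}

With this layout, a request for http://my.domain.com/base/some/path/on/website?_escaped_fragment_= would be answered with the file /var/www/my-app/snapshots/base/some/path/on/website.html.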

Google's bots have actually been able to interpret JavaScript to a limited extent since early 2014. For more information, read the official Google Webmaster Central blog entry: http://googlewebmastercentral.blogspot.ch/2014/05/understanding-web-pages-better.html. However, as is made clear in the blog entry, this comes with a lot of caveats. For instance:


  1. Googlebot does not guarantee to execute all JavaScript code.

  2. Googlebot will attempt to find links in the JavaScript to follow and use them to help find more pages.

  3. Googlebot will render the Webmaster Tools preview by executing as much of the JavaScript as it can (hence the lack of {{ }} in the rendered version).

  4. Googlebot does not necessarily use the rendered version to build the meta information about your site for its index.

As of 18/12/2014, we are still unsure whether Googlebot can actually extract any information from an SPA in rendered mode for its index, beyond finding links to follow in the JavaScript. In our experience, Googlebot will include {{ }} in its index listing, so when you try to use {{ }} to fill in meta information (description, keywords, title, etc.) your site looks like this in Google Search results:

{{meta.siteTitle}}
http://my.domain.com/base/some/path/on/website
{{meta.description}}

rather than what you expect, which might look like this:

Domain
http://my.domain.com/base/some/path/on/website
This is a random page on my domain. An excellent example page to be sure!
