Does html5mode(true) affect Google search crawlers?

Question

I'm reading this specification, an agreement between web servers and search engine crawlers that allows dynamically created content to be visible to crawlers. It states that for a crawler to index an HTML5 application, the application must implement routing using #! in its URLs. In Angular, html5mode(true) removes this hash part of the URL. I'm wondering whether this will prevent crawlers from indexing my website.

Recommended answer

Short answer: no, html5mode will not mess up your indexing, but read on.

Important: Google and Bing can crawl AJAX-based content without HTML snapshots

I know the documentation you link to says otherwise, but about a year or two ago they officially announced that they handle AJAX content without the need for HTML snapshots, as long as you use pushState. A lot of the documentation is old and unfortunately has not been updated.

The requirement for AJAX crawling to work out of the box is that you change your URLs using pushState. This is exactly what html5mode in Angular does (and what a lot of other frameworks do). When pushState is in use, the crawlers wait for AJAX calls to finish and for JavaScript to update the page before they index it. You can even update things like the page title or meta tags in your router and they will be indexed properly. In essence you don't need to do anything: in this case there is no difference between server-side and client-side rendered sites.
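As a minimal sketch of what turning on html5mode looks like in AngularJS: the function name enableHtml5Mode and the module name "app" are assumptions made for illustration, while $locationProvider.html5Mode(true) is the call the question refers to.

```javascript
// Hypothetical config function; the name and the "app" module are assumptions.
function enableHtml5Mode($locationProvider) {
  // Switch $location from hash-based URLs to History API (pushState) URLs.
  $locationProvider.html5Mode(true);
}
// Explicit injection annotation so the function survives minification.
enableHtml5Mode.$inject = ['$locationProvider'];

// In a real application this would be registered with AngularJS:
// angular.module('app').config(enableHtml5Mode);
```

Newer AngularJS releases also expect a base tag (for example a base href of "/") in the page head by default, and the server has to return the application's index page for deep links, since the browser now requests those paths directly.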

To be clear, a lot of SEO analysis tools (such as Moz) will spit out warnings on pages that use pushState. That's because those tools (and their reps, if you talk to them) are, at the time of writing, not up to date, so ignore the warnings.

Finally, make sure you are not using the fragment meta tag shown below when doing this. If the tag is present, the crawlers will assume that you want the non-pushState method, and things might get messed up.

There is very little reason not to use pushState with Angular, but if you don't, you need to follow the guidelines linked to in the question. In short, you create snapshots of the HTML on your server, and then either change your URL fragment to start with "#!" instead of "#" or add the fragment meta tag:

<meta name="fragment" content="!" />

When a crawler finds a page like this, it removes the fragment part of the URL and instead requests the URL with the parameter _escaped_fragment_. You can serve your snapshot page in response, which gives the crawler a normal static page to index.
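The URL rewrite described above can be sketched roughly as follows. The function name toEscapedFragmentUrl is hypothetical, and the escaping is simplified to a handful of characters (%, #, &, + and whitespace), so treat it as an illustration of the mapping and not as a full implementation of the crawling scheme.

```javascript
// Hypothetical helper: approximates how a crawler turns a "#!" URL
// into the "_escaped_fragment_" URL it actually requests.
function toEscapedFragmentUrl(url) {
  const hashIndex = url.indexOf('#!');
  // Without a "#!" part, the fragment is treated as empty.
  const base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? '' : url.slice(hashIndex + 2);
  // Simplified escaping: only a few special characters are percent-encoded.
  const escaped = fragment.replace(/[%#&+\s]/g, (ch) =>
    '%' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0'));
  // Append to an existing query string if there is one.
  const separator = base.includes('?') ? '&' : '?';
  return base + separator + '_escaped_fragment_=' + escaped;
}
```

With this sketch, a page without any "#!" part maps to a request with an empty _escaped_fragment_ parameter, which is the form a pushState URL carrying the fragment meta tag would take.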

Note that the fragment meta tag should only be used if you want to trigger this behaviour. If you are using pushState and want the page to be indexed that way, don't use this tag.

Also, when using snapshots in Angular you can have html5mode on. In html5mode the fragment is hidden, but it technically still exists and will still trigger the same behaviour, assuming the fragment meta tag is set.
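On the server side, both cases (a "#!" URL and an html5mode URL with the fragment meta tag) arrive as a request carrying the _escaped_fragment_ parameter. A minimal sketch of working out which route to snapshot, assuming a hypothetical helper named routeFromCrawlerRequest and the standard URL class available in Node.js and browsers:

```javascript
// Hypothetical helper: returns the application route a crawler is asking for,
// or null when the request is not a snapshot request at all.
function routeFromCrawlerRequest(requestUrl) {
  // The base is a placeholder so that relative request paths can be parsed.
  const url = new URL(requestUrl, 'http://localhost');
  if (!url.searchParams.has('_escaped_fragment_')) {
    return null; // Ordinary visitor: serve the normal application.
  }
  const fragment = url.searchParams.get('_escaped_fragment_');
  // Empty value: html5mode-style URL, so the route is the path itself.
  // Non-empty value: hashbang-style URL, so the route is the fragment.
  return fragment === '' ? url.pathname : fragment;
}
```

How the snapshot for the returned route is produced and stored is left open here; that part depends on the snapshot tooling in use.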

While both Google and Bing will crawl your AJAX pages without problems (if you are using pushState), Facebook will not. Facebook does not understand AJAX content and still requires special solutions, such as HTML snapshots served specifically to the Facebook bot (user agent facebookexternalhit/1.1).
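One way to single out the Facebook bot is to check the User-Agent header for the token mentioned above. This is only a sketch: the function name isFacebookCrawler is hypothetical, and matching on the token without the version number is an assumption made so the check does not depend on "1.1".

```javascript
// Hypothetical helper: true when the User-Agent header looks like the
// Facebook crawler, so the server can answer with an HTML snapshot.
function isFacebookCrawler(userAgent) {
  // Match the bot token case-insensitively; tolerate a missing header.
  return /facebookexternalhit/i.test(userAgent || '');
}
```

A server could call this on each incoming request and fall back to the regular application for every other user agent.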

Edit: I should probably mention that I have deployed sites with all of these setups: with html5mode, the fragment meta tag, and snapshots, and also without any snapshots, relying only on pushState crawling. It all works fine, except for pushState with Facebook, as noted above.
