Why do search engine crawlers not run JavaScript?


Question


I have been working on some advanced JavaScript applications that use a lot of AJAX requests to render the page. To make the applications crawlable (by Google), I have to follow https://developers.google.com/webmasters/ajax-crawling/?hl=fr . That guide tells us to do things like redesigning our links and creating HTML snapshots to make the site searchable.
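For context, the linked guide describes Google's (now-deprecated) AJAX crawling scheme: the crawler rewrites a `#!` URL into an `_escaped_fragment_` URL, and the server is expected to answer that request with a static HTML snapshot. A minimal sketch of the URL rewriting (the example URL is illustrative, not from the original post):

```javascript
// Sketch of the "#!" -> "_escaped_fragment_" rewrite a crawler performs
// under the AJAX crawling scheme referenced above.
function toEscapedFragmentUrl(url) {
  const i = url.indexOf("#!");
  if (i === -1) return url; // not an ajax-crawlable URL
  const base = url.slice(0, i);
  const fragment = encodeURIComponent(url.slice(i + 2));
  const sep = base.includes("?") ? "&" : "?";
  return base + sep + "_escaped_fragment_=" + fragment;
}

console.log(toEscapedFragmentUrl("https://example.com/app#!/products?id=3"));
// → https://example.com/app?_escaped_fragment_=%2Fproducts%3Fid%3D3
```

The server must recognize the `_escaped_fragment_` query parameter and serve the pre-rendered snapshot of the corresponding AJAX view.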


I wonder why crawlers don't run the JavaScript to get the rendered page and index that instead. Is there a reason behind this? Or is it a missing feature of search engines that may come in the future?

Answer

GoogleBot actually does handle sites written in JS. But the big problem with AJAX sites is that, even if GoogleBot can execute JS and handle AJAX requests, it's not really possible for the web crawler to know when the page has finished loading. For that reason, a web crawler could load a page and index it before it starts making AJAX requests. Say a script gets executed on page scroll: it's very likely that the Google bot will not trigger every possible event.
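To illustrate the scroll problem, here is a minimal simulation (plain Node code standing in for real browser APIs like `window.addEventListener("scroll", ...)`; all names are illustrative) of content that only exists after a "scroll" event fires:

```javascript
// Simulated event registry standing in for the browser's event system.
const listeners = {};
const page = { content: ["above-the-fold text"] };

function addEventListener(type, handler) {
  (listeners[type] = listeners[type] || []).push(handler);
}
function dispatch(type) {
  (listeners[type] || []).forEach((h) => h());
}

// The site registers a lazy loader that fetches more content on scroll,
// as it would via window.addEventListener("scroll", ...) in a browser.
addEventListener("scroll", () => {
  page.content.push("comments loaded via ajax");
});

// A crawler that only processes the initial page sees one item...
console.log(page.content.length); // 1
// ...a user who scrolls sees two.
dispatch("scroll");
console.log(page.content.length); // 2
```

Content behind events the bot never fires is, from the crawler's point of view, simply not there.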

Another problem is navigation.


Since navigation can be done without reloading the page, one URL can map to multiple "view results". For that reason, Google asks developers to keep a copy of those views as static pages, to support the pages that would otherwise be inaccessible. Those copies are what get indexed.


If every page of your site is accessible through a fully qualified URL, then you shouldn't have a problem getting your site indexed.
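One way to satisfy that requirement is to route each view on the URL path rather than only swapping content in place, so every view has its own fully qualified URL. A sketch (the route table and view names are illustrative, not from the original post):

```javascript
// Map each fully qualified path to the view it should render, so a
// crawler requesting the URL directly gets the same content a user
// reaches via in-page navigation.
const routes = {
  "/": () => "home view",
  "/products": () => "product list view",
  "/about": () => "about view",
};

function render(path) {
  const view = routes[path];
  return view ? view() : "404 view";
}

console.log(render("/products")); // → product list view
console.log(render("/missing")); // → 404 view
```

In a real single-page app the same route table would back both client-side navigation (e.g. via `history.pushState`) and server-side rendering of the initial request.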


That said, scripts are going to get run, but it's not certain that the crawler will wait until it has finished handling all the scripts before indexing the page.

Here is a link:

GoogleBot smarter: http://www.forbes.com/sites/velocity/2010/06/25/google-isnt-just-reading-your-links-its-now-running-your-code/ . It was written in 2010, and we can expect that web crawlers have gotten much smarter since then.

