jQuery causing 404 errors in Webmaster Tools on /a directory

Question

The Googlebot seems to be crawling up inside my jQuery and creating links ending in /a that don't exist and then reporting them as 404 errors.

http://www.mySite.com/a

The site validates green at the W3C.

The "/a" is coming from inside jQuery itself. Edit: The following is a line of code within jQuery v1.5 and 1.5.2 (the only two versions I looked inside):

<a href='/a' style='color:red;float:left;opacity:.55;'>a</a>

For now, I'm redirecting it within htaccess before it gets out of hand...

Redirect 301   /a   http://www.mysite.com

Does anyone know why/how the Googlebot would go inside jQuery?


EDIT:

I've since blocked the jQuery file with the robots.txt file but I really wasn't expecting the Googlebot to go into external JavaScript files.


EDIT 2:

The following is a response from Google employee JohnMu on this issue in the thread I started at Google Groups. Looks like I'm going to do the 301 after all.

JohnMu

Google Employee

4:39 AM

Hi guys

Just a short note on this -- yes, we are picking up the "/a" link for many sites from jQuery JavaScript. However, that generally isn't a problem, if we see "/a" as being a 404, then that's fine for us. As with other 404-URLs, we'll list it as a crawl error in Webmaster Tools, but again, that's not going to be a problem for crawling, indexing, or ranking. If you want to make sure that it doesn't trigger a crawl error in Webmaster Tools, then I would recommend just 301 redirecting that URL to your homepage (disallowing the URL will also bring it up as a crawl error - it will be listed as a URL disallowed by robots.txt).

I would also recommend not explicitly disallowing crawling of the jQuery file. While we generally wouldn't index it on its own, we may need to access it to generate good Instant Previews for your site.

So to sum it up: If you're seeing "/a" in the crawl errors in Webmaster Tools, you can just leave it like that, it won't cause any problems. If you want to have it removed there, you can do a 301 redirect to your homepage.

Cheers

John

Solution

It looks like jQuery uses that as a test template to determine browser support for features. I am not sure why Googlebot would ever see it, though. I was not aware that web crawlers typically ran any JavaScript. That would mean they are actually functioning as a web browser (which one, I wonder?). Seems unlikely.

(Edit - see this: how do web crawlers handle javascript - it indicates that Google may try to pull some stuff from scripts. I'm surprised it would not be programmed to recognize something that's part of jQuery. Do you use a nonstandard name for the include?)

Alternatively, is there any chance that the header for your jQuery include is not correct? Maybe it's being served with an HTML MIME type, which most browsers would probably not care about since the type is also set by the script include, but maybe a bot would decide to parse it.
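If you want to test that theory, look at the Content-Type header your server sends for the jQuery file and see whether it is a JavaScript type. A rough sketch of that check follows; the helper name `isJavaScriptContentType` and the list of types are my own assumptions for illustration, not anything jQuery or Google provides.

```javascript
// MIME types commonly used for JavaScript; the list is illustrative, not exhaustive.
var JS_MIME_TYPES = [
  "text/javascript",
  "application/javascript",
  "application/x-javascript",
  "text/ecmascript",
  "application/ecmascript"
];

// Hypothetical helper: takes a raw Content-Type header value and reports
// whether its media type is one of the JavaScript types above.
function isJavaScriptContentType(headerValue) {
  if (typeof headerValue !== "string") return false;
  // Drop parameters such as "; charset=utf-8" and normalize case.
  var mediaType = headerValue.split(";")[0].trim().toLowerCase();
  return JS_MIME_TYPES.indexOf(mediaType) !== -1;
}

console.log(isJavaScriptContentType("application/javascript; charset=utf-8")); // true
console.log(isJavaScriptContentType("text/html"));                             // false
```

If the header for your jQuery file comes back as text/html, that would at least be consistent with a bot treating the file as markup.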

In any event, rather than setting a redirect, why don't you just use robots.txt? Add this line:

Disallow: /a
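One caveat with that line: robots.txt rules are prefix matches, so it would also block /about, /archive, and anything else starting with /a. Google documents a trailing `$` as an end-of-URL anchor, so `Disallow: /a$` is the narrower form, and in a real file the rule needs to sit under a `User-agent:` line. The sketch below is a simplified model of that matching, just to show the difference; `isDisallowed` is a made-up helper and it deliberately ignores `*` wildcards.

```javascript
// Simplified robots.txt path matching: plain prefix match, plus a trailing
// "$" meaning "the URL path must end here". "*" wildcards are not handled.
function isDisallowed(path, rule) {
  if (rule === "") return false; // an empty Disallow blocks nothing
  if (rule.endsWith("$")) {
    // Anchored rule: only an exact match on the part before "$" counts.
    return path === rule.slice(0, -1);
  }
  return path.startsWith(rule);
}

console.log(isDisallowed("/a", "/a"));      // true
console.log(isDisallowed("/about", "/a"));  // true, probably not what you want
console.log(isDisallowed("/about", "/a$")); // false
console.log(isDisallowed("/a", "/a$"));     // true
```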

You could also try fixing jQuery. Obfuscating the link a little bit would probably do the trick, e.g. change the offending line:

  div.innerHTML = "   <link/><table></table><"+"a hr"+"ef='/a'"
  +" style='color:red;float:left;opacity:.55;'>a</a><input type='checkbox'/>";

If Google is smart enough to actually parse string concatenations, which would shock me, you could go one step further and assign something like "href" to a variable and then concatenate with that. I can't believe their JS scanner would go that far; that would basically be like trying to run it.
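A minimal sketch of that variable-based version, assuming you are patching your own copy of jQuery. The function name `buildSupportMarkup` and the variables `attr` and `target` are hypothetical; the markup it produces is meant to be identical to the original test string.

```javascript
// Hypothetical helper, not part of jQuery: builds the same feature-test
// markup, but the attribute name and the link target only come together
// at runtime, never as one literal in the source.
function buildSupportMarkup() {
  var attr = "hr" + "ef";   // attribute name assembled at runtime
  var target = "/" + "a";   // link target assembled at runtime
  return "   <link/><table></table><" + "a " + attr + "='" + target + "'" +
    " style='color:red;float:left;opacity:.55;'>a</a><input type='checkbox'/>";
}

// In the patched file this would be used as:
//   div.innerHTML = buildSupportMarkup();
console.log(buildSupportMarkup());
```

The browser still sees exactly the same HTML, so the feature tests should behave as before; only a scanner reading the raw source text loses the link.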
