Block search engines from crawling deep directories
Question
My site's URLs have at most this (longest) structure:
http://www.example.com/xyz-pqr/abcd-efgh/123.html
So there is a maximum of 3 directory levels, but because of the CMS and other problems, search engines are indexing URLs on my site deeper than 3 directory levels, such as:
http://www.example.com/xyz-pqr/abcd-efgh/xyz-pqr/abcd-efgh/123.html
http://www.example.com/xyz-pqr/abcd-efgh/xyz-pqr/abcd-efgh/abcd-efgh/123.html
I want to write rules in robots.txt so that search engines will never crawl more than 3 directory levels. How do I do this? Thanks in advance.
Recommended answer
I'm not certain, but I think the following should work:
User-agent: *
Disallow: /*/*/*/
So, given these two URLs:
http://www.example.com/xyz-pqr/abcd-efgh/123.html
http://www.example.com/xyz-pqr/abcd-efgh/foo-bar/123.html
The first would be allowed, because it has only two directory segments (/xyz-pqr/ and /abcd-efgh/).
The second would be blocked, because it has three directory segments.
And anything longer would be blocked as well.
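Note that the * wildcard is an extension honored by major crawlers (e.g. Googlebot), not part of the original robots.txt spec, so it is worth sanity-checking how the pattern matches before deploying it. Python's stdlib robotparser does not implement the wildcard extension, so here is a minimal sketch (the pattern-to-regex translation and the sample paths are illustrative assumptions, not an official matcher) of how /*/*/*/ behaves:

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into an anchored regex.

    '*' matches any run of characters; '$' anchors the end of the URL.
    Everything else is matched literally. This mimics the wildcard
    extension used by major crawlers; it is a sketch, not a full parser.
    """
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*")
        elif ch == "$":
            out.append("$")
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out))

rule = robots_pattern_to_regex("/*/*/*/")

paths = [
    "/xyz-pqr/abcd-efgh/123.html",          # 2 directory segments
    "/xyz-pqr/abcd-efgh/foo-bar/123.html",  # 3 directory segments
]
for path in paths:
    # A Disallow rule applies when the pattern matches from the start
    # of the path; the pattern needs four literal '/' characters, so
    # only paths with three or more directory segments can match.
    blocked = rule.match(path) is not None
    print(path, "->", "blocked" if blocked else "allowed")
```

The two-segment URL stays crawlable while the three-segment one is blocked, which is exactly the behavior described above. After deploying, the robots.txt tester in Google Search Console can confirm the pattern against live URLs.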