Block search engines from crawling deep directories
Question
My site's URLs have at most this (longest) structure:
http://www.example.com/xyz-pqr/abcd-efgh/123.html
So there is a maximum of 3 directory levels, but because of the CMS and other problems, search engines are indexing URLs on my site deeper than 3 directory levels, such as:
http://www.example.com/xyz-pqr/abcd-efgh/xyz-pqr/abcd-efgh/123.html
http://www.example.com/xyz-pqr/abcd-efgh/xyz-pqr/abcd-efgh/abcd-efgh/123.html
I want to write rules in robots.txt so that search engines will never crawl more than 3 directory levels. How do I do this? Thanks in advance.
Recommended answer
I'm not certain, but I think the following should work:
User-agent: *
Disallow: /*/*/*/
So, given these two URLs:
http://www.example.com/xyz-pqr/abcd-efgh/123.html
http://www.example.com/xyz-pqr/abcd-efgh/foo-bar/123.html
The first would be allowed, because it has only two directory segments (/xyz-pqr/ and /abcd-efgh/).
The second would be blocked, because it has three directory segments.
And anything longer would be blocked as well.
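Note that the * wildcard is an extension honored by major crawlers (e.g. Googlebot), not part of the original robots.txt spec, so it is worth sanity-checking how the pattern matches before deploying it. Python's stdlib robotparser does not implement the wildcard extension, so here is a minimal sketch (the pattern-to-regex translation and the sample paths are illustrative assumptions, not an official matcher) of how /*/*/*/ behaves:

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into an anchored regex.

    '*' matches any run of characters; '$' anchors the end of the URL.
    Everything else is matched literally. This mimics the wildcard
    extension used by major crawlers; it is a sketch, not a full parser.
    """
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*")
        elif ch == "$":
            out.append("$")
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out))

rule = robots_pattern_to_regex("/*/*/*/")

paths = [
    "/xyz-pqr/abcd-efgh/123.html",          # 2 directory segments
    "/xyz-pqr/abcd-efgh/foo-bar/123.html",  # 3 directory segments
]
for path in paths:
    # A Disallow rule applies when the pattern matches from the start
    # of the path; the pattern needs four literal '/' characters, so
    # only paths with three or more directory segments can match.
    blocked = rule.match(path) is not None
    print(path, "->", "blocked" if blocked else "allowed")
```

The two-segment URL stays crawlable while the three-segment one is blocked, which is exactly the behavior described above. After deploying, the robots.txt tester in Google Search Console can confirm the pattern against live URLs.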