robots.txt allow all except few sub-directories
Question
I want my site to be indexed by search engines except for a few sub-directories. Following are my robots.txt settings:
robots.txt in the root directory:
User-agent: *
Allow: /
Separate robots.txt in the sub-directory (to be excluded):
User-agent: *
Disallow: /
Is this the correct way, or will the root directory rule override the sub-directory rule?
Answer
No, this is wrong.

You can’t have a robots.txt in a sub-directory. Your robots.txt must be placed in the document root of your host.
If you want to disallow crawling of URLs whose paths begin with /foo, use this record in your robots.txt (http://example.com/robots.txt):
User-agent: *
Disallow: /foo
This allows crawling of everything (so there is no need for Allow) except URLs like:

- http://example.com/foo
- http://example.com/foo/
- http://example.com/foo.html
- http://example.com/foobar
- http://example.com/foo/bar
- …
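As a quick sanity check, Python’s standard urllib.robotparser applies the same prefix matching, so you can verify which URLs such a record blocks without deploying anything. This is just an illustrative sketch; example.com is the placeholder host from the question, and the record is parsed directly instead of being fetched over HTTP.

```python
import urllib.robotparser

# Build a parser from the single root robots.txt record shown above.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /foo",
])

# URLs whose path begins with /foo are disallowed for all user agents...
print(rp.can_fetch("*", "http://example.com/foo"))      # False
print(rp.can_fetch("*", "http://example.com/foo/bar"))  # False
print(rp.can_fetch("*", "http://example.com/foobar"))   # False

# ...while everything else remains crawlable.
print(rp.can_fetch("*", "http://example.com/"))         # True
print(rp.can_fetch("*", "http://example.com/bar"))      # True
```

Note that the match is a plain path-prefix test, which is why /foobar and /foo.html are also blocked; to exclude only the directory itself, use Disallow: /foo/ instead.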