How should a robots.txt file be properly written for subdomains?


Problem Description

Can someone explain how I should write a robots.txt file if I want all crawlers to index the root and some specific subdomains?

User-agent: *
Allow: /
Allow: /subdomain1/
Allow: /subdomain2/

Is this right? And where should I put it: in the root (public_html) folder or in each subdomain's folder?

Recommended Answer

There is no way to specify rules for different subdomains within a single robots.txt file. A given robots.txt file controls crawling only for the subdomain it was requested from. If you want to block some subdomains and allow others, you need to serve a different robots.txt file from each subdomain.

For example, if you want to allow crawling of http://crawlme.example.com/ but block crawling of http://nocrawl.example.com/, then:

http://crawlme.example.com/robots.txt should contain:

# Allow everything:
User-agent: *
Disallow:

http://nocrawl.example.com/robots.txt should contain:

# Block everything:
User-agent: *
Disallow: /
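
If both subdomains are served by the same application rather than from separate document roots, one option is to generate robots.txt dynamically and choose the response body from the request's Host header. The following is only a minimal sketch using Python's standard library: the hostnames crawlme.example.com and nocrawl.example.com come from the example above, while the port and the "block unknown hosts" fallback are assumptions, not part of the original answer.

# Minimal sketch: serve a different robots.txt per subdomain based on
# the Host header, using only Python's standard library.
from http.server import BaseHTTPRequestHandler, HTTPServer

# robots.txt bodies keyed by hostname (hostnames taken from the example above).
ROBOTS = {
    "crawlme.example.com": "# Allow everything:\nUser-agent: *\nDisallow:\n",
    "nocrawl.example.com": "# Block everything:\nUser-agent: *\nDisallow: /\n",
}

class RobotsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/robots.txt":
            self.send_error(404)
            return
        # Strip any port suffix from the Host header before the lookup.
        host = self.headers.get("Host", "").split(":")[0]
        # Assumed fallback: block everything for hostnames not listed above.
        body = ROBOTS.get(host, "User-agent: *\nDisallow: /\n").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Port 8000 is arbitrary; in practice this would sit behind the web server
    # that answers for both subdomains.
    HTTPServer(("", 8000), RobotsHandler).serve_forever()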
