Can I block search crawlers for every site on an Apache web server?
Question
I have somewhat of a staging server on the public internet running copies of the production code for a few websites. I'd really not like it if the staging sites get indexed.
Is there a way I can modify my httpd.conf on the staging server to block search engine crawlers?
Changing the robots.txt wouldn't really work since I use scripts to copy the same code base to both servers. Also, I would rather not change the virtual host conf files either as there is a bunch of sites and I don't want to have to remember to copy over a certain setting if I make a new site.
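The server-wide alias in the answer below sidesteps this, but if you would rather keep the deployment script from ever overwriting a staging-specific robots.txt, one option (a sketch on my part, assuming the copy is done with rsync; all paths here are hypothetical placeholders) is an exclude rule:

```shell
# Hypothetical sketch: copy a code base while excluding robots.txt,
# so the staging server can keep its own version.
SRC=/tmp/demo_src
DST=/tmp/demo_dst
mkdir -p "$SRC"
printf 'User-agent: *\nDisallow: /\n' > "$SRC/robots.txt"
echo '<html></html>' > "$SRC/index.html"

# -a preserves permissions and timestamps; --exclude skips robots.txt
rsync -a --exclude='robots.txt' "$SRC/" "$DST/"

ls "$DST"   # index.html only; robots.txt was not copied
```

This keeps a single deployment script for both servers while letting each one own its robots.txt.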
Answer
Create a robots.txt file with the following contents:
User-agent: *
Disallow: /
Put that file somewhere on your staging server; the document root is a good place for it (e.g. /var/www/html/robots.txt).
Add the following to your httpd.conf file:
# Exclude all robots
<Location "/robots.txt">
SetHandler None
</Location>
Alias /robots.txt /path/to/robots.txt
The SetHandler directive is probably not required, but it might be needed if you're using a handler like mod_python, for example.
That robots.txt file will now be served for all virtual hosts on your server, overriding any robots.txt file you might have for individual hosts.
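As an extra safeguard (my own addition, not part of the original answer), mod_headers can attach an X-Robots-Tag header to every response from the staging server, which asks crawlers not to index pages they may have already discovered through links:

```apache
# Requires mod_headers to be enabled.
# Sends a noindex hint with every response, server-wide.
Header set X-Robots-Tag "noindex, nofollow"
```

Unlike robots.txt, which only stops crawling, this header tells search engines to drop already-crawled pages from their index.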
(Note: My answer is essentially the same thing that ceejayoz's answer is suggesting you do, but I had to spend a few extra minutes figuring out all the specifics to get it to work. I decided to put this answer here for the sake of others who might stumble upon this question.)