Can I block search crawlers for every site on an Apache web server?

Problem description

I have somewhat of a staging server on the public internet running copies of the production code for a few websites. I'd really not like it if the staging sites get indexed.

Is there a way I can modify my httpd.conf on the staging server to block search engine crawlers?

Changing the robots.txt wouldn't really work since I use scripts to copy the same code base to both servers. Also, I'd rather not change the virtual host conf files either, as there are a bunch of sites and I don't want to have to remember to copy over a certain setting whenever I make a new site.

Recommended answer

Create a robots.txt file with the following contents:

User-agent: *
Disallow: /

Put that file somewhere on your staging server; the document root is a good place for it (e.g. /var/www/html/robots.txt).

Add the following to your httpd.conf file:

# Exclude all robots: SetHandler None keeps any other handler
# from intercepting this path, and the Alias serves a single
# robots.txt file for every virtual host.
<Location "/robots.txt">
    SetHandler None
</Location>
Alias /robots.txt /path/to/robots.txt

The SetHandler directive is probably not required, but it might be needed if you're using a handler like mod_python, for example.
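
To make that concrete, here is a minimal sketch (not from the original answer) of the kind of setup where the exception matters. The handler module name myapp.handler is hypothetical; the point is that a catch-all mod_python handler like this would otherwise intercept requests for /robots.txt, and the <Location "/robots.txt"> block above restores Apache's default file serving for just that path:

# Hypothetical catch-all mod_python handler; without the
# SetHandler None exception above, requests for /robots.txt
# would go to this handler instead of the static file.
<Location "/">
    SetHandler mod_python
    PythonHandler myapp.handler
</Location>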

That robots.txt file will now be served for all virtual hosts on your server, overriding any robots.txt file you might have for individual hosts.
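
As a complementary option (not part of the original answer), and assuming mod_headers is enabled, you could also send an X-Robots-Tag header so that any page a crawler does fetch is marked as not to be indexed:

# Optional extra measure; requires mod_headers.
# Marks every response from this server as not to be
# indexed or followed by search engines.
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, nofollow"
</IfModule>

Either way, reload Apache (e.g. apachectl -k graceful) for the httpd.conf changes to take effect.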

(Note: My answer is essentially the same thing that ceejayoz's answer is suggesting you do, but I had to spend a few extra minutes figuring out all the specifics to get it to work. I decided to put this answer here for the sake of others who might stumble upon this question.)
