How do I detect bots programmatically

Problem description

We log visits and visitors on page hits, and bots are clogging up our database. We can't use a CAPTCHA or similar techniques, because this happens before we even ask for human input: we are simply logging page hits, and we would like to log only hits made by humans.

Is there a list of known bot IPs out there? Does checking for known bot user-agents work?

Recommended answer

There is no sure-fire way to catch all bots. A bot could act just like a real browser if someone wanted it to.

Most serious bots identify themselves clearly in the agent string, so with a list of known bots you can filter out most of them. You can also add to that list the default agent strings of some HTTP libraries, to catch bots run by people who don't even know how to change the agent string. If you simply log the agent strings of your visitors, you should be able to pick out the ones to store in the list.
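A minimal sketch of this kind of agent-string filtering in Python, assuming hits are logged server-side where the User-Agent header is visible. The marker list, is_probably_bot, and log_page_hit are illustrative assumptions, not an exhaustive or official list:

# Substrings commonly seen in bot agent strings; extend this list from
# your own logs. Includes defaults of popular HTTP libraries.
KNOWN_BOT_MARKERS = [
    "googlebot", "bingbot", "slurp", "baiduspider",      # known crawlers
    "python-requests", "python-urllib", "curl", "wget",  # library defaults
    "libwww-perl", "java/",
]

def is_probably_bot(user_agent):
    """Return True when the agent string matches a known bot marker.

    An empty or missing agent string is also treated as a bot, since
    normal browsers always send one.
    """
    ua = (user_agent or "").lower()
    return not ua or any(marker in ua for marker in KNOWN_BOT_MARKERS)

def log_page_hit(hits, user_agent, ip, path):
    """Record the hit only when the visitor does not look like a bot."""
    if not is_probably_bot(user_agent):
        hits.append((ip, path))  # stand-in for the real database insert

For example, only the first of these two hits would be stored:

hits = []
log_page_hit(hits, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)", "203.0.113.7", "/landing")
log_page_hit(hits, "python-requests/2.31.0", "198.51.100.9", "/landing")
print(hits)  # only the browser-like visit was kept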

You can also build a "bad bot trap" by putting a hidden link on your page that leads to a page disallowed in your robots.txt file. Serious bots won't follow the link, and humans can't click on it, so only bots that ignore the rules will request the page.
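A hedged sketch of such a trap, using Flask purely for illustration; the framework, the /bot-trap path, and the trapped_ips set are all assumptions, and any server stack works the same way. The trap path is disallowed in robots.txt and only reachable through an invisible link, so any client that requests it is ignoring the rules:

from flask import Flask, request

app = Flask(__name__)
trapped_ips = set()  # stand-in for a real blocklist table

ROBOTS_TXT = "User-agent: *\nDisallow: /bot-trap\n"

@app.route("/robots.txt")
def robots():
    # Well-behaved crawlers fetch this first and skip /bot-trap.
    return ROBOTS_TXT, 200, {"Content-Type": "text/plain"}

@app.route("/bot-trap")
def bot_trap():
    # Anyone here followed a link humans can't see, despite robots.txt.
    trapped_ips.add(request.remote_addr)
    return "", 204

@app.route("/landing")
def landing():
    if request.remote_addr in trapped_ips:
        return "", 204  # skip logging for flagged clients
    # ... record the page hit here ...
    return '<a href="/bot-trap" style="display:none"></a>Welcome!'

The hidden link is invisible to humans and disallowed to compliant crawlers, so a request to /bot-trap is a strong signal that the client is a rule-ignoring bot.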
