Exclude bots and spiders from a View counter in PHP

Question

I have built a pretty basic advertisement manager for a website in PHP.

I say basic because it is not as complex as Google or Facebook ads, or even most high-end ad servers. It doesn't handle payments or user targeting.

It serves its purpose for my low-traffic site, though: it simply shows a random banner ad and counts impressions and clicks.

Features


  • Ad slot/position on page
  • Banner image
  • Name
  • View/impression counter
  • Click counter
  • Start and end date, or never ending
  • Disable/enable ad

I want to gradually add more functionality to the system, though.

One thing I have noticed is that the impressions/views counter often seems inflated.

I believe this is caused by social networks' spiders and bots, as well as search engine spiders.

For example, if someone enters a URL from a page on my website into Facebook, Google+, Twitter, LinkedIn, Pinterest, or another network, those sites will often spider my site to gather the web page's title, images, and description.

I would really like to stop these visits from counting as advertisement impressions/views when no actual human is viewing the page.

I realize it will be very hard to detect all of these, but if there is a way to catch a majority of them, it would at least make my stats a little more accurate.

So I am reaching out for any help or ideas on how to achieve my goal. Please do not suggest using another advertisement system; that is not in the cards. Thank you.

Recommended answer

You need to serve the ads with JavaScript. That's the only way to avoid most of the crawlers. Only browsers load dependencies like images, JS, and CSS; 99% of robots skip them.
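One way to picture serving the ad with JavaScript is to have PHP print only an empty placeholder plus a small script that fetches the banner. The sketch below is a minimal illustration, not the asker's actual system: the function name `renderAdSlot`, the `/ad.php` endpoint, and the JSON fields `click_url`, `image_url`, and `name` are all assumptions.

```php
<?php
// Hypothetical helper: the slot name, the /ad.php endpoint and its JSON
// shape are assumptions for illustration, not part of the original system.
function renderAdSlot(string $slot): string
{
    // Keep only characters that are safe inside an HTML id attribute.
    $id  = 'ad-' . preg_replace('~[^a-z0-9_-]~i', '', $slot);
    $url = '/ad.php?slot=' . rawurlencode($slot);

    // json_encode() produces safely quoted JavaScript string literals.
    $flags = JSON_UNESCAPED_SLASHES | JSON_HEX_TAG | JSON_HEX_AMP;
    $jsId  = json_encode($id, $flags);
    $jsUrl = json_encode($url, $flags);

    // The page contains only an empty placeholder. The banner is requested
    // by the script, so a client that does not run JavaScript never calls
    // /ad.php and never triggers an impression counted there.
    return <<<HTML
<div id="{$id}"></div>
<script>
fetch({$jsUrl})
  .then(function (r) { return r.json(); })
  .then(function (ad) {
    var a = document.createElement('a');
    a.href = ad.click_url;
    var img = document.createElement('img');
    img.src = ad.image_url;
    img.alt = ad.name;
    a.appendChild(img);
    document.getElementById({$jsId}).appendChild(a);
  });
</script>
HTML;
}

echo renderAdSlot('sidebar');
```

With this layout the impression would be counted inside the (hypothetical) `/ad.php` script rather than in the page itself, so a crawler that only downloads the HTML leaves the counter untouched.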

You can also do this:

// Basic crawler detection and block script (no legitimate browser should match this)
if (!empty($_SERVER['HTTP_USER_AGENT']) && preg_match('~(bot|crawl)~i', $_SERVER['HTTP_USER_AGENT'])) {
    // This is a crawler and you should not show ads here
}
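The check above can be wrapped in a function and wired to the counter. The following is a rough sketch under stated assumptions: `isLikelyCrawler` and `recordImpression` are hypothetical names, the in-memory `$counters` array stands in for the real database update, and the extra tokens are illustrative rather than exhaustive. `facebookexternalhit` is included because Facebook's link-preview fetcher identifies itself that way and would not match `bot|crawl`.

```php
<?php
// Sketch only: the token list is an illustrative assumption, not exhaustive.
function isLikelyCrawler(?string $userAgent): bool
{
    // An empty or missing User-Agent is treated as non-human here. This is
    // a design choice of this sketch; the snippet above does not flag it.
    if ($userAgent === null || trim($userAgent) === '') {
        return true;
    }
    return preg_match('~bot|crawl|spider|slurp|facebookexternalhit~i', $userAgent) === 1;
}

// Hypothetical stand-in for the real "views = views + 1" database update.
function recordImpression(array &$counters, int $adId, ?string $userAgent): bool
{
    if (isLikelyCrawler($userAgent)) {
        return false; // crawler: leave the counter untouched
    }
    $counters[$adId] = ($counters[$adId] ?? 0) + 1;
    return true;
}

$counters = [];
recordImpression($counters, 7, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0');
recordImpression($counters, 7, 'facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)');
```

In a real page the last argument would be `$_SERVER['HTTP_USER_AGENT'] ?? null`. User-Agent strings can be spoofed, so this only filters crawlers that identify themselves.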

You'll have much better stats this way. Use JS for ads.

PS: You could also try setting a cookie in JS and checking for it later. Crawlers may accept cookies sent by PHP over HTTP, but they will miss those set in JS 99.9% of the time, because they would need to load a JS file and interpret it. Only browsers do that.
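The cookie idea could look roughly like this. It is a minimal sketch, assuming a hypothetical cookie name `js_ok` and hypothetical helper names `jsCookieSnippet` and `hasJsCookie`; none of them come from the original answer.

```php
<?php
// Hypothetical cookie name; any name works as long as JS and PHP agree.
const JS_COOKIE = 'js_ok';

// Returns the script tag that sets the marker cookie in the browser.
function jsCookieSnippet(): string
{
    return '<script>document.cookie = "' . JS_COOKIE . '=1; path=/; SameSite=Lax";</script>';
}

// True when the request carries the cookie that only JavaScript sets.
function hasJsCookie(array $cookies): bool
{
    return ($cookies[JS_COOKIE] ?? '') === '1';
}

// Usage in a page: emit the script, then count the view only for clients
// that have already run it on an earlier request.
echo jsCookieSnippet();
if (hasJsCookie($_COOKIE)) {
    // increment the impression counter here
}
```

One trade-off of this scheme is that a visitor's very first page view arrives before the cookie exists, so it goes uncounted. Whether that undercount is acceptable depends on how much of the traffic is single-page visits.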
