The way to detect web scraping

Question

I need to detect scraping of information on my website. I tried detection based on behavior patterns, and it seems promising, although it is relatively computation-heavy.

The basic idea is to collect the request timestamps of a given client and compare its behavior pattern with a common pattern or a precomputed pattern.

To be more precise, I collect the time intervals between requests into an array, indexed by a function of the interval:

i = (integer) ln(interval + 1) / ln(N + 1) * N + 1
Y[i]++
X[i]++ for current client

where N is the time (bucket count) limit; intervals greater than N are dropped. Initially X and Y are filled with ones.
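
For illustration, here is a minimal Python sketch of this bucketing step as I read the description above; the names record_request and last_seen, the concrete value of N, and the per-client storage are my own assumptions, not part of the original post:

import math
import time
from collections import defaultdict

N = 30  # assumed bucket/time limit; the post does not fix a concrete value

# Y is the common histogram, X[client_id] the per-client histogram.
# Both start filled with ones, as described above.
Y = [1] * (N + 2)
X = defaultdict(lambda: [1] * (N + 2))
last_seen = {}  # client_id -> timestamp of the previous request

def record_request(client_id, now=None):
    # Update the interval histograms for one incoming request.
    now = time.time() if now is None else now
    prev = last_seen.get(client_id)
    last_seen[client_id] = now
    if prev is None:
        return  # first request from this client: no interval yet
    interval = now - prev
    if interval > N:
        return  # intervals greater than N are dropped
    # i = (integer) ln(interval + 1) / ln(N + 1) * N + 1  (one reading of where the cast applies)
    i = int(math.log(interval + 1) / math.log(N + 1) * N) + 1
    Y[i] += 1
    X[client_id][i] += 1

record_request(client_id) would be called once for every incoming request from that client.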

Then, after I have collected enough samples in X and Y, it is time to make a decision. The criterion is the parameter C:

C = sqrt(sum((X[i]/norm(X) - Y[i]/norm(Y))^2) / k)

where X is the data of a particular client, Y is the common data, norm() is a calibration function, and k is a normalization coefficient that depends on the type of norm(). There are 3 types:

  1. norm(X) = sum(X)/count(X), k = 2
  2. norm(X) = sqrt(sum(X[i]^2)), k = 2
  3. norm(X) = max(X[i]), k is the square root of the number of non-empty elements of X

C is in the range (0..1); 0 means no behavioral deviation and 1 is the maximum deviation.

Calibration of type 1 is best for repeating requests, type 2 for repeating requests with a few distinct intervals, and type 3 for non-constant request intervals.
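
Continuing the sketch above, here is one possible reading of the decision step in Python; the function name deviation, the norm_type switch, and the interpretation of "non-empty elements" as buckets above their initial value of 1 are assumptions:

def deviation(client_id, norm_type=1):
    # Compare the client's histogram X[client_id] against the common histogram Y
    # using the criterion C described above.
    x, y = X[client_id], Y
    n_buckets = len(x)

    def norm(v):
        if norm_type == 1:
            return sum(v) / len(v)                   # type 1: mean
        if norm_type == 2:
            return math.sqrt(sum(e * e for e in v))  # type 2: Euclidean norm
        return max(v)                                # type 3: maximum

    if norm_type == 3:
        # "non-empty" read here as buckets above their initial value of 1 (assumption)
        nonempty = sum(1 for e in x if e > 1)
        k = math.sqrt(nonempty) if nonempty else 1.0
    else:
        k = 2.0

    nx, ny = norm(x), norm(y)
    return math.sqrt(sum((x[i] / nx - y[i] / ny) ** 2 for i in range(n_buckets)) / k)

A client could then be flagged, for example, once deviation(client_id) exceeds a chosen threshold, since 0 means no deviation from the common pattern.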

What do you think? I would appreciate it if you would try this on your services.

Answer

To be honest, your approach is completely worthless because it is trivial to bypass. An attacker doesn't even have to write a line of code to get around it: proxy servers are free, and you can boot up a new machine with a new IP address on Amazon EC2 for 2 cents an hour.

A better approach is Roboo, which uses cookie techniques to foil robots. The vast majority of robots can't run JavaScript or Flash, and this can be used to your advantage.
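
To make the idea concrete (this is a generic illustration of a JavaScript cookie challenge, not how Roboo, an nginx module, actually implements it; the route, cookie name, and token value are made up): the server withholds content until the client proves it can execute JavaScript by setting a cookie.

from flask import Flask, request, make_response

app = Flask(__name__)

# Hypothetical cookie name and expected value; a real challenge would use
# a signed, per-session token computed by client-side JavaScript.
COOKIE_NAME = "js_challenge"
EXPECTED = "42"

CHALLENGE_PAGE = """
<script>
  /* Only a JavaScript-capable client will set the cookie and reload. */
  document.cookie = "js_challenge=42; path=/";
  location.reload();
</script>
"""

@app.route("/data")
def data():
    if request.cookies.get(COOKIE_NAME) != EXPECTED:
        # No valid cookie yet: serve the JS challenge instead of the content.
        return make_response(CHALLENGE_PAGE, 200)
    return "protected content"

Any client that actually executes the script, such as a headless browser, defeats this immediately, which is the answer's point below about security through obscurity.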

However, all of this is "(in)security through obscurity", and the ONLY REASON it might work is that your data isn't worth a programmer spending 5 minutes on it (Roboo included).
