Summary of MySQL detail records matching by IP address ranges - mySQL Jedi Knight required
Problem description
So, I have to draw upon all the powers of the greatest mySQL minds that SO has to offer. I have to summarize detail records based on the IP address in each record. Here's the scenario:
In short, we have consortiums that want to know: "Which schools within my consortium watched which videos how many times?" In SQL terms, it amounts to COUNTing the detail records, grouped by the IP range each record falls into.
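- We have several consortiums of universities, each with a handful of different member schools.
- Each school within a consortium uses different IP ranges to access the videos we serve to these schools.
- The IP ranges are specified with wildcards, so each school has specified something like "100.200.35.x, 100.201.x.x, 100.202.39.50, etc.", with an average of 10 or 15 ranges per school.
- The raw text log files to be summarized are already in a database (one row per log entry) and contain the actual IP address that accessed the video file.
- There are hundreds of millions of detail records, so I fully expect this to be a slow process that runs for quite a while.
- A PHP script could "explode" the wildcards into the individual IPs they represent, but I fear that would end up being the final answer, and it could take weeks to run.
(For simplicity, I will refer only to the video filename that was accessed and count its log entries, but in reality all the details such as start/stop/duration etc. are there and will eventually be part of this solution.)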
With Consortium records something like this (all table designs except the log details are open to suggestion):
| id|consortium |
| 10|Ivy League |
| 20|California |
And School/IP records something like this:
| id|school |consortium_id|
| 101|Harvard |10 |
| 102|Yale |10 |
| 103|UCLA |20 |
| 104|Berkeley |20 |

| id|school_id|ip_range |
| 1| 101 |100.200.x.x |
| 2| 101 |100.201.65.x |
| 3| 101 |100.202.39.50 |
| 4| 101 |100.202.39.51 |
| 5| 101 |100.200.x.x |
| 6| 101 |100.201.65.x |
| 7| 101 |100.202.39.50 |
And detail records like this:
|session |ip_address |filename |
|560554790925|100.202.390.500|history101.mp4 |
|406417611526|43.22.90.5 |newsreel.mp4 |
|650423700223|100.202.39.50 |history101.mp4 |
|650423700223|100.202.50.12 |science101.mp4 |
|513057324209|100.202.39.56 |history101.mp4 |
I like to think I'm pretty handy with mySQL, but this one is stretching it, and I'm hoping that there's a spectacular function or set of steps that someone might offer.
Recommended answer
With your existing data structure, you could do string matching as follows (but it's not very efficient):
SELECT schools.school, detail.filename, COUNT(*)
FROM schools
JOIN ipranges ON schools.id = ipranges.school_id
JOIN detail ON detail.ip_address LIKE REPLACE(ipranges.ip_range, 'x', '%')
WHERE schools.consortium_id = ?
GROUP BY schools.school, detail.filename
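To make the string matching concrete, here is a minimal Python sketch of what `LIKE REPLACE(ip_range, 'x', '%')` does to each pair of rows. The helper names `like_pattern_to_regex` and `matches` are made up for illustration, and the regex is only an approximation of MySQL's `LIKE` (each `%` is treated as "any run of characters").

```python
import re

def like_pattern_to_regex(ip_range: str) -> re.Pattern:
    """Approximate REPLACE(ip_range, 'x', '%') followed by LIKE.

    Hypothetical helper: every 'x' becomes '.*', which stands in for
    LIKE's '%' (any sequence of characters, dots included).
    """
    literal_parts = [re.escape(part) for part in ip_range.split("x")]
    return re.compile(".*".join(literal_parts))

def matches(ip_address: str, ip_range: str) -> bool:
    """True when the whole address matches the wildcard range."""
    return like_pattern_to_regex(ip_range).fullmatch(ip_address) is not None

if __name__ == "__main__":
    for ip in ("100.200.7.9", "100.202.39.56", "43.22.90.5"):
        hits = [r for r in ("100.200.x.x", "100.201.65.x", "100.202.39.50")
                if matches(ip, r)]
        print(ip, hits)
```

Because the pattern is evaluated per pair of rows as a string comparison, an ordinary index on `ip_address` is unlikely to help much with this join, which is one reason this approach gets slow at hundreds of millions of rows.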
A better way would be to store your IP ranges as network address and prefix length:
ALTER TABLE ipranges
ADD COLUMN network INT UNSIGNED,
ADD COLUMN prefix TINYINT;
UPDATE ipranges SET
-- each 'x' is one wildcard octet, i.e. 8 bits that are not part of the prefix
network = INET_ATON(REPLACE(ip_range, 'x', 0)),
prefix = 32 - 8*(CHAR_LENGTH(ip_range) - CHAR_LENGTH(REPLACE(ip_range,'x','')));
ALTER TABLE ipranges
DROP COLUMN ip_range;
ALTER TABLE detail
ADD COLUMN ip_address_new INT UNSIGNED;
UPDATE detail SET
ip_address_new = INET_ATON(ip_address);
ALTER TABLE detail
DROP COLUMN ip_address,
CHANGE ip_address_new ip_address INT UNSIGNED;
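The conversion above can be checked outside the database with a small Python sketch. It assumes the wildcards only ever replace whole trailing octets (as in the sample data); `inet_aton` and `wildcard_to_network` are illustrative names, and `inet_aton` here is a simplified stand-in for MySQL's function that accepts only four-part dotted addresses.

```python
def inet_aton(dotted: str) -> int:
    """Simplified stand-in for INET_ATON: four-part dotted quad to integer."""
    parts = dotted.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four octets: {dotted!r}")
    octets = [int(p) for p in parts]
    if any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"octet out of range: {dotted!r}")
    a, b, c, d = octets
    return (a << 24) | (b << 16) | (c << 8) | d

def wildcard_to_network(ip_range: str) -> tuple[int, int]:
    """Turn a range such as '100.200.x.x' into (network, prefix).

    Assumes every 'x' is a whole trailing wildcard octet, so each one
    removes 8 bits from the 32-bit prefix.
    """
    wildcard_octets = ip_range.count("x")
    network = inet_aton(ip_range.replace("x", "0"))
    prefix = 32 - 8 * wildcard_octets
    return network, prefix

if __name__ == "__main__":
    for r in ("100.200.x.x", "100.201.65.x", "100.202.39.50"):
        print(r, wildcard_to_network(r))
```

Note that a log value such as 100.202.390.500 from the sample detail rows is not a valid address, so it is worth checking how such rows convert before dropping the original column.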
Then it would merely be a case of performing some bit comparisons:
SELECT schools.school, detail.filename, COUNT(*)
FROM schools
JOIN ipranges ON schools.id = ipranges.school_id
JOIN detail ON detail.ip_address & ~((1 << 32 - ipranges.prefix) - 1)
= ipranges.network
WHERE schools.consortium_id = ?
GROUP BY schools.school, detail.filename
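As a sanity check on the bit comparison, the following Python sketch applies the same mask, `~((1 << (32 - prefix)) - 1)`, to a few rows and counts matches per school and filename. The data is lifted from the question except for the row marked hypothetical, and the function names are made up for this example.

```python
from collections import Counter

def inet_aton(dotted: str) -> int:
    """Four-part dotted quad to 32-bit integer (simplified INET_ATON)."""
    a, b, c, d = (int(p) for p in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def in_range(ip: int, network: int, prefix: int) -> bool:
    """Same test as the JOIN condition: clear the host bits, compare."""
    mask = ~((1 << (32 - prefix)) - 1) & 0xFFFFFFFF
    return (ip & mask) == network

def count_views(ranges, details) -> Counter:
    """Count detail rows per (school, filename), like the GROUP BY query."""
    counts = Counter()
    for school, network, prefix in ranges:
        for ip_address, filename in details:
            if in_range(inet_aton(ip_address), network, prefix):
                counts[(school, filename)] += 1
    return counts

# (school, network, prefix) rows as they would look after the conversion
ranges = [
    ("Harvard", inet_aton("100.200.0.0"), 16),
    ("Harvard", inet_aton("100.201.65.0"), 24),
    ("Harvard", inet_aton("100.202.39.50"), 32),
]
details = [
    ("43.22.90.5", "newsreel.mp4"),
    ("100.202.39.50", "history101.mp4"),
    ("100.202.50.12", "science101.mp4"),
    ("100.202.39.56", "history101.mp4"),
    ("100.200.3.4", "history101.mp4"),  # hypothetical row, not in the question
]

if __name__ == "__main__":
    print(count_views(ranges, details))
```

With this data only two rows fall inside a Harvard range, both for history101.mp4: one through the /32 entry and one through the /16 entry.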