Efficient lookup in a range table

Question

I have a table of 1.6M IP ranges with organization names. The IP addresses are converted to integers. The table is in the form of:

I have a list of 2000 unique IP addresses (e.g. 321223, 531223, ....) that need to be translated to an organization name.

I loaded the translation table as a MySQL table with an index on IP_from and IP_to. I looped through the 2000 IP addresses, running one query per IP address, and after 15 minutes the report was still running. The query I'm using is:

select organization from iptable where ip_addr BETWEEN ip_start AND ip_end

Is there a more efficient way to do this batch lookup? I'll use my fingers if it's a good solution. And in case someone has a Ruby-specific solution, I should mention that I'm using Ruby.

Answer

Given that you already have an index on ip_start, this is the best way to use it, assuming that you want to make one index access per IP (1234 in this example):

select organization from (
    select ip_end, organization
    from iptable
    where ip_start <= 1234
    order by ip_start desc
    limit 1
) subqry where 1234 <= ip_end

This uses your index to start a scan that stops immediately because of the limit 1, so the cost should be only marginally higher than that of a simple indexed lookup. Of course, this technique relies on the ranges defined by ip_start and ip_end never overlapping.
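Since the asker mentioned Ruby, here is a minimal in-memory sketch of the same idea, assuming the whole table has been loaded into an array of `[ip_start, ip_end, organization]` rows sorted by `ip_start`. It is not the answer's method (which runs in MySQL); it just mirrors the subquery's logic with `Array#bsearch_index`: find the row with the largest `ip_start <= ip`, then check `ip <= ip_end`. The name `lookup_org` and the sample rows are hypothetical, and like the SQL it assumes non-overlapping ranges.

```ruby
# Each row is [ip_start, ip_end, organization]. Rows must be sorted by
# ip_start and must not overlap (the same assumption the SQL relies on).
def lookup_org(rows, ip)
  # Index of the first row whose ip_start is greater than ip
  # (find-minimum mode of bsearch_index; nil means no such row).
  first_after = rows.bsearch_index { |row| row[0] > ip } || rows.size
  return nil if first_after.zero? # every range starts above ip

  # The previous row has the largest ip_start <= ip, like
  # "order by ip_start desc limit 1" in the SQL version.
  _ip_start, ip_end, org = rows[first_after - 1]

  # Outer filter "where ip <= ip_end"; an address in a gap yields nil.
  ip <= ip_end ? org : nil
end

# Hypothetical sample table.
rows = [
  [100, 199, "Org A"],
  [200, 299, "Org B"],
  [400, 499, "Org C"]
].sort_by(&:first)

[150, 250, 350, 450, 50].each do |ip|
  puts "#{ip} -> #{lookup_org(rows, ip).inspect}"
end
```

Each lookup is O(log n), so 2000 lookups against 1.6M in-memory rows cost roughly 2000 × 21 comparisons, at the price of holding the table in memory.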

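The problem with your original approach is that MySQL, being unaware of this constraint, can only use the index to determine where to start or stop the scan that (it thinks) it needs in order to find all matches for your query. The index bounds only one side of the BETWEEN, so each lookup may read every row whose ip_start is at or below the address and then filter those rows on ip_end.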
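A toy Ruby model (not MySQL's actual executor) can make the cost difference concrete. Assuming only an index on `ip_start`, the BETWEEN query has to read every index entry with `ip_start <= ip` as a candidate, whereas the `limit 1` query reads a single entry. The helper name `between_scan_candidates` and the table of 1,000 back-to-back ranges are hypothetical.

```ruby
# Toy model: with only an index on ip_start, "ip BETWEEN ip_start AND ip_end"
# bounds the scan on one side, so every entry with ip_start <= ip is a
# candidate that must be read and then filtered on ip_end.
def between_scan_candidates(starts, ip)
  # starts is sorted ascending; this counts the entries with ip_start <= ip.
  starts.bsearch_index { |s| s > ip } || starts.size
end

# Hypothetical table: 1,000 back-to-back ranges of width 100.
starts = (0...1_000).map { |i| i * 100 }

ip = 75_050
puts "BETWEEN-style scan reads #{between_scan_candidates(starts, ip)} entries"
puts "ORDER BY ip_start DESC LIMIT 1 reads 1 entry"
```

For addresses spread evenly over the table, the one-sided scan touches about half the index per lookup, which on a 1.6M-row table is on the order of 800,000 rows per address, repeated for each of the 2000 addresses.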
