How to efficiently check if a given IP address belongs to an IP subnetwork in Python?

Question

I have a set of about 200,000 IP addresses and 10,000 subnets of the form 1.1.1.1/24. For every IP address I need to check whether it belongs to one of these subnets. Because the dataset is large and I have limited computational power, I would like an efficient implementation.

While searching, one method I found was this (https://stackoverflow.com/a/820124/7995937):

from netaddr import IPNetwork, IPAddress
if IPAddress("192.168.0.1") in IPNetwork("192.168.0.0/24"):
    print("Yay!")

But since I have to loop over 200,000 IP addresses, and for each address loop over 10,000 subnets, I am unsure whether this is efficient. My first question: is checking "IPAddress() in IPNetwork()" just a linear scan, or is it optimized in some way?

The other solution I came up with was to make a list of all the IPs contained in the subnets (about 13,000,000 IPs without duplicates) and then sort it. If I do this, then in my loop over the 200,000 IP addresses I only need to do one binary search per IP over this larger list of addresses.

from netaddr import IPNetwork

ip_list = []
for ipMasked in ipsubnets:  # Here ipsubnets is the list of all subnets
    # Expand the subnet into the string form of every address it contains
    ip_list.extend(str(ip) for ip in IPNetwork(ipMasked))
ip_list = list(set(ip_list))  # To eliminate duplicates
ip_list.sort()

I could then perform a binary search in the following manner:

for ip in myIPList:  # myIPList is the list of 200,000 IPs
    if bin_search(ip,ip_list):
        print('The ip is present')
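
The bin_search helper is not defined in the question. A minimal sketch using the standard-library bisect module might look like the following. The function body and the sample addresses are assumptions for illustration; the only requirement is that ip_list is sorted with the same ordering that the lookup uses (here, plain string ordering).

```python
from bisect import bisect_left


def bin_search(ip, ip_list):
    """Return True if ip occurs in the sorted list ip_list."""
    i = bisect_left(ip_list, ip)  # leftmost position where ip could be inserted
    return i < len(ip_list) and ip_list[i] == ip


# Hypothetical sample data. sorted() orders the strings lexicographically,
# which is sufficient because bin_search compares strings the same way.
ip_list = sorted(["192.168.0.1", "10.0.0.7", "172.16.5.4"])
print(bin_search("10.0.0.7", ip_list))
print(bin_search("10.0.0.8", ip_list))
```

Since the list is only ever used for membership tests, a plain set of the same strings would answer the same question without sorting, at the cost of extra memory.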

Is this method more efficient than the other one? Or is there another, more efficient way to perform this task?

Recommended answer

Your IP address is in a subnet if the N leading bits of that address match the N leading bits of one of the N-bit subnets. So, start by making a list of 33 empty sets, one for each prefix length from 0 to 32. Encode each subnet as a 32-bit integer with the trailing bits masked out. For example, 1.2.3.4/23 becomes 0x01020304 & 0xfffffe00, which equals 0x01020200. Add this number to the set for prefix length 23, i.e. subnets[23]. Continue for all the subnets.
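
The construction described above could be sketched as follows. This is only an illustration under some assumptions: it uses the standard-library ipaddress module rather than netaddr, it handles IPv4 only, and the name build_subnet_sets is hypothetical.

```python
import ipaddress


def build_subnet_sets(cidrs):
    """Return a list of 33 sets; entry N holds the masked 32-bit
    network numbers of every /N subnet in cidrs (IPv4 only)."""
    subnets = [set() for _ in range(33)]  # one set per prefix length 0..32
    for cidr in cidrs:
        # strict=False accepts input with host bits set (such as 1.2.3.4/23)
        # and masks the trailing bits out.
        net = ipaddress.ip_network(cidr, strict=False)
        subnets[net.prefixlen].add(int(net.network_address))
    return subnets


subnets = build_subnet_sets(["1.2.3.4/23", "192.168.0.0/24"])
print(0x01020200 in subnets[23])
```

Subnets that share a prefix length and a network number collapse into one set entry, so duplicates in the input cost nothing at lookup time.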

To see if an IP address is in your subnets, encode the IP address in the same way as a 32-bit number ipaddr and then do something like this (untested code):

for N in range(32, 0, -1):
    mask = (0xffffffff >> (32 - N)) << (32 - N)
    if (ipaddr & mask) in subnets[N]:
        # have found ipaddr in one of our subnets
        break  # or do whatever...
else:
    # have not found ipaddr
    pass

Looking up a number in a set is cheap: O(1) on average for Python's hash-based set, or O(log M) for a tree-based set, where M is the number of elements in the set. This code performs at most 32 lookups, the worst case being an IP address that is not in any of the subnets. If the majority of the addresses are expected to be present, you can optimise by testing the sets with the most elements first. That might be:

for N in ( 24, 16, 8, 29, 23, 28, 27, 26, 25, 22, 15, 21 ... )

or you could calculate the optimal sequence at runtime.
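
One way to compute such a sequence at runtime is to order the prefix lengths by how many subnets each set holds. The sketch below is an assumption-laden illustration, not the answerer's code: it uses the standard-library ipaddress module, handles IPv4 only, and the names build_subnet_sets, probe_order and in_any_subnet are hypothetical. Skipping empty sets is an extra shortcut not mentioned in the answer, and "largest set first" is only a heuristic for the best order.

```python
import ipaddress


def build_subnet_sets(cidrs):
    """List of 33 sets; entry N holds the masked network numbers of all /N subnets."""
    subnets = [set() for _ in range(33)]
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        subnets[net.prefixlen].add(int(net.network_address))
    return subnets


def probe_order(subnets):
    """Prefix lengths that have at least one subnet, largest set first."""
    lengths = [n for n in range(33) if subnets[n]]
    return sorted(lengths, key=lambda n: len(subnets[n]), reverse=True)


def in_any_subnet(ip, subnets, order):
    """True if ip falls inside any stored subnet, probing lengths in the given order."""
    ipaddr = int(ipaddress.ip_address(ip))
    for n in order:
        mask = (0xFFFFFFFF >> (32 - n)) << (32 - n)
        if (ipaddr & mask) in subnets[n]:
            return True
    return False


# Hypothetical sample data
cidrs = ["10.0.0.0/8", "192.168.0.0/24", "192.168.1.0/24", "172.16.0.0/16"]
subnets = build_subnet_sets(cidrs)
order = probe_order(subnets)
print(order)
print(in_any_subnet("192.168.1.77", subnets, order))
print(in_any_subnet("8.8.8.8", subnets, order))
```

The order only needs to be computed once, after all subnets are loaded, and can then be reused for every one of the IP addresses being checked.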
