Redis 或 Mongo 用于确定数字是否在范围内? [英] Redis or Mongo for determining if a number falls within ranges?

查看:33
本文介绍了Redis 或 Mongo 用于确定数字是否在范围内?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要一种方法来快速检查 IP 地址是否属于许多禁止的 IP 范围之一.

I need a way to quickly check if an IP address falls into one of many forbidden IP ranges.

我目前使用 iptables 来检查 IP 是否在指定范围内.这适用于几千个范围,但这个数字即将急剧增加到几十万,并且还会继续增长.

I currently use iptables to check if the IP falls into a specified range. This works fine for several thousand ranges, but that number is about to increase drastically to several hundred thousand, and will continue to grow.

我目前简单地向 iptables 添加新规则的方法的另一个问题是重复数量的增加.

The other issue with my current method of simply adding new rules to iptables is the increasing number of duplicates.

在将 IP 或范围添加到规则集之前,我需要一种有效的方法来检查它是否属于现有(更大的)范围.

I need an efficient method for checking if an IP or range falls into an existing (larger) range before it's added to the ruleset.

Ruby 是我最熟悉的语言,但是对于不断增长的范围数量,哪种数据结构是最佳选择?

Ruby is the language I am most familiar with, but what data structures would be the best options for the ever growing number of ranges?

我想出的一个解决方案是使用 Redis 集合或 MongoDB 将单个 IP 存储为整数,然后简单地检查 IP 是否存在于集合中……但我的直觉告诉我必须有更聪明的方法.

One solution I came up with is to use Redis sets or perhaps MongoDB to store individual IPs as integers, then simply check if the IP exists in the set... but my gut tells me there has to be a smarter way.

如果我要将 IP 转换为整数并存储范围,那么遍历这些范围以查看新 IP 或范围是否已包含在现有更大范围内的最佳方法是什么?

If I were to convert IPs into integers and store the ranges, what would be the optimal way to run through the ranges to see if a new IP or range might already be encompassed by an existing bigger range?

最后一点:速度比内存成本更重要.

Final note: speed is more important than memory costs.

推荐答案

与上一张海报相反,我认为您无法通过使用朴素的索引获得 O(log n) 复杂度.让我们以 mongodb 为例.您可以定义两个索引(用于范围的开始和结束属性),但 mongodb 只会使用一个来解决给定的查询.所以它不会工作.现在,如果您使用涉及范围的开始和结束属性的单个复合索引,则查找要检查的第一个范围的复杂性将是对数的,但是找到与查询匹配的最后一个范围将变得线性.最坏情况的复杂度是 O(n),当所有存储的范围与您的输入重叠时,您就拥有了它.

Contrary to the previous poster, I don't think you can get O(log n) complexity by using naive indexing. Let's consider mongodb for instance. You could define two indexes (for start and end properties of the ranges), but mongodb will only use one to solve a given query. So it will not work. Now if you use a single compound index involving both start and end properties of the ranges, the complexity will be logarithmic to find the first range to check, but then it will get linear to find the last range matching the query. The worst case complexity is O(n), and you have it when all the stored ranges overlap your input.

附带说明一下,如果您知道自己在做什么,则使用 Redis 排序集可以模拟排序索引(复杂度为 O(log n)).Redis 不仅仅是一个简单的键值存储.Redis sorted set是使用skip list实现的,score和value都是用来比较items的.

On a side note, using Redis sorted set you can emulate a sorted index (with O(log n) complexity) if you know what you do. Redis is a bit more than a simple key-value store. Redis sorted sets are implemented using a skip list, and both the score and value are used to compare items.

为了解决这类问题,需要专门的索引结构.你可能想看看:

To solve this kind of problem, a dedicated indexing structure is needed. You may want to have a look at:

http://en.wikipedia.org/wiki/Segment_tree或者http://en.wikipedia.org/wiki/Interval_tree

如果关注的是空间上的速度,那么将索引展平可能会很有趣.例如,让我们考虑以下范围(仅使用整数来简化讨论):

If the concern is speed over space, then it may be interesting to flatten the index. For instance, let's consider the following ranges (using only integers to simplify the discussion):

A 2-8
B 4-6
C 2-9
D 7-10

可以构建索引非重叠段的稀疏结构.

A sparse structure indexing non overlapping segments can be built.

0  []
2  [A C]
4  [A C B]
7  [A C D]
9  [C D]
10 [D]
11 []

每个条目包含一个非重叠段的下限作为键,以及匹配范围的列表或集合作为值.条目应使用已排序的容器(树、跳过列表、btree 等...)进行索引

Each entry contains the lower bound of a non overlapping segment as the key, and the list or set of matching ranges as a value. Entries should be indexed using a sorted container (tree, skip list, btree, etc ...)

为了找到与 5 匹配的范围,我们查找小于或等于 5 的第一个条目(在本例中为 4)并提供范围列表( [A C B] )

To find the ranges matching 5, we look for the first entry which is lower or equal to 5 (it will be 4 in this example) and the list of ranges is provided ( [A C B] )

有了这个数据结构,查询的复杂度真的是O(log n).然而,构建和维护它并非微不足道(且昂贵).mongodb和Redis都可以实现.

With this data structure, the complexity of the queries is really O(log n). However it is not trivial (and expensive) to build and maintain it. It can be implemented with both mongodb and Redis.

这是一个Redis示例:

Here is an example with Redis:

> rpush range:2 2-8 2-9
(integer) 2
> rpush range:4 2-8 2-9 4-6
(integer) 3
> rpush range:7 2-8 2-9 7-10
(integer) 3
> rpush range:9 2-9 7-10
(integer) 2
> rpush range:10 7-10
(integer) 1
> zadd range_index 0 range:0 2 range:2 4 range:4 7 range:7 9 range:9 10 range:10
(integer) 6
> zrevrangebyscore range_index 5 0 LIMIT 0 1
1) "range:4"
> lrange range:4 0 -1
1) "2-8"
2) "2-9"
3) "4-6"

这篇关于Redis 或 Mongo 用于确定数字是否在范围内?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆