Indexing using Redis sorted sets

Question

I would like to get some feedback and suggestions regarding two approaches I'm considering to implementing searchable indexes using Redis sorted sets.

Situation and goal

We currently have some key-value tables we're storing in Cassandra, and which we would like to have indexes for. For example, one table would contain records of people, and the Cassandra table would have id as its primary key, and the serialized object as the value. The object would have fields such as first_name, last_name, last_updated, and others.

What we want is to be able to do searches such as "last_name = 'Smith' AND first_name > 'Joel'" , "last_name < 'Aaronson'" , "last_name = 'Smith' AND first_name = 'Winston'" and so on. The searches should yield the ids of matches so we can then retrieve the objects from Cassandra. I'm thinking the above searches could be done with a single index, sorted lexicographically by last_name, first_name, and last_updated. If we need some searches using a different order (e.g. "first_name = 'Zeus'") we can have a similar index that would allow those (e.g. first_name, last_updated).

We are looking at using Redis for this, because we need to be able to handle a large number of writes per minute. I've read up on some common ways Redis sorted sets are used, and come up with two possible implementations:

Option 1: A single sorted set per index

For our index by last_name, first_name, last_updated, we would have a sorted set in Redis under the key indexes:people:last_name:first_name:last_updated , which would contain strings with the format last_name:first_name:last_updated:id . For example:

smith:joel:1372761839.444:0azbjZRHTQ6U8enBw6BJBw

(For the separator I might use '::' rather than ':' or something else to work better with the lexicographic ordering, but let's ignore that for now)

The items would all be given score 0 so that the sorted set will just be sorted lexicographically by the strings themselves. If I then want to do a query like "last_name = 'smith' AND first_name < 'bob'", I would need to get all the items in the list that come before 'smith:bob'.

As far as I can tell, there are the following drawbacks to this approach:

  1. There is no Redis function to select a range based on the string value. This feature, called ZRANGEBYLEX, has been proposed by Salvatore Sanfilippo at https://github.com/antirez/redis/issues/324 , but is not implemented, so I would have to find the endpoints using binary searches and get the range myself (perhaps using Lua, or at the application-level with Python which is the language we're using to access Redis).
  2. If we want to include a time-to-live for index entries, it seems the simplest way to do it would be having a regularly scheduled task which goes through the whole index and removes expired items.
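The endpoint search described in drawback 1 could be sketched in plain Python as follows. This is a minimal model under stated assumptions, not a Redis call: a sorted Python list stands in for a sorted set whose members all have score 0, and `make_member` and `lex_range` are hypothetical helper names. Python compares these ASCII strings by code point, which agrees with byte order only as long as the members stay ASCII. Note that ZRANGEBYLEX was later added to Redis (in version 2.8.9), so on such servers the same half-open range could probably be requested directly instead of searching by hand.

```python
from bisect import bisect_left

SEP = ":"  # separator from the example above; a real index may need a safer one


def make_member(last_name, first_name, last_updated, record_id):
    """Build an index entry in the form last_name:first_name:last_updated:id."""
    return SEP.join([last_name, first_name, str(last_updated), record_id])


def lex_range(sorted_members, low, high):
    """Return members m with low <= m < high, located by two binary searches."""
    start = bisect_left(sorted_members, low)
    stop = bisect_left(sorted_members, high)
    return sorted_members[start:stop]


# A sorted list stands in for a sorted set whose members all have score 0.
index = sorted([
    make_member("smith", "alice", "1372761800.000", "id-alice"),
    make_member("smith", "bob", "1372761839.444", "id-bob"),
    make_member("smith", "joel", "1372761839.444", "0azbjZRHTQ6U8enBw6BJBw"),
    make_member("smithson", "adam", "1372761900.000", "id-adam"),
    make_member("aaronson", "zed", "1372761000.000", "id-zed"),
])

# last_name = 'smith' AND first_name < 'bob'
matches = lex_range(index, "smith" + SEP, "smith" + SEP + "bob")
match_ids = [m.rsplit(SEP, 1)[1] for m in matches]
```

Including the separator in the lower bound keeps `smithson` out of the `smith` range. Names containing characters that sort below the separator would still be misplaced, which is the separator concern raised above.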

Option 2: Small sorted sets, sorted by last_updated

This approach would be similar, except we would have many, smaller, sorted sets, with each having a time-like value such as last_updated for the scores. For example, for the same last_name, first_name, last_updated index, we would have a sorted set for each last_name, first_name combination. For example, the key might be indexes:people:last_name=smith:first_name=joel , and it would have an entry for each person we have called Joel Smith. Each entry would have as its name the id and its score the last_updated value. E.g.:

value: 0azbjZRHTQ6U8enBw6BJBw ; score: 1372761839.444

The main advantages of this are (a) searches where we know all the fields except last_updated would be very easy, and (b) implementing a time-to-live would be very easy using ZREMRANGEBYSCORE.
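A rough sketch of both advantages, assuming a client object that exposes `zadd`, `zrangebyscore`, and `zremrangebyscore` the way redis-py (3.0 or later) does. The helper names `index_key`, `add_entry`, `expire_entries`, and `updated_since` are hypothetical, and the key format is the one from the example above.

```python
import time


def index_key(last_name, first_name):
    """Key for the per-name sorted set, following the format in the question."""
    return f"indexes:people:last_name={last_name}:first_name={first_name}"


def add_entry(client, last_name, first_name, record_id, last_updated):
    """Store the record id as the member and last_updated as its score."""
    client.zadd(index_key(last_name, first_name), {record_id: last_updated})


def updated_since(client, last_name, first_name, since):
    """Ids of people with this name whose last_updated is at or after `since`."""
    return client.zrangebyscore(index_key(last_name, first_name), since, "+inf")


def expire_entries(client, last_name, first_name, ttl_seconds, now=None):
    """Remove entries whose last_updated is older than ttl_seconds."""
    now = time.time() if now is None else now
    cutoff = now - ttl_seconds
    # ZREMRANGEBYSCORE key -inf cutoff removes every member scored <= cutoff.
    return client.zremrangebyscore(index_key(last_name, first_name), "-inf", cutoff)
```

With redis-py installed, `client` would typically be something like `redis.Redis()`. The expiry still has to be run once per key, so a scheduled cleanup needs a way to enumerate all the keys of the index.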

The drawback, which seems very large to me, is:

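  1. It seems to be much more complicated to manage and search this way. For example, we would need the index to keep track of all its keys (in case, for instance, we want to clean up at some point) and do so in a hierarchical manner. A search such as "last_name < 'smith'" would require first looking at the list of all last names to find those that come before smith, then for each of those looking at all the first names it contains, then for each of those getting all the items from its sorted set. In other words, there are a lot of components to build and worry about.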
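To make the three levels of that search concrete, here is a small in-memory model. It is only an illustration: `last_names`, `first_names`, and `sorted_sets` are hypothetical stand-ins for bookkeeping structures that would themselves have to be stored and maintained in Redis, and the ids and scores are made up.

```python
from bisect import bisect_left

# Level 1: sorted list of every last name that has at least one entry.
last_names = ["aaronson", "jones", "smith"]

# Level 2: the first names recorded under each last name.
first_names = {
    "aaronson": ["zed"],
    "jones": ["amy", "bo"],
    "smith": ["joel"],
}

# Level 3: one small "sorted set" per (last_name, first_name): id -> last_updated.
sorted_sets = {
    ("aaronson", "zed"): {"id-1": 1.0},
    ("jones", "amy"): {"id-3": 20.0, "id-2": 10.0},
    ("jones", "bo"): {"id-4": 5.0},
    ("smith", "joel"): {"id-5": 7.0},
}


def ids_with_last_name_before(bound):
    """Collect ids for last_name < bound by walking all three levels."""
    ids = []
    for last in last_names[:bisect_left(last_names, bound)]:
        for first in first_names[last]:
            members = sorted_sets[(last, first)]
            # Within one set, return ids in score (last_updated) order.
            ids.extend(sorted(members, key=members.get))
    return ids
```

Every write would have to keep all three structures consistent, which is the management overhead the drawback refers to.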

Summary

So it seems to me the first option would be better, in spite of its drawbacks. I would very much appreciate any feedback regarding these two or other possible solutions (even if they're that we should use something other than Redis).

Recommended answer

I strongly discourage the use of Redis for this. You'll be storing a ton of extra pointer data, and if you ever decide you want to do more complicated queries, such as SELECT WHERE first_name LIKE 'jon%', you're going to run into trouble. You'll also need to engineer extra, very big indexes that cross multiple columns in case you want to search on two fields at the same time. You'll essentially need to keep hacking away and reengineering a search framework. You'd be much better off using Elasticsearch or Solr, or any of the other frameworks already built to do what you're trying to do. Redis is awesome and has lots of good uses. This is not one of them.

Warning aside, to answer your actual question: I think you'd be best served using a variant of your first solution. Use a single sorted set per index, but just convert your letters to numbers. Convert your letters to some decimal value. You can use the ASCII value, or just assign each letter to a 1-26 value in lexicographic order, assuming you're using English. Standardize, so that each letter takes up the same numeric length (so, if 26 is your biggest number, 1 would be written "01"). Then just append these together with a decimal point in front and use that as your score per index (i.e. "hat" would be ".080120"). This will let you have a properly ordered 1-to-1 mapping between words and these numbers. When you search, convert from letters to numbers, and then you'll be able to use all of Redis' nice sorted set functions like ZRANGEBYSCORE without needing to rewrite them. Redis' functions are written very, very optimally, so you're much better off using them whenever possible instead of writing your own.
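The letter-to-number mapping could look roughly like this. It is a sketch assuming lowercase English letters only, and `word_to_score_digits` and `word_to_score` are hypothetical helper names.

```python
def word_to_score_digits(word):
    """Map each letter to a two-digit value 01-26 and prefix a decimal point."""
    digits = []
    for ch in word.lower():
        if not ("a" <= ch <= "z"):
            raise ValueError(f"unsupported character: {ch!r}")
        digits.append(f"{ord(ch) - ord('a') + 1:02d}")
    return "." + "".join(digits)


def word_to_score(word):
    """Convert the digit string to the float that would be used as the score."""
    digits = word_to_score_digits(word)
    # An empty word has no digits after the point; treat it as the lowest score.
    return float(digits) if len(digits) > 1 else 0.0


hat_digits = word_to_score_digits("hat")
words_by_score = sorted(["hi", "hats", "ha", "hat"], key=word_to_score)
```

One caveat worth checking before relying on this: Redis stores sorted set scores as double-precision floating point numbers, which carry only about 15 to 17 significant decimal digits. At two digits per letter, words that share their first seven or eight letters may end up with the same score, so the mapping is only one-to-one for short strings.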
