What are the underlying data structures used for Redis?

Question

I'm trying to answer two questions in a definitive list:

  1. What are the underlying data structures used for Redis?
  2. And what are the main advantages/disadvantages/use cases for each type?

So, I've read that Redis lists are actually implemented with linked lists. But for the other types, I'm not able to dig up any information. Also, if someone were to stumble upon this question without a high-level summary of the pros and cons of modifying or accessing the different data structures, they'd have a complete reference for when each type is best used.

Specifically, I'm looking to outline all types: string, list, set, zset and hash.

Oh, I've looked at these articles, among others, so far:

Answer

I'll try to answer your question, but I'll start with something that may look strange at first: if you are not interested in Redis internals, you should not care about how data types are implemented internally. The reason is simple: for every Redis operation you'll find the time complexity in the documentation, and once you have the set of operations and their time complexity, the only other thing you need is some clue about memory usage. Because we do many optimizations that vary depending on the data, the best way to get those memory figures is to run a few trivial real-world tests.

But since you asked, here is the underlying implementation of every Redis data type.

  • Strings are implemented using a C dynamic string library, so we don't pay (asymptotically speaking) for allocations in append operations. For instance, a sequence of N appends costs O(N) in total instead of showing quadratic behavior.
  • Lists are implemented with linked lists.
  • Sets and Hashes are implemented with hash tables.
  • Sorted sets are implemented with skip lists (a probabilistic alternative to balanced trees).
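The following toy growable buffer shows why over-allocation avoids quadratic appends. It is a sketch under a simplifying assumption: capacity doubles whenever it runs out, which is not exactly the preallocation policy of Redis's own string library. The `copied` counter tracks how many bytes get moved during reallocations.

```python
class DynamicString:
    """Toy growable byte buffer that over-allocates on append (doubling policy assumed)."""

    def __init__(self) -> None:
        self.buf = bytearray(1)  # backing storage; capacity is len(self.buf)
        self.length = 0          # bytes actually in use
        self.copied = 0          # total bytes moved by reallocations

    def append(self, data: bytes) -> None:
        needed = self.length + len(data)
        if needed > len(self.buf):
            new_cap = len(self.buf)
            while new_cap < needed:
                new_cap *= 2  # grow geometrically, not by exactly len(data)
            new_buf = bytearray(new_cap)
            new_buf[: self.length] = self.buf[: self.length]
            self.copied += self.length  # cost of moving the existing bytes
            self.buf = new_buf
        self.buf[self.length : needed] = data
        self.length = needed

    def value(self) -> bytes:
        return bytes(self.buf[: self.length])
```

Appending 10,000 single bytes copies 16,383 bytes in total (1 + 2 + 4 + ... + 8192), which stays below 2N. Growing the buffer by exactly the appended size would instead copy 0 + 1 + ... + (N - 1) bytes, which is the quadratic behavior.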

But when lists, sets, sorted sets, and hashes are small in number of items and in size of the largest values, a different, much more compact encoding is used. This encoding differs by type, but it is always a compact blob of data that often forces an O(N) scan for every operation. Since we use this format only for small objects, this is not an issue: scanning a small blob is cache friendly, so in practice it is very fast. When there are too many elements, the encoding is automatically switched to the native encoding (linked list, hash table, and so forth).
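The following sketch of a set shows the switch from a compact encoding to a native one. The thresholds `MAX_ENTRIES` and `MAX_VALUE_LEN` are hypothetical placeholders (Redis reads its own limits from configuration), and the flat Python list only stands in for the packed blob Redis actually uses.

```python
class SmallSet:
    """Set that starts in a compact, linearly scanned form and upgrades to a hash table."""

    MAX_ENTRIES = 128   # hypothetical threshold, not a Redis default
    MAX_VALUE_LEN = 64  # hypothetical threshold, not a Redis default

    def __init__(self) -> None:
        self.encoding = "compact"
        self.items: list = []   # compact form: flat list, O(N) membership scan
        self.table: set = set() # native form: hash table, O(1) average lookup

    def __contains__(self, member: str) -> bool:
        if self.encoding == "compact":
            return member in self.items  # linear scan, cheap while N is small
        return member in self.table

    def add(self, member: str) -> bool:
        """Add member; return True if it was not already present."""
        if member in self:
            return False
        if self.encoding == "compact":
            self.items.append(member)
            if len(self.items) > self.MAX_ENTRIES or len(member) > self.MAX_VALUE_LEN:
                self._convert()
        else:
            self.table.add(member)
        return True

    def _convert(self) -> None:
        # Upgrade to the native encoding; kept one-way to keep the sketch short.
        self.table = set(self.items)
        self.items = []
        self.encoding = "hashtable"

    def __len__(self) -> int:
        return len(self.items) if self.encoding == "compact" else len(self.table)
```

Callers see the same behavior either way. Only the representation, and therefore the cost profile, changes once the set outgrows the thresholds.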

But your question was not really just about internals; your point was "What type should I use to accomplish what?"

Strings

This is the base type of all the types. It's one of the five types, but it is also the base type of the complex types, because a List is a list of strings, a Set is a set of strings, and so forth.

A Redis string is a good idea in all the obvious scenarios, such as storing an HTML page, but also when you want to avoid converting your already encoded data. So, for instance, if you have JSON or MessagePack, you may just store objects as strings. Starting with Redis 2.6, you can even manipulate this kind of object server side using Lua scripts.

Another interesting use of strings is bitmaps, and random-access byte arrays in general, since Redis exports commands to access arbitrary ranges of bytes (GETRANGE, SETRANGE) or even single bits (GETBIT, SETBIT). For instance, check this good blog post: Fast Easy real time metrics using Redis.
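The following in-memory Python model shows how bitmap commands treat a string, for example to record which (hypothetical) user IDs were active on a given day. It follows the documented SETBIT/GETBIT/BITCOUNT behavior as I understand it:

- bit offset 0 is the most significant bit of the first byte;
- the string grows with zero bytes as needed;
- SETBIT returns the previous bit;
- reads past the end return 0.

It is a model, not a Redis client.

```python
class Bitmap:
    """In-memory model of a Redis-style bitmap stored in a byte string.

    Offset 0 is the most significant bit of byte 0, matching Redis's bit order.
    """

    def __init__(self) -> None:
        self.data = bytearray()

    def setbit(self, offset: int, value: int) -> int:
        byte_index, bit = divmod(offset, 8)
        if byte_index >= len(self.data):
            # Grow with zero bytes when writing past the current end.
            self.data.extend(b"\x00" * (byte_index + 1 - len(self.data)))
        mask = 0x80 >> bit
        old = 1 if self.data[byte_index] & mask else 0
        if value:
            self.data[byte_index] |= mask
        else:
            self.data[byte_index] &= ~mask & 0xFF
        return old  # like SETBIT, report the previous bit value

    def getbit(self, offset: int) -> int:
        byte_index, bit = divmod(offset, 8)
        if byte_index >= len(self.data):
            return 0  # bits past the end read as 0
        return 1 if self.data[byte_index] & (0x80 >> bit) else 0

    def bitcount(self) -> int:
        # Count set bits over the whole string (BITCOUNT without a range).
        return sum(bin(b).count("1") for b in self.data)
```

One bit per user keeps a day's activity for millions of IDs in a few hundred kilobytes. Counting daily active users is then a single BITCOUNT.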

Lists

Lists are good when you are likely to touch only the extremes of the list: near the tail or near the head. Lists are not very good for paginating stuff, because random access is slow, O(N). So good uses of lists are plain queues and stacks, or processing items in a loop using RPOPLPUSH with the same source and destination to "rotate" a ring of items.
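The ring-rotation trick can be modeled with a `collections.deque`. The `rpoplpush` function below is a hypothetical stand-in written for illustration, not a Redis client call. It pops from the tail of the source and pushes onto the head of the destination, so passing the same deque twice rotates it by one position. Newer Redis versions (6.2 and later) also offer LMOVE, a more general form of the same operation.

```python
from collections import deque


def rpoplpush(source: deque, destination: deque):
    """Model of RPOPLPUSH: pop from the tail of source, push onto the head of destination."""
    if not source:
        return None  # Redis replies with nil when the source list is empty
    item = source.pop()           # remove from the tail (right end)
    destination.appendleft(item)  # push onto the head (left end)
    return item


# Same source and destination: each call moves the tail item to the head.
ring = deque(["a", "b", "c"])
for _ in range(len(ring)):
    current = rpoplpush(ring, ring)
    # ... process `current` here; it stays in the ring for the next pass
```

Rotating `deque(["a", "b", "c"])` once yields `["c", "a", "b"]`. After three rotations every item has been visited once and the ring is back in its original order. Because items are never removed, a crashed worker loses nothing.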

Lists are also good when we just want to create a capped collection of N items where we usually access only the top or bottom items, or when N is small.
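A common way to keep such a capped collection is to push onto the head and then trim: LPUSH followed by LTRIM key 0 N-1. The sketch below models both commands on a plain Python list. Its `ltrim` only handles non-negative indexes, whereas the real command also accepts negative ones, and `push_capped` is a hypothetical helper name.

```python
def lpush(lst: list, *values) -> int:
    """Model of LPUSH: each value is pushed onto the head in turn; returns the new length."""
    for v in values:
        lst.insert(0, v)  # O(N) on a Python list; Redis pushes onto the head in O(1)
    return len(lst)


def ltrim(lst: list, start: int, stop: int) -> None:
    """Model of LTRIM for non-negative, inclusive start/stop indexes."""
    del lst[stop + 1 :]
    del lst[:start]


def push_capped(lst: list, value, n: int) -> None:
    """Keep only the n most recent items (hypothetical helper)."""
    lpush(lst, value)
    ltrim(lst, 0, n - 1)
```

Because the trim runs after every push, the list never holds more than N items. The newest item is always at index 0.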

Sets

Sets are unordered collections, so they are a good fit whenever you have a collection of items and it is very important to check membership or the size of the collection very quickly. Another cool thing about sets is support for peeking at or popping random elements (the SRANDMEMBER and SPOP commands).
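The following sketch models single-element SRANDMEMBER and SPOP on a Python `set`. Membership and size map directly onto Python's `in` and `len`, much like SISMEMBER and SCARD. The helper functions are illustrative stand-ins, not a client API. Converting the set to a tuple makes each random pick O(N) here, so the sketch models only the behavior, not the cost.

```python
import random


def srandmember(s: set, rng: random.Random):
    """Model of single-element SRANDMEMBER: return a random member without removing it."""
    if not s:
        return None
    return rng.choice(tuple(s))  # O(N) here; only the behavior is modeled


def spop(s: set, rng: random.Random):
    """Model of single-element SPOP: remove and return a random member."""
    if not s:
        return None
    member = rng.choice(tuple(s))
    s.remove(member)
    return member


entrants = {"alice", "bob", "carol"}  # hypothetical raffle entrants
rng = random.Random(0)
is_entered = "bob" in entrants        # membership check
count = len(entrants)                 # size of the collection
winner = spop(entrants, rng)          # draw a winner who cannot win twice
```

SRANDMEMBER fits "show a random item" features, while SPOP fits drawings where each element should come out only once.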

Sets are also good for representing relations, e.g., "Who are the friends of user X?" and so forth. But, as we'll see, sorted sets are another good data structure for this kind of stuff.

Sets support complex operations like intersections, unions, and so forth, so this is a good data structure for using Redis in a "computational" manner, when you have data and you want to perform transformations on that data to obtain some output.
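The following sketch models set algebra on relations with Python sets. The `friends:<user>` key naming and the data are hypothetical. As in Redis, a key that does not exist is treated as an empty set.

```python
# Hypothetical data: each user's friends stored as a set under "friends:<user>".
friends = {
    "friends:alice": {"bob", "carol", "dave"},
    "friends:bob": {"alice", "carol", "erin"},
}


def sinter(db: dict, *keys: str) -> set:
    """Model of SINTER: members present in every given set."""
    sets = [db.get(k, set()) for k in keys]  # missing keys act as empty sets
    return set.intersection(*sets) if sets else set()


def sunion(db: dict, *keys: str) -> set:
    """Model of SUNION: members present in at least one given set."""
    return set().union(*(db.get(k, set()) for k in keys))


def sdiff(db: dict, first: str, *others: str) -> set:
    """Model of SDIFF: members of the first set that are in none of the others."""
    result = set(db.get(first, set()))
    for k in others:
        result -= db.get(k, set())
    return result
```

With this data, the friends Alice and Bob have in common are `{"carol"}`. Alice's friends who are not Bob's friends are `{"bob", "dave"}`. Redis can also store such results under a new key (SINTERSTORE and friends) for later reuse.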

Small sets are encoded in a very efficient way.

Hashes

Hashes are the perfect data structure to represent objects, composed of fields and values. Fields of hashes can also be atomically incremented using HINCRBY. When you have objects such as users, blog posts, or some other kind of item, hashes are likely the way to go if you don't want to use your own encoding like JSON or similar.

However, keep in mind that small hashes are encoded very efficiently by Redis, and you can ask Redis to atomically GET, SET or increment individual fields in a very fast fashion.
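The following sketch models a user object stored as a hash, with field-level reads, writes, and increments. The key `user:1000` and its fields are hypothetical. As in Redis, field values are kept as strings, and HINCRBY treats a missing field as 0. This in-process model says nothing about atomicity, which Redis provides because each command runs to completion before the next one starts.

```python
def hset(db: dict, key: str, field: str, value: str) -> int:
    """Model of single-field HSET: returns 1 if the field is new, 0 if it was updated."""
    h = db.setdefault(key, {})
    is_new = field not in h
    h[field] = value
    return 1 if is_new else 0


def hget(db: dict, key: str, field: str):
    """Model of HGET: None stands in for Redis's nil reply."""
    return db.get(key, {}).get(field)


def hincrby(db: dict, key: str, field: str, increment: int) -> int:
    """Model of HINCRBY: parse the stored string as an integer and add to it."""
    h = db.setdefault(key, {})
    new_value = int(h.get(field, "0")) + increment  # missing field starts at 0
    h[field] = str(new_value)
    return new_value


db = {}
hset(db, "user:1000", "name", "Ada")
hincrby(db, "user:1000", "karma", 5)  # update one field without rewriting the object
```

Compared with storing the whole user as a JSON string, a single field can be changed without reading, decoding, and rewriting the entire object.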

Hashes can also be used to represent linked data structures, using references. For instance check the lamernews.com implementation of comments.
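The following hypothetical comment schema illustrates references between hashes. It is not necessarily the one Lamer News uses. Each comment is a hash stored under `comment:<id>` whose `parent` field holds the id of the comment it replies to, with `-1` marking top-level comments. Rebuilding the thread is then a matter of following those references.

```python
from collections import defaultdict

# Hypothetical data: each comment is a hash; "parent" references another comment's id.
comments = {
    "comment:1": {"author": "alice", "text": "First!", "parent": "-1"},
    "comment:2": {"author": "bob", "text": "Reply to alice", "parent": "1"},
    "comment:3": {"author": "carol", "text": "Another top-level", "parent": "-1"},
    "comment:4": {"author": "alice", "text": "Reply to bob", "parent": "2"},
}


def build_thread(comments: dict) -> dict:
    """Group comment ids by parent id so the tree can be walked from the top level."""
    children = defaultdict(list)
    for key, fields in comments.items():
        comment_id = key.split(":", 1)[1]
        children[fields["parent"]].append(comment_id)
    for ids in children.values():
        ids.sort(key=int)  # display replies in id order
    return children


def render(comments: dict, children: dict, parent: str = "-1", depth: int = 0, out=None) -> list:
    """Depth-first walk following the parent references; returns indented lines."""
    if out is None:
        out = []
    for cid in children.get(parent, []):
        c = comments[f"comment:{cid}"]
        out.append("  " * depth + f"{c['author']}: {c['text']}")
        render(comments, children, cid, depth + 1, out)
    return out
```

Rendering this data yields `alice: First!`, then Bob's reply indented once, then Alice's answer to Bob indented twice, and finally Carol's top-level comment.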

Sorted Sets

Sorted sets are the only other data structure, besides lists, that maintains ordered elements. You can do a number of cool things with sorted sets. For instance, you can have all kinds of Top Something lists in your web application: top users by score, top posts by pageviews, top whatever. A single Redis instance will support tons of insertion and get-top-elements operations per second.
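The following sketch models a "top users by score" list with ZINCRBY and ZREVRANGE ... WITHSCORES on a Python dict. The member names and scores are hypothetical. The model re-sorts on every query, whereas Redis keeps the set ordered so that fetching the top M elements costs O(log(N) + M). Ties are broken by member in reverse lexicographic order, which is how ZREVRANGE orders equal scores.

```python
def zincrby(zset: dict, increment: float, member: str) -> float:
    """Model of ZINCRBY: add increment to member's score (missing members start at 0)."""
    zset[member] = zset.get(member, 0) + increment
    return zset[member]


def zrevrange_withscores(zset: dict, start: int, stop: int) -> list:
    """Model of ZREVRANGE ... WITHSCORES for non-negative, inclusive indexes."""
    # Highest score first; equal scores ordered by member, descending.
    ordered = sorted(zset.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return ordered[start : stop + 1]


scores = {}
zincrby(scores, 10, "alice")
zincrby(scores, 25, "bob")
zincrby(scores, 10, "carol")
zincrby(scores, 5, "alice")
top_two = zrevrange_withscores(scores, 0, 1)
```

Here `top_two` is `[("bob", 25), ("alice", 15)]`: every score update is a single increment, and the leaderboard is read back in order.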

Sorted sets, like regular sets, can be used to describe relations, but they also allow you to paginate the list of items and to remember the ordering. For instance, if I store the friends of user X in a sorted set, I can easily keep them in the order in which each friendship was accepted.
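The following sketch models the friendship example. It assumes the acceptance time (a Unix timestamp) is used as the score, so ascending score order is the order in which friendships were accepted. The timestamps are made up. `zrange_page` is a hypothetical helper equivalent to ZRANGE with start = page * size and stop = start + size - 1.

```python
def zadd(zset: dict, score: float, member: str) -> int:
    """Model of single-member ZADD: returns 1 if the member is new, 0 if only its score changed."""
    is_new = member not in zset
    zset[member] = score
    return 1 if is_new else 0


def zrange_page(zset: dict, page: int, page_size: int) -> list:
    """Return one page (0-based) in ascending score order, ties broken by member, like ZRANGE."""
    ordered = sorted(zset, key=lambda m: (zset[m], m))
    start = page * page_size
    return ordered[start : start + page_size]


friends_of_x = {}
for accepted_at, name in [
    (1700000300, "carol"),
    (1700000100, "alice"),
    (1700000200, "bob"),
    (1700000400, "dave"),
    (1700000500, "erin"),
]:
    zadd(friends_of_x, accepted_at, name)
```

With a page size of 2, page 0 is `["alice", "bob"]` and page 1 is `["carol", "dave"]`, regardless of insertion order. A plain set could answer "is Y a friend of X?" but not "who were X's first friends?".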

Sorted sets are good for priority queues.
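The following sketch models a sorted set used as a priority queue:

- members are unique;
- the score is the priority;
- the lowest score is served first, similar to ZPOPMIN (available since Redis 5.0).

The class and task names are hypothetical. The plain sorted Python list makes removal O(N), unlike the skip list Redis uses.

```python
import bisect


class SortedSetPQ:
    """Toy priority queue in the style of a sorted set: unique members, lowest score first."""

    def __init__(self) -> None:
        self.scores = {}   # member -> score, which keeps members unique
        self.ordered = []  # (score, member) pairs kept in sorted order

    def add(self, member: str, score: float) -> None:
        if member in self.scores:
            # Re-adding a member updates its priority instead of duplicating it.
            self.ordered.remove((self.scores[member], member))
        self.scores[member] = score
        bisect.insort(self.ordered, (score, member))

    def pop_min(self):
        """Like ZPOPMIN with count 1: remove and return (member, score) with the lowest score."""
        if not self.ordered:
            return None
        score, member = self.ordered.pop(0)
        del self.scores[member]
        return member, score


pq = SortedSetPQ()
pq.add("send-email", 5)
pq.add("resize-image", 1)
pq.add("cleanup", 10)
pq.add("cleanup", 0)  # reprioritize an existing task
```

Unlike a list-based queue, re-adding a task changes its position instead of enqueuing it twice. Here `cleanup` is popped first.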

Sorted sets are like more powerful lists where inserting, removing, or getting ranges from the middle of the list is always fast. But they use more memory, and are O(log(N)) data structures.
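The O(log(N)) cost comes from the skip list. Below is a minimal skip list in Python that orders entries by score and then by member, with search, insertion, and in-order iteration in expected O(log(N)) steps per lookup or insert. The level probability `P` and `MAX_LEVEL` are arbitrary choices for this sketch, not Redis's constants. Unlike a real sorted set, it does not deduplicate members; Redis pairs its skip list with a hash table from member to score for that.

```python
import random


class Node:
    __slots__ = ("score", "member", "forward")

    def __init__(self, score, member, level: int) -> None:
        self.score = score
        self.member = member
        self.forward = [None] * level  # forward[i] is the next node on level i


class SkipList:
    MAX_LEVEL = 16  # arbitrary cap for this sketch
    P = 0.5         # probability of promoting a node one level up (arbitrary)

    def __init__(self, seed=None) -> None:
        self.head = Node(float("-inf"), None, self.MAX_LEVEL)
        self.level = 1
        self.length = 0
        self.rng = random.Random(seed)

    def _random_level(self) -> int:
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < self.P:
            level += 1
        return level

    @staticmethod
    def _less(node: Node, score, member) -> bool:
        # Order by score, then by member to break ties.
        return (node.score, node.member) < (score, member)

    def insert(self, score, member) -> None:
        update = [self.head] * self.MAX_LEVEL
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and self._less(x.forward[i], score, member):
                x = x.forward[i]
            update[i] = x  # last node before the insertion point on level i
        lvl = self._random_level()
        if lvl > self.level:
            self.level = lvl  # update[] already holds head for the new levels
        node = Node(score, member, lvl)
        for i in range(lvl):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        self.length += 1

    def contains(self, score, member) -> bool:
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and self._less(x.forward[i], score, member):
                x = x.forward[i]
        x = x.forward[0]
        return x is not None and x.score == score and x.member == member

    def __iter__(self):
        x = self.head.forward[0]
        while x is not None:
            yield x.score, x.member
            x = x.forward[0]
```

The higher levels act as express lanes that skip over runs of nodes. This is what makes inserting into the middle or starting a range scan there cheap, at the cost of the extra forward pointers per node (the "more memory" part).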

Conclusion

I hope that I provided some info in this post, but it is far better to download the source code of lamernews from http://github.com/antirez/lamernews and understand how it works. Many data structures from Redis are used inside Lamer News, and there are many clues about what to use to solve a given task.

Sorry for any typos; it's midnight here and I'm too tired to review the post ;)
