Redis 10x more memory usage than data

Question

I am trying to store a wordlist in redis. The performance is great.

My approach is to make a set called "words" and add each new word via SADD.

When adding a file that's 15.9 MB and contains about a million words, the redis-server process consumes 160 MB of RAM. How come I am using 10x the memory? Is there a better way of approaching this problem?

Answer

Well, this is expected of any efficient data storage: the words have to be indexed in memory in a dynamic data structure of cells linked by pointers. The size of the structure metadata, pointers, and memory-allocator internal fragmentation is the reason why the data takes much more memory than a corresponding flat file.

A Redis set is implemented as a hash table. This includes:

  • an array of pointers growing geometrically (powers of two)
  • a second array may be required when incremental rehashing is active
  • single-linked list cells representing the entries in the hash table (3 pointers, 24 bytes per entry)
  • Redis object wrappers (one per value) (16 bytes per entry)
  • the actual data themselves (each prefixed by 8 bytes for size and capacity)

All the above sizes are given for the 64-bit implementation. Accounting for the memory allocator overhead, Redis takes at least 64 bytes per set item (on top of the data) for a recent version of Redis using the jemalloc allocator (>= 2.4).
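As a quick plausibility check (a back-of-the-envelope sketch, not exact accounting — the 64-byte figure is the answer's stated lower bound, and the word count and file size come from the question):

```python
# Lower-bound estimate of Redis set memory for the case in the question:
# ~1M words, 15.9 MB of raw data, and at least 64 bytes of per-item
# overhead on top of the data itself (the answer's figure).

n_items = 1_000_000
raw_data = 15.9 * 1024 * 1024      # size of the flat word file, in bytes

overhead = n_items * 64            # dict entry + robj + sds header + allocator rounding
minimum = raw_data + overhead

print(f"lower bound: {minimum / 1024**2:.0f} MB")  # prints "lower bound: 77 MB"
```

The gap between this lower bound and the observed 160 MB is consistent with the "at least" qualifier: the hash table's pointer array, a possible second array during rehashing, and allocator rounding beyond this minimum all add on top.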

Redis provides memory optimizations for some data types, but they do not cover sets of strings. If you really need to optimize memory consumption of sets, there are tricks you can use though. I would not do this for just 160 MB of RAM, but should you have larger data, here is what you can do.

If you do not need the union, intersection, difference capabilities of sets, then you may store your words in hash objects. The benefit is hash objects can be optimized automatically by Redis using zipmap if they are small enough. The zipmap mechanism has been replaced by ziplist in Redis >= 2.6, but the idea is the same: using a serialized data structure which can fit in the CPU caches to get both performance and a compact memory footprint.

To guarantee the hash objects are small enough, the data could be distributed according to some hashing mechanism. Assuming you need to store 1M items, adding a word could be implemented in the following way:

  • hash the word modulo 10000 (done on the client side)
  • HMSET words:[hashnum] [word] 1

Instead of storing:

words => set{ hi, hello, greetings, howdy, bonjour, salut, ... }

you can store:

words:H1 => map{ hi:1, greetings:1, bonjour:1, ... }
words:H2 => map{ hello:1, howdy:1, salut:1, ... }
...

To retrieve or check the existence of a word, it is the same (hash it and use HGET or HEXISTS).
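The scheme above can be sketched in Python. This is a hedged sketch: the client object `r` is assumed to behave like redis-py's `Redis` (the answer names no particular client), `crc32` is an arbitrary choice of stable client-side hash, and the answer's HMSET is written as HSET here, which is equivalent for a single field (HMSET was later deprecated in favor of HSET):

```python
import zlib

NBUCKETS = 10_000  # modulo from the answer; keep items/bucket under hash-max-zipmap-entries

def bucket_key(word: str) -> str:
    """Client-side hash mapping a word to one of NBUCKETS small hash objects.
    crc32 is an arbitrary deterministic choice; any stable hash works."""
    return f"words:{zlib.crc32(word.encode('utf-8')) % NBUCKETS}"

def add_word(r, word: str) -> None:
    # r is any client exposing hset/hexists, e.g. redis-py (assumed)
    r.hset(bucket_key(word), word, 1)               # HSET words:<n> <word> 1

def word_exists(r, word: str) -> bool:
    return bool(r.hexists(bucket_key(word), word))  # HEXISTS words:<n> <word>
```

With redis-py this would be used as, e.g., `add_word(redis.Redis(), "hello")`; the choice of hash does not matter as long as every client uses the same one.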

With this strategy, significant memory saving can be done provided the modulo of the hash is chosen according to the zipmap configuration (or ziplist for Redis >= 2.6):

# Hashes are encoded in a special way (much more memory efficient) when they
# have at max a given number of elements, and the biggest element does not
# exceed a given threshold. You can configure these limits with the following
# configuration directives.
hash-max-zipmap-entries 512
hash-max-zipmap-value 64

Beware: the names of these parameters have changed with Redis >= 2.6.
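For reference, the renamed Redis >= 2.6 directives carry the same semantics under the ziplist naming:

```
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
```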

Here, modulo 10000 for 1M items means 100 items per hash object, which guarantees that all of them are stored as zipmaps/ziplists.
