Redis 10x more memory usage than data


Question

I have a small question.

I am trying to store a wordlist in Redis. The performance is great.

My approach is to make a set called "words" and add each new word via SADD.

Here's the problem: when adding a file that's 15.9 MB and contains about a million words, the redis-server process consumes 160 MB of RAM. Why am I using 10x the memory, and is there a better way of approaching this problem?

Thanks in advance.

Answer

Well, this is expected of any efficient data storage: the words have to be indexed in memory, in a dynamic data structure of cells linked by pointers. The size of the structure metadata, the pointers, and the memory allocator's internal fragmentation is the reason the data takes much more memory than a corresponding flat file.

A Redis set is implemented as a hash table. This includes:

  • an array of pointers growing geometrically (powers of two)
  • a second array, which may be required while incremental rehashing is active
  • single-linked list cells representing the entries in the hash table (3 pointers, 24 bytes per entry)
  • Redis object wrappers, one per value (16 bytes per entry)
  • the actual data itself (each item prefixed by 8 bytes for size and capacity)

All the above sizes are given for the 64-bit implementation. Accounting for the memory allocator overhead, Redis takes at least 64 bytes per set item (on top of the data) for a recent version of Redis using the jemalloc allocator (>= 2.4).
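As a rough sanity check, the 64-bytes-per-entry lower bound already accounts for a large share of the observed usage. This sketch uses round numbers assumed from the question (one million words, a 15.9 MB file); allocator fragmentation and the hash table's pointer arrays push the real figure higher still, toward the observed 160 MB:

```python
# Back-of-envelope estimate of Redis set overhead, using the
# "at least 64 bytes per entry" figure quoted above (jemalloc, 64-bit).
ITEMS = 1_000_000          # roughly a million words
DATA_BYTES = 15_900_000    # the 15.9 MB flat file
OVERHEAD_PER_ITEM = 64     # metadata + pointers + object wrappers (lower bound)

overhead = ITEMS * OVERHEAD_PER_ITEM
total = DATA_BYTES + overhead
print(f"overhead >= {overhead / 1e6:.0f} MB, total >= {total / 1e6:.1f} MB")
```

So even the lower bound puts total usage around 80 MB, five times the flat file, before counting the hash-table arrays and fragmentation.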

Redis provides memory optimizations for some data types, but they do not cover sets of strings. If you really need to optimize memory consumption of sets, there are tricks you can use though. I would not do this for just 160 MB of RAM, but should you have larger data, here is what you can do.

If you do not need the union, intersection, difference capabilities of sets, then you may store your words in hash objects. The benefit is hash objects can be optimized automatically by Redis using zipmap if they are small enough. The zipmap mechanism has been replaced by ziplist in Redis >= 2.6, but the idea is the same: using a serialized data structure which can fit in the CPU caches to get both performance and a compact memory footprint.

To guarantee the hash objects are small enough, the data could be distributed according to some hashing mechanism. Assuming you need to store 1M items, adding a word could be implemented in the following way:

  • hash the word modulo 10000 (done on the client side)
  • HMSET words:[hashnum] [word] 1
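The key derivation in the first step can be sketched as follows. `crc32` is an arbitrary stand-in for "some hashing mechanism"; any stable hash computed on the client side works:

```python
import zlib

BUCKETS = 10_000  # the modulo from the steps above

def bucket_key(word: str) -> str:
    # crc32 here is an assumed choice, not prescribed by the answer;
    # the only requirement is that the hash is deterministic
    return f"words:{zlib.crc32(word.encode('utf-8')) % BUCKETS}"

# the command issued for each word is then:
#   HMSET bucket_key(word) word 1
print(bucket_key("hello"))
```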

Instead of storing:

words => set{ hi, hello, greetings, howdy, bonjour, salut, ... }

you can store:

words:H1 => map{ hi:1, greetings:1, bonjour:1, ... }
words:H2 => map{ hello:1, howdy:1, salut:1, ... }
...

To retrieve a word or check its existence, do the same: hash it and use HGET or HEXISTS.
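The full round trip can be shown without a running server by letting a plain dict of dicts stand in for the Redis hash objects; with a real client, the two operations below map to HMSET and HEXISTS on the same derived key (crc32 is again an assumed stand-in hash):

```python
import zlib

BUCKETS = 10_000
store = {}  # dict of dicts standing in for the Redis hash objects

def _key(word):
    # assumed client-side hash; any stable function works
    return f"words:{zlib.crc32(word.encode('utf-8')) % BUCKETS}"

def add_word(word):
    store.setdefault(_key(word), {})[word] = "1"   # HMSET words:<h> <word> 1

def has_word(word):
    return word in store.get(_key(word), {})       # HEXISTS words:<h> <word>

add_word("hello")
print(has_word("hello"), has_word("howdy"))   # True False
```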

With this strategy, significant memory saving can be done provided the modulo of the hash is chosen according to the zipmap configuration (or ziplist for Redis >= 2.6):

# Hashes are encoded in a special way (much more memory efficient) when they
# have at max a given number of elements, and the biggest element does not
# exceed a given threshold. You can configure this limits with the following
# configuration directives.
hash-max-zipmap-entries 512
hash-max-zipmap-value 64

Beware: the names of these parameters have changed with Redis >= 2.6 (to hash-max-ziplist-entries and hash-max-ziplist-value).

Here, modulo 10000 for 1M items means 100 items per hash object, which will guarantee that all of them are stored as zipmaps/ziplists.
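A quick check of that arithmetic against the entries limit quoted above (the 64-byte value limit concerns member length, which ordinary dictionary words stay well under):

```python
# verify the worked numbers against hash-max-zipmap-entries (512)
ITEMS = 1_000_000
MODULO = 10_000
ENTRIES_LIMIT = 512

avg_per_bucket = ITEMS // MODULO
print(avg_per_bucket, avg_per_bucket <= ENTRIES_LIMIT)   # 100 True
```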

