MongoDB low cardinality index
Question
From my SQL background, I know the following:
The cardinality of an index is the number of unique values within it. Your database table may have a billion rows in it, but if it only has 8 unique values among those rows, your cardinality is very low.
A low cardinality index is not a major efficiency gain. Most SQL indexes are B-Trees (balanced search trees). Compared with a serial scan of every row in a table to find matching constraints, a B-Tree logarithmically reduces the number of comparisons that have to be made. The gains from executing a search against a B-Tree are very low when the size of the tree is small.
So putting an index on a Boolean field? Or an enumerated value field? A cardinality of a very small number of distinct values among a very large number of rows will not yield noticeable efficiency gains. Save your database indexes for fields with very high cardinality to ensure the gains from scanning a B-Tree are largest versus sequential scans.
What about MongoDB? Should we create an index on a low-cardinality field that is often used as a filter, for instance an enum field with 4 statuses?
Recommended answer
Yes, MongoDB has the same issue, because it also uses B-Trees for indexing. An index on a low-cardinality field therefore brings little benefit: each value still matches a large share of the documents.
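To put rough numbers on this, here is a minimal back-of-the-envelope sketch using the figures from the question (a billion rows, 8 distinct values). It assumes values are evenly distributed and models the lookup as a binary search, which is a simplification of a real B-Tree; the function name `index_lookup_estimate` is made up for illustration.

```python
import math


def index_lookup_estimate(total_rows: int, distinct_values: int) -> dict:
    """Rough estimate of what an equality lookup through an index costs.

    Assumes an even distribution of values and a binary-search model,
    so the numbers are only indicative.
    """
    # Comparisons needed to locate the first matching key: about log2(n).
    comparisons = math.ceil(math.log2(total_rows))
    # Rows that still have to be fetched once the key is found.
    rows_matched = total_rows // distinct_values
    return {
        "comparisons": comparisons,
        "rows_matched": rows_matched,
        "fraction_of_rows": rows_matched / total_rows,
    }


# Low cardinality: 8 distinct values among a billion rows.
low = index_lookup_estimate(1_000_000_000, 8)
# High cardinality: every row has its own value.
high = index_lookup_estimate(1_000_000_000, 1_000_000_000)

print(low)
print(high)
```

Under these assumptions, finding the key is cheap in both cases, but the low-cardinality lookup still has to fetch one eighth of all rows, which is where the index stops paying off.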
This is a good article on the topic:
https://www.percona.com/blog/2018/12/19/using-partial-and-sparse-indexes-in-mongodb/
Although there is no easy or supported solution, it gives a few options for specific cases:
- you run queries on a boolean field with an uneven distribution, and you look mostly for the less frequent value
- you have a low cardinality field and the majority of the queries look for a subset of the values
- the majority of the queries look for a limited subset of the values in a field
- you don’t have enough memory to store very large indexes – for example, you have a lot of page evictions from the WiredTiger cache
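
For the enum case from the question, a partial index along the lines the linked article discusses might look like the sketch below. The field name `status`, the value `"pending"`, and the helper names are assumptions for illustration only; `create_index` and its `partialFilterExpression` option are part of PyMongo, but nothing here is taken from the article itself.

```python
from typing import Any, Dict, List, Tuple


def partial_index_spec(
    field: str, rare_value: Any
) -> Tuple[List[Tuple[str, int]], Dict[str, Any]]:
    """Build the arguments for PyMongo's Collection.create_index.

    Only documents where `field` equals `rare_value` end up in the index,
    so the index stays small even on a very large collection.
    """
    keys = [(field, 1)]  # 1 means ascending order
    options = {"partialFilterExpression": {field: rare_value}}
    return keys, options


# Hypothetical example: index only the documents that are still pending.
keys, options = partial_index_spec("status", "pending")


def create_partial_index(collection) -> str:
    """Apply the spec to a pymongo Collection and return the index name."""
    return collection.create_index(keys, **options)
```

A query can only use a partial index if its filter is covered by the partial filter expression, so in this sketch `find({"status": "pending"})` could use the index while a query for any other status could not.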