MongoDB low cardinality index

Question

From my SQL background, I know the following:

The cardinality of an index is the number of unique values within it. Your database table may have a billion rows in it, but if there are only 8 unique values among those rows, your cardinality is very low.

A low cardinality index is not a major efficiency gain. Most SQL indexes are B-Trees (balanced multi-way search trees, not binary trees). Compared with a serial scan of every row in a table to find the rows matching a constraint, a B-Tree logarithmically reduces the number of comparisons that have to be made. The gains from searching a B-Tree are very low when the tree is small.

So what about putting an index on a Boolean field, or on an enumerated value field? A very small number of distinct values among a very large number of rows will not yield noticeable efficiency gains. Save your database indexes for fields with very high cardinality, where the gain from scanning a B-Tree instead of scanning sequentially is largest.
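
The cardinality argument above can be made concrete with a little arithmetic. The sketch below assumes values are spread uniformly across rows, which real data often is not, and the function name is hypothetical. It shows how many rows a single equality match still has to return for a given number of distinct values:

```python
def rows_matched(total_rows, distinct_values):
    """Rows returned by one equality match, assuming a uniform distribution."""
    return total_rows // distinct_values


total = 1_000_000_000  # the "billion rows" from the text above

# Compare low-cardinality fields (2, 4, 8 values) with a high-cardinality one.
for distinct in (2, 4, 8, 1_000_000):
    print(distinct, rows_matched(total, distinct))
# 2 500000000
# 4 250000000
# 8 125000000
# 1000000 1000
```

With 8 distinct values, an index lookup still leads to roughly 125 million rows, so most of the work of fetching them remains.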

What about MongoDB? Should we create an index on a low cardinality field that is filtered on often, for instance an enum field with 4 statuses?

Recommended answer

Yes, MongoDB has the same issue, because it also uses B-Trees for indexing. An index on a low cardinality field will therefore perform poorly.

Here is a good article on the subject:

https://www.percona.com/blog/2018/12/19/using-partial-and-sparse-indexes-in-mongodb/

Although there is no easy or supported solution, the article gives a few options for specific cases:

  • you run queries on a boolean field with an uneven distribution, and you look mostly for the less frequent value
  • you have a low cardinality field and the majority of the queries look for a subset of the values
  • the majority of the queries look for a limited subset of the values in a field
  • you don't have enough memory to store very large indexes (for example, you have a lot of page evictions from the WiredTiger cache)
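
For the first case in the list (a skewed field where queries mostly look for the rare value), a partial index is one option the linked article covers. The following is a minimal sketch, not a definitive implementation. It assumes `collection` is a PyMongo `Collection`, and the field name `status`, the value `"failed"`, the sample data, and both helper names are hypothetical. `partialFilterExpression` is the MongoDB index option that restricts the index to matching documents.

```python
from collections import Counter


def rarest_value(docs, field):
    """Return (value, share) for the least frequent value of `field`."""
    counts = Counter(doc[field] for doc in docs)
    value, count = min(counts.items(), key=lambda kv: kv[1])
    return value, count / len(docs)


def create_rare_value_index(collection, field, value):
    """Create a partial index covering only documents where field == value.

    `collection` is assumed to be a pymongo Collection (or any object with
    a compatible create_index method).
    """
    return collection.create_index(
        [(field, 1)],  # 1 means ascending order
        partialFilterExpression={field: value},
    )


# Hypothetical sample: 4 statuses, heavily skewed toward "done".
sample = (
    [{"status": "done"}] * 90
    + [{"status": "pending"}] * 6
    + [{"status": "running"}] * 3
    + [{"status": "failed"}] * 1
)
value, share = rarest_value(sample, "status")
print(value, share)  # failed 0.01
```

Against a real deployment, the call would look like `create_rare_value_index(db.orders, "status", "failed")`, where `db.orders` is again a hypothetical collection. MongoDB only considers a partial index when the query's own filter guarantees that every match falls inside the indexed subset, so a query such as `{"status": "failed"}` can use it, while a query for `"done"` cannot.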
