MongoDB performance difference between hashed and ascending indexes (any reason not to use a hash on an unordered field?)

Question

In MongoDB there are multiple types of index. For this question I'm interested in the ascending (or descending) index, which can be used for sorting, and the hashed index, which according to the documentation is "primarily used with sharded clusters to support hashed shard keys", ensuring "a more even distribution of data".

I know that you can't create an index like db.test.ensureIndex( { "key": "hashed", "sortOrder": 1 } ), because you get an error:

{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "errmsg" : "exception: Currently only single field hashed index supported.",
    "code" : 16763,
    "ok" : 0
}

My question:

Between the indexes:

  1. db.test.ensureIndex( { "key": 1 } )
  2. db.test.ensureIndex( { "key": "hashed" } )

For the query db.products.find( { key: "a" } ), which one is more performant? Is a lookup on the hashed key O(1)?

How I got to this question:

Before I knew that you could not have a compound index containing a hashed field, I created an index of the form db.test.ensureIndex( { "key": 1, "sortOrder": 1 } ), and while creating it I wondered whether a hashed index would be more performant than the ascending one (a hash lookup is usually O(1)). I left the key as it is now because, as mentioned above, db.test.ensureIndex( { "key": "hashed", "sortOrder": 1 } ) was not allowed. But the question of whether a hashed index is faster for searches by key stayed in my mind.

The situation in which I created the index was:

I had a collection that contained a sorted list of documents classified by keys.

e.g. {key: a, sortOrder: 1, ...}, {key: a, sortOrder: 2, ...}, {key: a, sortOrder: 3, ...}, {key: b, sortOrder: 1, ...}, {key: b, sortOrder: 2, ...}, ...

Since I used the key to classify and the sortOrder for pagination, I always queried with a filter on a single value of key and used sortOrder to order the documents.

That means that I had two possible queries:

  • For the first page: db.products.find( { key: "a" } ).limit(10).sort( { "sortOrder": 1 } )
  • For the other pages: db.products.find( { key: "a", sortOrder: { $gt: 10 } } ).limit(10).sort( { "sortOrder": 1 } )

In this specific scenario, searching in O(1) for the key and in O(log(n)) for the sortOrder would have been ideal, but that wasn't allowed.
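To make the access pattern concrete, here is a minimal sketch in plain JavaScript that models a { key: 1, sortOrder: 1 } index as an array sorted by key and then by sortOrder. It is only a toy model under that assumption, not how MongoDB implements its indexes, and the names buildCompoundIndex, seek and page are hypothetical. It suggests why both page queries can be served by one seek followed by an in-order scan, with no separate sort step.

```javascript
// Toy model of a { key: 1, sortOrder: 1 } index: entries sorted by key, then sortOrder.
// Hypothetical helper names; this is not MongoDB's implementation.
function buildCompoundIndex(docs) {
  return [...docs].sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : a.sortOrder - b.sortOrder
  );
}

// Binary search for the first entry that sorts after (key, afterSortOrder).
function seek(index, key, afterSortOrder) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    const e = index[mid];
    const before =
      e.key < key || (e.key === key && e.sortOrder <= afterSortOrder);
    if (before) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Read up to `limit` entries for `key`, already ordered by sortOrder.
function page(index, key, afterSortOrder = -Infinity, limit = 10) {
  const out = [];
  for (
    let i = seek(index, key, afterSortOrder);
    i < index.length && out.length < limit;
    i++
  ) {
    if (index[i].key !== key) break; // left the range of this key
    out.push(index[i]);
  }
  return out;
}

// Sample data inserted out of order on purpose.
const docs = [];
for (const key of ["b", "a", "c"]) {
  for (let s = 25; s >= 1; s--) docs.push({ key, sortOrder: s });
}

const index = buildCompoundIndex(docs);
const first = page(index, "a"); // models find({ key: "a" }).sort({ sortOrder: 1 }).limit(10)
const second = page(index, "a", 10); // models the $gt: 10 variant
console.log(first.map((d) => d.sortOrder), second.map((d) => d.sortOrder));
```

In this model the seek is a binary search and the rest is a sequential read, which is the behaviour the ascending compound index is meant to provide for this kind of pagination.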

Recommended answer

For the query db.products.find( { key: "a" } ), which one is more performant?

Given that the field key is indexed in both cases, the complexity of the index search itself would be very similar, because with a hashed index the value "a" is hashed and the hash is stored in the index tree.

If we're looking at the overall performance cost, the hashed version would incur an extra (negligible) cost of hashing the value "a" before matching it in the index tree. See also mongo/db/index/hash_access_method.h
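As a rough illustration of this point, the sketch below models both indexes as sorted arrays searched with a binary search, in plain JavaScript. The only difference for the hashed variant is that the value is hashed before the search. This is a toy model: fnv1a is a stand-in hash and not the function MongoDB uses, and buildIndex, lowerBound and find are hypothetical names.

```javascript
// Stand-in 32-bit FNV-1a string hash; NOT the hash function MongoDB uses.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Toy index: [indexKey, docPosition] pairs kept in sorted order.
function buildIndex(docs, toIndexKey) {
  return docs
    .map((doc, pos) => [toIndexKey(doc.key), pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]));
}

// Binary search for the first entry with indexKey >= target, counting steps.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  let steps = 0;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    steps++;
    if (index[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return { pos: lo, steps };
}

function find(docs, index, toIndexKey, value) {
  const target = toIndexKey(value); // the hashed variant pays for one hash here
  const { pos, steps } = lowerBound(index, target);
  const matches = [];
  for (let i = pos; i < index.length && index[i][0] === target; i++) {
    const doc = docs[index[i][1]];
    if (doc.key === value) matches.push(doc); // re-check guards against hash collisions
  }
  return { matches, steps };
}

const docs = Array.from({ length: 1024 }, (_, i) => ({ key: "k" + i, n: i }));
const identity = (v) => v;
const ascendingIndex = buildIndex(docs, identity);
const hashedIndex = buildIndex(docs, fnv1a);

const viaAscending = find(docs, ascendingIndex, identity, "k500");
const viaHashed = find(docs, hashedIndex, fnv1a, "k500");
console.log("ascending steps:", viaAscending.steps, "hashed steps:", viaHashed.steps);
```

In this model both lookups take a logarithmic number of comparison steps; the hashed one is not O(1), it simply adds one hash computation in front of the same kind of search.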

Also, a hashed index would not be able to utilise index prefix compression (WiredTiger). Index prefix compression is especially effective for some data sets, like those with low cardinality (e.g. country), or those with repeating values, like phone numbers, social security codes, and geo-coordinates. It is especially effective for compound indexes, where the first field is repeated with all the unique values of the second field.

Any reason not to use hash in a non-ordered field?

Generally there is no reason to hash a non-range value. To choose a shard key, consider the cardinality, frequency, and rate of change of the value.

A hashed index is commonly used for a specific case of sharding. When the shard key is a monotonically increasing or decreasing value, new data is likely to go to one shard only. This is where a hashed shard key can improve the distribution of writes. It's a minor trade-off that can greatly improve your sharded cluster. See also Hashed vs Ranged Sharding.
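The effect can be sketched with a small simulation in plain JavaScript. It assumes four shards, hypothetical range boundaries, and a stand-in multiplicative hash (not MongoDB's hash function), and routes 1000 new, monotonically increasing keys under each scheme. The names rangedShard, hashedShard and distribute are made up for illustration.

```javascript
const SHARDS = 4;

// Ranged sharding: each shard owns a contiguous key range.
// Hypothetical boundaries; the last shard owns everything from 750 upwards.
const upperBounds = [250, 500, 750, Infinity];
function rangedShard(key) {
  return upperBounds.findIndex((bound) => key < bound);
}

// Hashed sharding: stand-in multiplicative hash, NOT MongoDB's hash function.
// The top 2 bits of the 32-bit hash select one of the 4 shards.
function hashedShard(key) {
  const h = Math.imul(key, 0x9e3779b1) >>> 0;
  return h >>> 30;
}

// Count how many keys each shard receives under a given routing function.
function distribute(shardOf, keys) {
  const counts = new Array(SHARDS).fill(0);
  for (const k of keys) counts[shardOf(k)] += 1;
  return counts;
}

// 1000 new documents whose keys keep increasing (for example a counter).
const newKeys = Array.from({ length: 1000 }, (_, i) => 1000 + i);

const rangedCounts = distribute(rangedShard, newKeys);
const hashedCounts = distribute(hashedShard, newKeys);
console.log("ranged:", rangedCounts);
console.log("hashed:", hashedCounts);
```

Under these assumptions every new key lands on the last shard with ranged routing, while the hashed routing spreads the same keys across all four shards. The price is that a range query on the key can no longer be targeted at one contiguous range of shards.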

Is it worth inserting a random hash or value with the document, and using that for sharding instead of a hash generated on the _id?

Whether it's worth it depends on the use case. A custom hash value would mean that any query on the hash value has to go through custom hashing code, i.e. the application.

The advantage of utilising the built-in hash function is that MongoDB automatically computes the hashes when resolving queries using hashed indexes. Therefore, applications do not need to compute hashes.
