DocumentDb GUID Index Precision


Question

Let's say we have a non-unique GUID/UUID value in our documents:

[
  {
    "id": "123456",
    "Key": "117dfd49-a71d-413b-a9b1-841e88db06e8",
    "Name": "Kaapstad"
  },
  ...
]

We want to query this value by equality only. No range or ORDER BY querying is required. For example:

SELECT * FROM c where c.Key = "117dfd49-a71d-413b-a9b1-841e88db06e8"

Below is the index definition. It is a hash index (since no range querying will be performed) using a String data type (since JavaScript doesn't support GUIDs natively):

collection.IndexingPolicy.IncludedPaths.Add(
    new IncludedPath { 
        Path = "/Key/?", 
        Indexes = new Collection<Index> { 
            new HashIndex(DataType.String) { Precision = -1 }
        }
    });

But what is the optimal indexing precision?

This MSDN page doesn't make it clear to me which precision value would best suit such a value:

Index precision configuration is more useful with string ranges. Since strings can be any arbitrary length, the choice of the index precision can impact the performance of string range queries, and impact the amount of index storage space required. String range indexes can be configured with 1-100 or -1 ("maximum"). If you would like to perform Order By queries against string properties, then you must specify a precision of -1 for the corresponding paths.

Recommended answer

You can fine-tune the indexing precision value depending on the number of documents you expect to contain the path for your property key (which happens to be the Key property in your example).

The indexing precision for a hash index indicates the number of bytes to hash the property value to. Thus, lowering the precision value helps optimize the amount of storage required to store the index. Raising the precision value (in the context of a hash index) helps guard against hash collisions on the index.

For example, let's assume a hash index precision value of 3 on the path foo.

3 bytes = 3 * 8 = 24 bits.

24 bits can support: 2^24 = 16,777,216 values
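
The same arithmetic applies to any precision value. Below is a minimal Python sketch, assuming (as stated above) that the precision is the number of bytes the property value is hashed to. `hash_buckets` is a hypothetical helper name for illustration and is not part of any DocumentDB SDK.

```python
def hash_buckets(precision_bytes: int) -> int:
    """Number of distinct hash values available at a given precision.

    Assumption: precision is the number of bytes in the stored hash,
    so each extra byte multiplies the number of buckets by 256.
    """
    if precision_bytes < 1:
        raise ValueError("precision must be a positive number of bytes")
    return 2 ** (8 * precision_bytes)


if __name__ == "__main__":
    # Show how quickly the value space grows with each extra byte.
    for precision in (1, 2, 3, 4):
        print(precision, hash_buckets(precision))
```

Each extra byte of precision costs more index storage per entry but multiplies the available hash values by 256.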

By the pigeonhole principle, a hash collision is guaranteed once you store more than 16,777,216 documents with distinct foo values, and collisions become likely well before that limit. Upon a hash collision, DocumentDB then needs to scan the subset of documents that share the hash. For example, if you had 30,000,000 documents with a foo property, you could expect to scan about 2 documents on average (30,000,000 / 16,777,216 ≈ 1.8).
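
The trade-off can be estimated with a short Python sketch. It rests on assumptions the answer does not state: the hash spreads values uniformly, and the count passed in is the number of distinct property values. All function names are hypothetical, and the service's allowed precision range is not checked.

```python
import math


def avg_docs_per_bucket(distinct_values: int, precision_bytes: int) -> float:
    """Rough average number of distinct values sharing one hash value."""
    return distinct_values / 2 ** (8 * precision_bytes)


def collision_probability(distinct_values: int, precision_bytes: int) -> float:
    """Birthday-problem approximation of the chance of at least one collision."""
    buckets = 2 ** (8 * precision_bytes)
    exponent = -distinct_values * (distinct_values - 1) / (2 * buckets)
    return -math.expm1(exponent)  # equals 1 - e^exponent, computed stably


def min_precision_for(distinct_values: int, max_avg: float = 1.0) -> int:
    """Smallest precision (in bytes) keeping the average per bucket <= max_avg."""
    precision = 1
    while avg_docs_per_bucket(distinct_values, precision) > max_avg:
        precision += 1
    return precision


if __name__ == "__main__":
    print(avg_docs_per_bucket(30_000_000, 3))
    print(collision_probability(5_000, 3))
    print(min_precision_for(30_000_000))
```

Under these assumptions, a 3-byte precision already gives slightly better than even odds of at least one collision at about 5,000 distinct values. For an equality-only GUID lookup, the useful figure is therefore the average scan size rather than whether any collision exists.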
