Homogeneous vs heterogeneous in DocumentDB

Question

I am using Azure DocumentDB, and all my NoSQL experience has been with MongoDB. I looked at the pricing model, and the cost is per collection. In MongoDB I would have created three collections for what I am storing: Users, Firms, and Emails. I noted that this approach would cost $24 per collection per month.

The people I work with told me that I'm doing it wrong. They say I should store all three of those things in a single collection, with a field that describes what the data type is, and that collections should instead be split by date or geographic area so that each part of the world has a smaller portion to search. Their advice was to:

"Combine different types of documents into a single collection and add a field across all to separate them in searching like a type field or something"

I would never have dreamed of doing that in Mongo, as it would make indexing, shard keys, and other things hard to get right.

There might not be many fields that overlap between the objects (for example, Email and Firm objects).

I can do it this way, but I can't seem to find a single example of anyone else doing it that way - which indicates to me that maybe it isn't right. Now, I don't need an example, but can someone point me to some location that describes which is the 'right' way to do it? Or, if you do create a single collection for all data - other than Azure's pricing model, what are the advantages / disadvantages in doing that?

Any good articles on DocumentDb schema design?

Recommended answer

Yes. To leverage CosmosDb to its full potential, you need to think of a collection as an entire database system and not as a "table" designed to hold only one type of object.

Sharding in Cosmos is exceedingly simple. You just specify a field that all of your documents will populate and select that as your partition key. If you select a generic field name such as key or partitionKey, you can easily separate the storage of your inbound emails from users, and from anything else, by picking appropriate values.

class InboundEmail
{
   public string Key {get; set;} = "EmailsPartition";
   // other properties
}

class User
{
   public string Key {get; set;} = "UsersPartition";
   // other properties
}
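
To connect this with the "type field" advice from the question, here is a minimal sketch of what a heterogeneous collection could look like. It is an in-memory stand-in, not a Cosmos client: the Document base class, the Type discriminator, and the CollectionSketch helper are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical base type: every document carries a partition key and a type discriminator.
public abstract class Document
{
    public string Id { get; set; } = Guid.NewGuid().ToString();
    public abstract string Key { get; }   // partition key value
    public abstract string Type { get; }  // discriminator used to tell document kinds apart
}

public class InboundEmail : Document
{
    public override string Key => "EmailsPartition";
    public override string Type => "InboundEmail";
    public string Subject { get; set; } = "";
}

public class User : Document
{
    public override string Key => "UsersPartition";
    public override string Type => "User";
    public string Name { get; set; } = "";
}

public static class CollectionSketch
{
    // In-memory stand-in for one collection that holds several document types.
    public static List<Document> Sample() => new List<Document>
    {
        new InboundEmail { Subject = "Welcome" },
        new User { Name = "Ada" },
        new InboundEmail { Subject = "Invoice" },
    };

    // Similar in spirit to a query such as: SELECT * FROM c WHERE c.type = @type
    public static List<Document> ByType(IEnumerable<Document> docs, string type) =>
        docs.Where(d => d.Type == type).ToList();
}
```

Calling CollectionSketch.ByType(CollectionSketch.Sample(), "InboundEmail") returns only the email documents, even though users live in the same collection.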

What I'm showing is still only an example though. In reality your partition key values should be even more dynamic. It's important to understand that queries against a known partition are extremely quick. As soon as you need to scan across multiple partitions you'll see much slower and more costly results.

So, in an app that ingests a lot of user data, keeping a single user's activity together in one partition might make sense for that particular entity.
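
As a rough illustration of a more dynamic key, the sketch below derives one partition key value per user and contrasts a lookup against a known partition with a scan across all of them. It only models the idea in memory. The ActivityEvent record, the "user-{id}" key convention, and the PartitionSketch helper are assumptions made for this example, not part of any Cosmos SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical activity document; the key is derived from the user id.
public record ActivityEvent(string UserId, string Action)
{
    // Assumed convention: one logical partition per user.
    public string Key => $"user-{UserId}";
}

public static class PartitionSketch
{
    // Groups events by key value, mimicking how items that share a key are stored together.
    public static Dictionary<string, List<ActivityEvent>> Partition(IEnumerable<ActivityEvent> events) =>
        events.GroupBy(e => e.Key).ToDictionary(g => g.Key, g => g.ToList());

    // Single-partition read: the key is known, so only one partition is touched.
    public static IReadOnlyList<ActivityEvent> ForUser(
        Dictionary<string, List<ActivityEvent>> partitions, string userId) =>
        partitions.TryGetValue($"user-{userId}", out var items) ? items : new List<ActivityEvent>();

    // Cross-partition read: no key is known, so every partition has to be visited.
    public static List<ActivityEvent> WithAction(
        Dictionary<string, List<ActivityEvent>> partitions, string action) =>
        partitions.Values.SelectMany(p => p).Where(e => e.Action == action).ToList();
}
```

ForUser touches a single partition, while WithAction has to walk all of them, which is the pattern that becomes slower and more expensive as the number of partitions grows.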

If you want evidence that this is the appropriate way to use CosmosDb, consider the addition of the new Gremlin Graph APIs. Graphs are inherently heterogeneous, as they contain many different entities and entity types as well as the relationships between them. The query boundary of Cosmos is at the collection level, so if you tried putting your entities all in different collections, none of the Graph APIs or queries would work.

I noticed in the comments you made this statement: "And you would have an index on every field in both objects." CosmosDb does automatically index every field of every document. It uses a special proprietary path-based indexing mechanism that ensures every path of your JSON tree is indexed. You have to specifically opt out of this auto-indexing feature.
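
Opting out is done through the collection's indexing policy. A minimal sketch of such a policy follows, assuming a hypothetical body property on email documents that never needs to be queried. Everything else stays indexed through the catch-all included path.

```json
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    { "path": "/*" }
  ],
  "excludedPaths": [
    { "path": "/body/*" }
  ]
}
```

Excluding large fields that are never filtered or sorted on is one way to reduce the write cost of a heterogeneous collection. Which paths are worth excluding depends on your own query patterns.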
