Homogeneous vs heterogeneous collections in DocumentDB

Question

I am using Azure DocumentDB, and all my experience with NoSQL has been in MongoDB. I looked at the pricing model, and the cost is per collection. In MongoDB I would have created 3 collections for what I was using: Users, Firms, and Emails. I noted that this approach would cost $24 per collection per month.

I was told by the people I work with that I'm doing it wrong. I should have all three of those things stored in a single collection, with a field to describe what the data type is. They also said each collection should be organized by date or geographic area, so that each part of the world has a smaller portion to search, and to:

"Combine different types of documents into a single collection and add a field across all to separate them in searching like a type field or something"

I would never have dreamed of doing that in Mongo, as it would make indexing, shard keys, and other things hard to get right.

There might not be many fields that overlap between the objects (for example, Email and Firm objects).

I can do it this way, but I can't seem to find a single example of anyone else doing it that way - which indicates to me that maybe it isn't right. Now, I don't need an example, but can someone point me to some location that describes which is the 'right' way to do it? Or, if you do create a single collection for all data - other than Azure's pricing model, what are the advantages / disadvantages in doing that?

Any good articles on DocumentDb schema design?

Recommended answer

Yes. To leverage Cosmos DB to its full potential, you need to think of a collection as an entire database system, not as a "table" designed to hold only one type of object.

Sharding in Cosmos DB is exceedingly simple. You just specify a field that all of your documents will populate and select it as your partition key. If you pick a generic field name such as key or partitionKey, you can easily separate the storage of your inbound emails from your users, and from anything else, by choosing appropriate values for that field.

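```csharp
class InboundEmail
{
    // Partition key: every document in the collection populates this field,
    // and its value decides which partition the document is stored in.
    public string Key { get; set; } = "EmailsPartition";
    // other properties
}

class User
{
    // Same field name, different value, so users live in their own partition.
    public string Key { get; set; } = "UsersPartition";
    // other properties
}
```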
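
The two classes above only set the partition key. The question also mentions a "type" field for telling document kinds apart. A minimal in-memory sketch of that combination follows, assuming a hypothetical `Type` discriminator next to `Key`; `Doc`, `EmailDoc`, `UserDoc` and `HeterogeneousContainer` are made-up names for illustration, not Cosmos DB SDK types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical base document: every item carries a partition key ("Key") and a
// type discriminator ("Type") so different kinds of documents can share one container.
abstract class Doc
{
    public string Id { get; set; } = Guid.NewGuid().ToString();
    public string Key { get; set; } = "";
    public abstract string Type { get; }
}

class EmailDoc : Doc
{
    public override string Type => "email";
    public string Subject { get; set; } = "";
}

class UserDoc : Doc
{
    public override string Type => "user";
    public string Name { get; set; } = "";
}

// In-memory stand-in for one container: documents are grouped by partition key.
class HeterogeneousContainer
{
    private readonly Dictionary<string, List<Doc>> _partitions = new();

    public int PartitionCount => _partitions.Count;

    public void Add(Doc doc)
    {
        if (!_partitions.TryGetValue(doc.Key, out var list))
        {
            list = new List<Doc>();
            _partitions[doc.Key] = list;
        }
        list.Add(doc);
    }

    // Reads a single partition and filters on the discriminator, roughly what
    // "WHERE c.Type = @type" scoped to one partition key value would return.
    public IEnumerable<Doc> Query(string key, string type) =>
        _partitions.TryGetValue(key, out var list)
            ? list.Where(d => d.Type == type)
            : Enumerable.Empty<Doc>();
}
```

Each kind of document keeps its own partition key values, and the discriminator still separates kinds that happen to share a partition.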

What I'm showing is still only an example, though. In reality, your partition key values should be even more dynamic. It's important to understand that queries against a known partition are extremely fast. As soon as you need to scan across multiple partitions, you'll see much slower and more costly results.
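
To show why the known-partition case matters, here is a rough sketch that models a container as a dictionary of partitions and counts how many partitions each query has to visit. `PartitionedStore`, `QuerySingle` and `QueryCross` are hypothetical names; real Cosmos DB query routing and request-unit costs are not modeled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a partitioned container. It only counts how many
// partitions a query must touch; it does not model real request-unit charges.
class PartitionedStore
{
    private readonly Dictionary<string, List<string>> _partitions = new();

    public void Add(string partitionKey, string item)
    {
        if (!_partitions.TryGetValue(partitionKey, out var items))
        {
            items = new List<string>();
            _partitions[partitionKey] = items;
        }
        items.Add(item);
    }

    // Partition key supplied: the query is routed to exactly one partition.
    public (List<string> Results, int PartitionsTouched) QuerySingle(
        string partitionKey, Func<string, bool> predicate)
    {
        var items = _partitions.TryGetValue(partitionKey, out var found)
            ? found
            : new List<string>();
        return (items.Where(predicate).ToList(), 1);
    }

    // No partition key: the query fans out to every partition.
    public (List<string> Results, int PartitionsTouched) QueryCross(Func<string, bool> predicate)
    {
        var results = _partitions.Values.SelectMany(p => p.Where(predicate)).ToList();
        return (results, _partitions.Count);
    }
}
```

With five users stored under keys `user-0` to `user-4`, a filter scoped to `user-2` touches one partition, while the same filter without a key touches all five. That fan-out is, roughly, what makes cross-partition queries slower and more expensive.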

So, in an app that ingests a lot of user data, keeping a single user's activity together in one partition might make sense for that particular entity.
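
One way to make the partition key "more dynamic" along these lines, sketched under the assumption that every record carries the owning user's id: derive the key from that id, so a user's profile and that user's emails share one logical partition. `PartitionKeys.ForUser`, `UserProfile` and `UserEmail` are hypothetical, and the `user-{id}` format is only an example.

```csharp
using System;

// Hypothetical helper: derive the partition key from the owning user's id, so a
// user's profile and every email/activity record for that user share one partition.
static class PartitionKeys
{
    public static string ForUser(string userId)
    {
        if (string.IsNullOrWhiteSpace(userId))
            throw new ArgumentException("userId is required", nameof(userId));
        return $"user-{userId}";
    }
}

class UserProfile
{
    public string Type => "user";          // discriminator
    public string UserId { get; }
    public string Key => PartitionKeys.ForUser(UserId);
    public UserProfile(string userId) => UserId = userId;
}

class UserEmail
{
    public string Type => "email";         // discriminator
    public string UserId { get; }
    public string Subject { get; }
    public string Key => PartitionKeys.ForUser(UserId);
    public UserEmail(string userId, string subject)
    {
        UserId = userId;
        Subject = subject;
    }
}
```

A query for "everything about user 42" can then supply `user-42` as the partition key and stay inside a single partition.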

If you want evidence that this is the appropriate way to use Cosmos DB, consider the addition of the new Gremlin Graph APIs. Graphs are inherently heterogeneous, as they contain many different entities and entity types as well as the relationships between them. The query boundary of Cosmos DB is at the collection level, so if you put your entities in different collections, neither the Graph API nor queries spanning those entities would work.

I noticed you made this statement in the comments: "And you would have an index on every field in both objects." Cosmos DB does automatically index every field of every document. It uses a special, proprietary, path-based indexing mechanism that ensures every path of your JSON tree is indexed. You have to specifically opt out of this automatic indexing.
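
To picture what "every path of your JSON tree" means, the sketch below uses `System.Text.Json` to list each leaf path of a document in the `/a/b/?` notation that Cosmos DB indexing policies use. It illustrates the idea only (`JsonPaths` and `LeafPaths` are hypothetical names); it is not the service's proprietary index.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Illustration only: collect every leaf path of a JSON document in the "/a/b/?"
// notation used by Cosmos DB indexing policies. How Cosmos DB actually stores
// its index is internal to the service and is not modeled here.
static class JsonPaths
{
    public static SortedSet<string> LeafPaths(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var paths = new SortedSet<string>(StringComparer.Ordinal);
        Walk(doc.RootElement, "", paths);
        return paths;
    }

    private static void Walk(JsonElement element, string prefix, SortedSet<string> paths)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                foreach (JsonProperty prop in element.EnumerateObject())
                    Walk(prop.Value, $"{prefix}/{prop.Name}", paths);
                break;
            case JsonValueKind.Array:
                // All array elements share the "[]" segment, whatever their position.
                foreach (JsonElement item in element.EnumerateArray())
                    Walk(item, $"{prefix}/[]", paths);
                break;
            default:
                // Scalar leaf: string, number, true/false or null.
                paths.Add($"{prefix}/?");
                break;
        }
    }
}
```

Because paths are indexed automatically, fields that exist only on emails or only on firms are covered without any per-type setup, which answers the concern about objects with few overlapping fields. Opting out is done in the container's indexing policy, for example by listing a path such as `/body/*` (a hypothetical field) under `excludedPaths`.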
