Document Databases: Redundant data, references, etc. (MongoDB specifically)

Question

It seems like I run into lots of situations where the appropriate way to build my data is to split it into two documents. Let's say it was for a chain of stores and you were saving which stores each customer had visited. Stores and Customers need to be independent pieces of data because they interact with plenty of other things, but we do need to relate them.

So the easy answer is to store the customer's ID in the store document, or the store's ID in the customer's document. Oftentimes, though, you also want one or two other pieces of data for display purposes, such as the customer name or the store name, because an ID on its own isn't useful to show.

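  1. Do you typically store a copy of the entire document, or only the pieces of data you need? Perhaps it depends on how big the document is and how much of it you need.
  2. How do you handle the fact that the data is duplicated? Do you go and hunt down the copies whenever the data changes? Refresh the data at some interval when it is loaded? Only duplicate data when you can afford for it to be stale?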
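
The sketch below makes the options in question 1 concrete. It is a minimal example that models a collection as a plain Python dict, a hypothetical in-memory stand-in rather than the MongoDB driver. It builds a customer's visit entry in each of the three shapes: the store ID only, the ID plus just the fields needed for display, and a full embedded copy of the store document. The data and helper names are made up for illustration.

```python
# Hypothetical in-memory stand-in for a "stores" collection, keyed by _id.
stores = {
    "s1": {"_id": "s1", "name": "Downtown", "address": "1 Main St", "phone": "555-0100"},
}


def visit_reference_only(store_id):
    """Store just the ID; showing the store name needs a second lookup."""
    return {"store_id": store_id}


def visit_with_subset(store_id, fields=("name",)):
    """Store the ID plus only the fields needed for display."""
    store = stores[store_id]
    return {"store_id": store_id, **{f: store[f] for f in fields}}


def visit_full_copy(store_id):
    """Embed a (shallow) copy of the entire store document."""
    return {"store_id": store_id, "store": dict(stores[store_id])}


customer = {
    "_id": "c1",
    "name": "Bob",
    # Copying only the display fields is the middle option from question 1.
    "visits": [visit_with_subset("s1")],
}
print(customer["visits"])  # [{'store_id': 's1', 'name': 'Downtown'}]
```

The two copying shapes are the ones that can go stale when the store document changes, which is what question 2 is about.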

I'd appreciate your input and/or links to any kind of 'best practices', or at least a well-reasoned discussion of these topics.

Recommended answer

There are basically two scenarios: either you need fresh data, or you can afford stale data.

Storing duplicate data is easy; maintaining it is the hard part. So the easiest thing to do is to avoid maintenance altogether by not storing any duplicate data in the first place. This is mainly useful if you need fresh data: store only the references, and query the referenced collections when you need to retrieve the information.
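
The "fresh" approach can be sketched as follows. Plain dicts again act as hypothetical stand-ins for the `customers` and `stores` collections. The customer holds only store IDs, and the names are resolved at read time with a second lookup. With pymongo, that lookup would be a query such as `db.stores.find({"_id": {"$in": ids}})`. The helper below only imitates it.

```python
# Hypothetical in-memory collections; customers hold only references (IDs).
stores = {
    "s1": {"_id": "s1", "name": "Downtown"},
    "s2": {"_id": "s2", "name": "Uptown"},
}
customers = {
    "c1": {"_id": "c1", "name": "Bob", "visited_store_ids": ["s2", "s1"]},
}


def find_by_ids(collection, ids):
    """Imitates one extra query, e.g. find({"_id": {"$in": ids}}).

    Unlike a real $in query, this keeps the order of `ids`.
    """
    return [collection[i] for i in ids if i in collection]


def visited_store_names(customer_id):
    customer = customers[customer_id]                          # query 1
    docs = find_by_ids(stores, customer["visited_store_ids"])  # query 2
    return [doc["name"] for doc in docs]


print(visited_store_names("c1"))  # ['Uptown', 'Downtown']

# Nothing is duplicated, so a rename is visible on the very next read.
stores["s1"]["name"] = "Downtown East"
print(visited_store_names("c1"))  # ['Uptown', 'Downtown East']
```

MongoDB's `$lookup` aggregation stage can perform this join on the server instead. The extra work at read time is still there either way.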

In this scenario, you'll have some overhead due to the extra queries. The alternative is to track every location of the duplicate data and update all instances on each change. This also involves overhead, especially in N-to-M relations like the one you mentioned. So if you require fresh data, you will have some overhead either way: extra reads in the first case, extra writes in the second. You can't have the best of both worlds.
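
The alternative, keeping copies and updating every instance, can be sketched like this. Each customer's visits embed the store name (hypothetical data, plain dicts again), so renaming a store means rewriting every copy. With pymongo, the fan-out step would roughly correspond to `db.customers.update_many({"visits.store_id": sid}, {"$set": {"visits.$[v].store_name": new_name}}, array_filters=[{"v.store_id": sid}])`. The loop below is only an in-memory imitation of that.

```python
# Hypothetical in-memory collections; each visit embeds a copy of the store name.
stores = {
    "s1": {"_id": "s1", "name": "Downtown"},
    "s2": {"_id": "s2", "name": "Uptown"},
}
customers = {
    "c1": {"_id": "c1", "visits": [{"store_id": "s1", "store_name": "Downtown"}]},
    "c2": {"_id": "c2", "visits": [{"store_id": "s1", "store_name": "Downtown"},
                                   {"store_id": "s2", "store_name": "Uptown"}]},
}


def rename_store(store_id, new_name):
    """Update the source document, then every duplicated copy of its name.

    Returns how many embedded copies were rewritten.
    """
    stores[store_id]["name"] = new_name
    rewritten = 0
    # Every location that holds a copy must be known here; a copy that is
    # missed is left silently inconsistent.
    for customer in customers.values():
        for visit in customer["visits"]:
            if visit["store_id"] == store_id:
                visit["store_name"] = new_name
                rewritten += 1
    return rewritten


print(rename_store("s1", "Downtown East"))  # 2
```

Reads stay cheap because the name is already embedded. In an N-to-M relation, however, a single rename can touch many documents. That is the write-side overhead described above.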

If you can afford to have stale data, things get a lot easier. To avoid the query overhead, you can store duplicate data. To avoid having to maintain it, you don't store that duplicate data in your source documents. At least not actively.

In this scenario, too, you store only the references between documents. Then use a periodic map-reduce job to generate the duplicate data. You can then query the single map-reduce output collection rather than several separate collections. This way you avoid the query overhead without having to hunt down data changes.
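
Here is a minimal sketch of the periodic batch job, with plain dicts as hypothetical stand-ins for the collections. The job resolves the references once and builds a single display-ready collection (`customer_visits_view`, a made-up name) that reads can query directly. Between runs the view is stale, which is exactly the trade-off described above. Note that MongoDB has since deprecated `mapReduce` (as of 5.0) in favor of the aggregation pipeline. The same job is usually written with `$lookup` to resolve the references and `$merge` or `$out` to write the result collection.

```python
# Hypothetical in-memory collections; source documents hold only references.
stores = {
    "s1": {"_id": "s1", "name": "Downtown"},
    "s2": {"_id": "s2", "name": "Uptown"},
}
customers = {
    "c1": {"_id": "c1", "name": "Bob", "visited_store_ids": ["s1", "s2"]},
    "c2": {"_id": "c2", "name": "Eve", "visited_store_ids": ["s2"]},
}


def rebuild_visit_view():
    """Batch job: join the references once and materialize the duplicate data.

    Meant to run on a schedule (e.g. nightly); nothing else writes to the view.
    """
    return {
        c["_id"]: {
            "_id": c["_id"],
            "customer_name": c["name"],
            "store_names": [stores[sid]["name"]
                            for sid in c["visited_store_ids"] if sid in stores],
        }
        for c in customers.values()
    }


customer_visits_view = rebuild_visit_view()
print(customer_visits_view["c1"]["store_names"])  # ['Downtown', 'Uptown']

# A rename only touches the source document ...
stores["s2"]["name"] = "Uptown West"
print(customer_visits_view["c1"]["store_names"])  # still ['Downtown', 'Uptown']

# ... and shows up in the view after the next scheduled run.
customer_visits_view = rebuild_visit_view()
print(customer_visits_view["c1"]["store_names"])  # ['Downtown', 'Uptown West']
```

No code path ever updates a copy in place. The only cost is that readers see data as of the last run.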

Only store references to other documents. If you can afford stale data, use periodic map-reduce jobs to generate duplicate data. Avoid maintaining duplicate data; it's complex and error-prone.
