Managing Denormalized/Duplicated Data in Cloud Firestore


Problem Description


If you have decided to denormalize/duplicate your data in Firestore to optimize for reads, what patterns (if any) are generally used to keep track of the duplicated data so that it can be updated correctly to avoid inconsistencies?

As an example, if I have a feature like a Pinterest Board where any user on the platform can pin my post to their own board, how would you go about keeping track of the duplicated data in many locations?

What about creating a relational-like table for each unique location where the data can exist, which is then used to reconstruct the paths that require updating?

For example, creating a users_posts_boards collection that is firstly a collection of userIDs, each with a sub-collection of postIDs, which finally has another sub-collection of boardIDs containing a boardOwnerID. You could then use those to reconstruct the paths of the duplicated data for a post (e.g. /users/[boardOwnerID]/boards/[boardID]/posts/[postID]).
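
For illustration, something like the following sketch is what I mean by reconstructing the paths (Android/Java SDK; authorId, postId, the "postIDs"/"boardIDs" sub-collection names and the boardOwnerID field are placeholders for whatever the tracker actually stores):

// Sketch only: walk the users_posts_boards tracker for one post and
// rebuild a DocumentReference for every duplicated copy of it.
FirebaseFirestore db = FirebaseFirestore.getInstance();
db.collection("users_posts_boards")
        .document(authorId)
        .collection("postIDs")
        .document(postId)
        .collection("boardIDs")
        .get()
        .addOnSuccessListener(snapshots -> {
            for (DocumentSnapshot doc : snapshots) {
                String boardId = doc.getId();
                String boardOwnerId = doc.getString("boardOwnerID");
                DocumentReference copy = db
                        .collection("users").document(boardOwnerId)
                        .collection("boards").document(boardId)
                        .collection("posts").document(postId);
                // copy now points at one duplicated location of the post
            }
        });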

Also, if posts can additionally be shared to groups and lists, would you continue to create users_posts_groups and users_posts_lists collections and sub-collections to track the duplicated data in the same way?

Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?

{
  postID: 'someID',
  locations: ( <---- collection
    "path/to/post/location1",
    "path/to/post/location2",
    ...
  )
}

This would mean that you would basically need to have all writes to Firestore done through Cloud Functions that can keep track of this data for security reasons... unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to that sub-collection or the parent postIDs collection.

I'm basically looking for a sane way to track heavily denormalized data.

Edit: oh yeah, another great example would be the post author's profile information being embedded in every post. Imagine the hellscape of trying to keep all that up to date as it is shared across the platform and a user then updates their profile.

Solution

I'm answering this question because of your request from here.

When you are duplicating data, there is one thing you need to keep in mind: in the same way you add data, you need to maintain it. In other words, if you want to update/delete an object, you need to do it in every place where it exists.

What patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?

To keep track of all the operations that we need to perform in order to have consistent data, we add all of them to a batch. You can add one or more update operations on different references, as well as delete or add operations. For that, please see the Firestore documentation on transactions and batched writes.
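
As a rough sketch (Android/Java SDK; postRefOnBoard1, postRefOnBoard2 and staleCopyRef are hypothetical DocumentReferences pointing at duplicated copies of the same post), such a batch could look like this:

FirebaseFirestore db = FirebaseFirestore.getInstance();
WriteBatch batch = db.batch();

// update the same duplicated field on every copy
batch.update(postRefOnBoard1, "title", "New title");
batch.update(postRefOnBoard2, "title", "New title");
// remove a copy that should no longer exist
batch.delete(staleCopyRef);

// all operations in the batch succeed or fail together
batch.commit();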

What about creating a relational-like table for each unique location that the data can exist that is used to reconstruct the paths that require updating.

In my opinion, there is no need to add an extra "relational-like table", but if you feel comfortable with it, go ahead and use it.

Then you use those to reconstruct the paths of the duplicated data for a post (eg. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?

Yes, you need to pass the corresponding document id to each document() method in order to make the update operation work. Unfortunately, there are no wildcards in Cloud Firestore paths to documents; you have to identify the documents by their ids.
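
For example, a duplicated copy can only be addressed by spelling out every id in its path, as in this sketch (boardOwnerId, boardId and postId are hypothetical values you have tracked yourself):

FirebaseFirestore db = FirebaseFirestore.getInstance();
DocumentReference duplicatedPostRef = db
        .collection("users").document(boardOwnerId)
        .collection("boards").document(boardId)
        .collection("posts").document(postId);

// there is no wildcard such as /users/*/boards/*/posts/{postID}
duplicatedPostRef.update("title", "New title");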

Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?

I consider that this isn't necessary either, since it requires extra read operations. Because everything in Firestore is about the number of reads and writes, I think you should reconsider this approach. Please see Firestore usage and limits.

unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.

Firestore security rules are powerful enough to do that. You can allow reads or writes, or even apply security rules for each CRUD operation you need.

I'm basically looking for a sane way to track heavily denormalized data.

The simplest way I can think of is to add the operations to a key-value data structure. Let's assume we have a map that looks like this:

Map<Object, DocumentReference> map = new HashMap<>();
map.put(customObject1, reference1);
map.put(customObject2, reference2);
map.put(customObject3, reference3);
// And so on

Iterate through the map, add all those keys and values to a batch, commit the batch, and that's it.
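
A possible sketch of that last step, assuming each key is a POJO holding the new data and each value is the DocumentReference of one duplicated copy:

FirebaseFirestore db = FirebaseFirestore.getInstance();
WriteBatch batch = db.batch();
for (Map.Entry<Object, DocumentReference> entry : map.entrySet()) {
    // SetOptions.merge() overwrites only the fields contained in the key object
    batch.set(entry.getValue(), entry.getKey(), SetOptions.merge());
}
// every duplicated copy is updated, or none of them is
batch.commit();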
