nosql数据库中的树结构 [英] Tree structures in a nosql database

查看:227
本文介绍了nosql数据库中的树结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在为Google App Engine开发一款应用程序,该应用程序使用BigTable作为其数据存储区。



这是一个关于共同编写故事的应用程序。这是一个非常简单的爱好项目,我只是为了娱乐而工作。它是开源的,你可以在这里看到:




$ b


$ b

如何设置结构既可以提高写作效率,也可以提高阅读效果? =h2_lin>解决方案

有许多众所周知的方式来表示数据库中的树;他们每个人都有自己的优点和缺点。以下是最常见的:


  • 邻接列表,其中每个节点存储其父级的标识。 物化路径,这是Keyur描述的策略。这也是App Engine中的实体组(例如父实体)使用的方法。这或多或少是您在更新中描述的内容。

  • 嵌套集合,其中每个节点都有左和右ID,以便所有子节点都包含在该范围内。
  • 邻接列表包含根ID。



这些都有各自的优点和缺点。邻接列表很简单,且更新便宜,但需要多个查询来检索子树(每个父节点一个)。增加的邻接列表可以通过在每条记录中存储根节点的ID来检索整个树。



物化路径易于实现且价格便宜,并且允许查询任意的子树,但为深层树增加开销。



嵌套集的实现更加困难,并且每次创建时平均需要更新一半节点插入。它们允许你查询任意的子树,而不会增加物化路径的密钥长度问题。



然而,在你的具体情况下,你似乎并不需要树状结构:每一个故事,尽管它可能是原创的,但是独立。我会建议有一个'Story'模型,其中包含其段落的键列表(例如,在Python中是一个db.ListProperty(db.Key))。要呈现故事,您可以获取故事,然后对所有段落进行批量提取。要分支故事,只需复制故事条目 - 保留对段落的引用不变。


I'm developing an application for Google App Engine which uses BigTable for its datastore.

It's an application about writing a story collaboratively. It's a very simple hobby project that I'm working on just for fun. It's open source and you can see it here: http://story.multifarce.com/

The idea is that anyone can write a paragraph, which then needs to be validated by two other people. A story can also be branched at any paragraph, so that another version of the story can continue in another direction.

Imagine the following tree structure:

Every number would be a paragraph. I want to be able to select all the paragraphs in every unique story line. Basically, those unique story lines are (2, 7, 2); (2, 7, 6, 5); (2, 7, 6, 11) and (2, 5, 9, 4). Ignore that the node "2" appears twice, I just took a tree structure diagram from Wikipedia.

I also made a diagram of a proposed solution: https://docs.google.com/drawings/edit?id=1fdUISIjGVBvIKMSCjtE4xFNZxiE08AoqvJSLQbxN6pc&hl=en

How can I set up a structure is performance efficient both for writing, but most importantly for reading?

解决方案

There are a number of well known ways to represent trees in databases; each of them have their pros and cons. Here are the most common:

  • Adjacency list, where each node stores the ID of its parent.
  • Materialized path, which is the strategy Keyur describes. This is also the approach used by entity groups (eg, parent entities) in App Engine. It's also more or less what you're describing in your update.
  • Nested sets, where each node has 'left' and 'right' IDs, such that all child nodes are contained in that range.
  • Adjacency lists agumented with a root ID.

Each of these has its own advantages and disadvantages. Adjacency lists are simple, and cheap to update, but require multiple queries to retrieve a subtree (one for each parent node). Augmented adjacency lists make it possible to retrieve an entire tree by storing the ID of the root node in every record.

Materialized paths are easy to implement and cheap to update, and permit querying arbitrary subtrees, but impose increasing overhead for deep trees.

Nested sets are tougher to implement, and require updating, on average, half the nodes each time you make an insertion. They allow you to query arbitrary subtrees, without the increasing key length issue materialized path has.

In your specific case, though, it seems like you don't actually need a tree structure at all: each story, branched off an original though it may be, stands alone. What I would suggest is having a 'Story' model, which contains a list of keys of its paragraphs (Eg, in Python a db.ListProperty(db.Key)). To render a story, you fetch the Story, then do a batch fetch for all the Paragraphs. To branch a story, simply duplicate the story entry - leaving the references to Paragraphs unchanged.

这篇关于nosql数据库中的树结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆