Principles for Modeling CouchDB Documents

Question

I have a question that I've been trying to answer for some time now but can't figure out:

How do you design, or divide up, CouchDB documents?

Take a blog post as an example.

The semi "relational" way to do it would be to create a few objects:

  • Post
  • User
  • Comment
  • Tag
  • Snippet

This makes a great deal of sense. But I am trying to use couchdb (for all the reasons that it's great) to model the same thing and it's been extremely difficult.

Most of the blog posts out there give you an easy example of how to do this. They basically divide it up the same way, but say you can add 'arbitrary' properties to each document, which is definitely nice. So you'd have something like this in CouchDB:

  • Post (with Tag and Snippet "pseudo" models in the doc)
  • Comment
  • User

Some people would even say you could throw the Comment and User in there, so you'd have this:


post {
    id: 123412804910820
    title: "My Post"
    body: "Lots of Content"
    html: "<p>Lots of Content</p>"
    author: {
        name: "Lance"
        age: "23"
    }
    tags: ["sample", "post"]
    comments {
        comment {
            id: 93930414809
            body: "Interesting Post"
        } 
        comment {
            id: 19018301989
            body: "I agree"
        }
    }
}

That looks very nice and is easy to understand. I also understand how you could write views that extracted just the Comments from all your Post documents, to get them into Comment models, same with Users and Tags.

But then I think, "why not just put my whole site into a single document?":


site {
    domain: "www.blog.com"
    owner: "me"
    pages {
        page {
            title: "Blog"
            posts {
                post {
                    id: 123412804910820
                    title: "My Post"
                    body: "Lots of Content"
                    html: "<p>Lots of Content</p>"
                    author: {
                        name: "Lance"
                        age: "23"
                    }
                    tags: ["sample", "post"]
                    comments {
                        comment {
                            id: 93930414809
                            body: "Interesting Post"
                        } 
                        comment {
                            id: 19018301989
                            body: "I agree"
                        }
                    }
                }
                post {
                    id: 18091890192984
                    title: "Second Post"
                    ...
                }
            }
        }
    }
}

You could easily make views to find what you wanted with that.

Then the question I have is, how do you determine when to divide the document into smaller documents, or when to make "RELATIONS" between the documents?

I think it would be much more "Object Oriented", and easier to map to Value Objects, if it were divided like so:


posts {
    post {
        id: 123412804910820
        title: "My Post"
        body: "Lots of Content"
        html: "<p>Lots of Content</p>"
        author_id: "Lance1231"
        tags: ["sample", "post"]
    }
}
authors {
    author {
        id: "Lance1231"
        name: "Lance"
        age: "23"
    }
}
comments {
    comment {
        id: "comment1"
        body: "Interesting Post"
        post_id: 123412804910820
    } 
    comment {
        id: "comment2"
        body: "I agree"
        post_id: 123412804910820
    }
}

... but then it starts looking more like a Relational Database. And often times I inherit something that looks like the "whole-site-in-a-document", so it's more difficult to model it with relations.

I've read lots of things about how/when to use Relational Databases vs. Document Databases, so that's not the main issue here. I'm more just wondering, what's a good rule/principle to apply when modeling data in CouchDB.

Another example is with XML files/data. Some XML data has nesting 10+ levels deep, and I would like to visualize that using the same client (Ajax on Rails for instance, or Flex) that I would to render JSON from ActiveRecord, CouchRest, or any other Object Relational Mapper. Sometimes I get huge XML files that are the entire site structure, like the one below, and I'd need to map it to Value Objects to use in my Rails app so I don't have to write another way of serializing/deserializing data:


<pages>
    <page>
        <subPages>
            <subPage>
                <images>
                    <image>
                        <url/>
                    </image>
                </images>
            </subPage>
        </subPages>
    </page>
</pages>

So the general CouchDB questions are:

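  1. What rules/principles do you use to divide up your documents (relationships, etc.)?
  2. Is it okay to put the entire site into one document?
  3. If so, how do you handle serializing/deserializing documents with arbitrary depth levels (like the large JSON example above, or the XML example)?
  4. Or do you not turn them into VOs at all, and just decide "these are too nested to Object-Relational Map, so I'll access them using raw XML/JSON methods"?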

Thanks a lot for your help, the issue of how to divide up your data with CouchDB has been difficult for me to say "this is how I should do it from now on". I hope to get there soon.

I have studied the following sites/projects.

  1. Hierarchical Data in CouchDB
  2. CouchDB Wiki
  3. Sofa - CouchDB App
  4. CouchDB The Definitive Guide
  5. PeepCode CouchDB Screencast
  6. CouchRest
  7. CouchDB README

...but they still haven't answered this question.

Recommended answer

There have been some great answers to this already, but I wanted to add some more recent CouchDB features to the mix of options for working with the original situation described by viatropos.

The key point at which to split up documents is where there might be conflicts (as mentioned earlier). You should never keep massively "tangled" documents together in a single document as you'll get a single revision path for completely unrelated updates (comment addition adding a revision to the entire site document for instance). Managing the relationships or connections between various, smaller documents can be confusing at first, but CouchDB provides several options for combining disparate pieces into single responses.

The first big one is view collation. When you emit key/value pairs into the results of a map/reduce query, the keys are sorted according to CouchDB's view collation rules (strings are compared with Unicode collation, so "a" comes before "b"). You can also output complex keys from your map/reduce as JSON arrays: ["a", "b", "c"]. Doing that allows you to build a "tree" of sorts out of array keys. Using your example above, we can output the post_id, then the type of thing we're referencing, then its ID (if needed). If we then output the id of the referenced document into an object in the value that's returned, we can use the 'include_docs' query param to include those documents in the map/reduce output:

{"rows":[
  {"key":["123412804910820", "post"], "value":null},
  {"key":["123412804910820", "author", "Lance1231"], "value":{"_id":"Lance1231"}},
  {"key":["123412804910820", "comment", "comment1"], "value":{"_id":"comment1"}},
  {"key":["123412804910820", "comment", "comment2"], "value":{"_id":"comment2"}}
]}
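
A map function along the following lines could produce rows like those above. It is only a sketch under assumptions: it supposes every document carries a `type` field ("post", "author" or "comment"), which the documents in the question do not have, and that posts and comments are linked through `author_id` and `post_id`. The `emit` stub and the sample documents are there only so the function can be run outside CouchDB.

```javascript
// Stand-in for CouchDB's built-in emit(), so the sketch runs outside the server.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function sketch. The `type` field is a hypothetical convention.
function map(doc) {
  if (doc.type === "post") {
    // Value is null, so include_docs falls back to the emitting (post) document.
    emit([doc._id, "post"], null);
    if (doc.author_id) {
      // {_id: ...} tells include_docs to fetch the author document instead.
      emit([doc._id, "author", doc.author_id], { _id: doc.author_id });
    }
  } else if (doc.type === "comment" && doc.post_id) {
    // Grouped under the post's id so the comment sorts next to its post.
    emit([doc.post_id, "comment", doc._id], { _id: doc._id });
  }
}

// Sample documents modeled on the split layout from the question.
var docs = [
  { _id: "123412804910820", type: "post", title: "My Post", author_id: "Lance1231" },
  { _id: "Lance1231", type: "author", name: "Lance" },
  { _id: "comment1", type: "comment", body: "Interesting Post", post_id: "123412804910820" },
  { _id: "comment2", type: "comment", body: "I agree", post_id: "123412804910820" }
];
docs.forEach(map);
console.log(JSON.stringify(rows, null, 2));
```

In a real design document only the body of `map` would be stored; CouchDB supplies `emit` itself and sorts the emitted rows by key.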

Requesting that same view with '?include_docs=true' will add a 'doc' key that will either use the '_id' referenced in the 'value' object or if that isn't present in the 'value' object, it will use the '_id' of the document from which the row was emitted (in this case the 'post' document). Please note, these results would include an 'id' field referencing the source document from which the emit was made. I left it out for space and readability.

We can then use the 'start_key' and 'end_key' parameters to filter the results down to a single post's data:

?start_key=["123412804910820"]&end_key=["123412804910820", {}, {}]

Or even specifically extract the list for a certain type:

?start_key=["123412804910820", "comment"]&end_key=["123412804910820", "comment", {}]

These query param combinations are possible because an empty object ("{}") is always at the bottom of the collation and null or "" are always at the top.
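
To see why those ranges select the intended rows, here is a deliberately simplified model of the collation order. It is an approximation for illustration, not CouchDB's implementation: real string comparison uses ICU collation and real object comparison is more involved, while this sketch uses plain JavaScript string comparison and compares objects by key count only. The names `typeRank`, `collate` and `range` are made up for this example.

```javascript
// Approximate type order in view collation:
// null < false < true < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // plain object
}

function collate(a, b) {
  var ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0; // simplification of ICU collation
  if (ra === 5) {
    // Arrays compare element by element; a shorter prefix sorts first.
    var n = Math.min(a.length, b.length);
    for (var i = 0; i < n; i++) {
      var c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  if (ra === 6) {
    // Simplification: only good enough to place {} before non-empty objects.
    return Object.keys(a).length - Object.keys(b).length;
  }
  return 0;
}

// Keep the rows whose key falls between startKey and endKey (inclusive).
function range(rows, startKey, endKey) {
  return rows.filter(function (r) {
    return collate(r.key, startKey) >= 0 && collate(r.key, endKey) <= 0;
  });
}

var viewRows = [
  { key: ["123412804910820", "post"] },
  { key: ["123412804910820", "author", "Lance1231"] },
  { key: ["123412804910820", "comment", "comment1"] },
  { key: ["123412804910820", "comment", "comment2"] },
  { key: ["999", "post"] } // a row belonging to some other post
];

var wholePost = range(viewRows, ["123412804910820"], ["123412804910820", {}, {}]);
var onlyComments = range(viewRows, ["123412804910820", "comment"], ["123412804910820", "comment", {}]);
console.log(wholePost.length, onlyComments.length);
```

Because `{}` ranks after every string, an end key padded with `{}` acts as an upper bound for everything that shares the same leading key elements.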

The second helpful addition from CouchDB in these situations is the _list function. It lets you run the above results through a templating system of some kind (if you want HTML, XML, CSV or whatever back), or output a unified JSON structure, so that a single request returns an entire post's content (including author and comment data) as one JSON document that matches what your client-side/UI code needs. You could then request the post's unified output document this way:

/db/_design/app/_list/unified/posts?start_key=["123412804910820"]&end_key=["123412804910820", {}, {}]&include_docs=true

Your _list function (in this case named "unified") would take the results of the view map/reduce (in this case named "posts") and run them through a JavaScript function that would send back the HTTP response in the content type you need (JSON, HTML, etc).
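
A list function for this could look roughly like the sketch below. It assumes the view is queried with include_docs=true, so each row carries a `doc`, and that the second element of each key names the kind of row, as in the sample output earlier. The shape of the unified document (`post`, `author`, `comments`) is an illustrative choice, not something CouchDB prescribes. `start`, `getRow` and `send` are helpers CouchDB provides to list functions; they are stubbed here with sample rows so the sketch can be run on its own.

```javascript
// Stubs for the helpers CouchDB gives a list function, fed with sample rows.
var sampleRows = [
  { key: ["123412804910820", "post"], doc: { _id: "123412804910820", title: "My Post" } },
  { key: ["123412804910820", "author", "Lance1231"], doc: { _id: "Lance1231", name: "Lance" } },
  { key: ["123412804910820", "comment", "comment1"], doc: { _id: "comment1", body: "Interesting Post" } },
  { key: ["123412804910820", "comment", "comment2"], doc: { _id: "comment2", body: "I agree" } }
];
var output = "";
var responseHeaders = null;
function start(init) { responseHeaders = init.headers; }
function getRow() { return sampleRows.shift() || null; }
function send(chunk) { output += chunk; }

// List function sketch: folds the view rows for one post into a single JSON document.
function unified(head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  var result = { post: null, author: null, comments: [] };
  var row;
  while ((row = getRow())) {
    var kind = row.key[1]; // "post", "author" or "comment"
    if (kind === "post") {
      result.post = row.doc;
    } else if (kind === "author") {
      result.author = row.doc;
    } else if (kind === "comment") {
      result.comments.push(row.doc);
    }
  }
  send(JSON.stringify(result));
}

unified({}, {});
var unifiedDoc = JSON.parse(output);
console.log(unifiedDoc);
```

Inside CouchDB only the `unified` function would be stored, under the design document's `lists` field; the stubs and the final call are scaffolding for running it locally.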

Combining these things, you can split up your documents at whatever level you find useful and "safe" for updates, conflicts, and replication, and then put them back together as needed when they're requested.

Hope that helps.
