MongoDB schema performance optimization

Problem description

Hello I want to build mongoDB schema with the highest performance.

Generally, my question is:

What is better: a collection with a huge subdocument array inside (about 10,000), or 2 separate collections with references (one of which may contain 50,000,000 records)?

Details

I have a mongoDB model with complex subdocuments:

var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var usersSchema = new Schema({
    email: {
        type: String,
        unique: true,
        required: true
    },
    // each user embeds an array of packages...
    packages: [{
        package: {type: Schema.Types.ObjectId, ref: 'Packages'},
        from: {type: Schema.Types.ObjectId, ref: 'Languages'},
        to: {type: Schema.Types.ObjectId, ref: 'Languages'},
        // ...and each package embeds an array of words (double nesting)
        words: [{
            word: {type: String},
            progress: {type: Number, default: 0}
        }]
    }]
});

var Users = mongoose.model('Users', usersSchema);

Every user will probably have 3-10 packages with 1000 words each, and the application will probably have more than 10,000 users, so I'll probably store about 50,000,000 words. I'd also love to have pagination, normal search, and other juicy MongoDB features for the words collection, but as far as I know those features are pretty hard to use with subdocuments.

My question is: what would be better for system performance? Subdocuments, with their broken pagination, search, and updates, but partitioned by user, or one more independent model with 50,000,000 records? Something like this:

var wordsSchema = new Schema({
    word: {type: String},
    progress: {type: Number, default: 0},
    // reference back to the owning user instead of embedding
    user: {type: Schema.Types.ObjectId, ref: 'Users'}
});
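
For illustration, these are the kinds of queries I would hope to run against that collection (a rough sketch; the compound index, the model name, and the page size are assumptions for illustration, not part of the original design):

// compound index so per-user paging and prefix search stay cheap (assumed)
wordsSchema.index({user: 1, word: 1});

var Words = mongoose.model('Words', wordsSchema);

// page 3 of one user's words, 50 per page
Words.find({user: userId})
    .sort({word: 1})
    .skip(2 * 50)
    .limit(50)
    .exec(function (err, page) { /* ... */ });

// simple prefix search within one user's words
Words.find({user: userId, word: /^ca/})
    .exec(function (err, matches) { /* ... */ });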

Recommended answer

What is better: a collection with a huge subdocument array inside (about 10,000), or 2 separate collections with references (one of which may contain 50,000,000 records)?

The first thing that comes to mind here is: why would storing a reference cost you 5,000 times what storing a subdocument costs? The words are the same either way (the 50,000,000 referenced records are just the ~10,000-word embedded arrays summed across all users); only their layout changes.

Okay, looking at your schema, I believe the best method is a separate collection for words, not packages.

The first red flag I saw is your double nesting here:

packages: [{
    package: {type: Schema.Types.ObjectId, ref: 'Packages'},
    from: {type: Schema.Types.ObjectId, ref: 'Languages'},
    to: {type: Schema.Types.ObjectId, ref: 'Languages'},
    words: [{
        word: {type: String},
        progress: {type: Number, default: 0}
    }]
}]

The words subdocument will be very hard to work with in the current version of MongoDB; nesting 2-3 levels deep normally starts to cause problems, especially with positional operators.
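
To make the positional-operator problem concrete (a hedged sketch with hypothetical ids; at the time of this answer MongoDB resolves only one positional $ per update path, though servers from 3.6 onward add arrayFilters for this):

// fine: one array level needs one positional $
Users.update(
    {_id: userId, 'packages.package': pkgId},
    {$set: {'packages.$.from': langId}},
    function (err) { /* ... */ }
);

// broken: the doubly nested words array would need a second position,
// but only one $ can be resolved (too many positional elements)
Users.update(
    {_id: userId, 'packages.words.word': 'cat'},
    {$set: {'packages.$.words.$.progress': 5}},
    function (err) { /* err reports the failure */ }
);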

Now, considering that you should always plan from the highest value you could reach here:

Every user will probably have 3-10 packages with 1000 words each.

You also have to consider the cost of housing this document. The operators you need will be in-memory ones such as $pull, $push, $addToSet etc., which means your entire document will need to be serialised and loaded into MongoDB's native C++ structs. This will be an extremely costly task depending on the traffic to those documents.
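
As a sketch of what that means in practice (hypothetical identifiers), the embedded write below makes the server operate on the whole multi-thousand-word user document, while the separate collection turns the same change into one small insert:

// embedded: $push into a nested array means working on the full user document
Users.update(
    {_id: userId, 'packages.package': pkgId},
    {$push: {'packages.$.words': {word: 'dog', progress: 0}}},
    function (err) { /* ... */ }
);

// split out: one tiny document is inserted; the user document is untouched
Words.create({word: 'dog', progress: 0, user: userId}, function (err, doc) { /* ... */ });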

Considering your comment:

I want to do a lot of read and write operations on the words collection, and far fewer operations on the user collection.

This merely puts another nail in the coffin of embedding the words within the main user document. Considering what I said in the previous paragraph, a write-heavy workload will not sit well with the cost of using in-memory operators on the words array.

But I'd love to have pagination, normal search, and other juicy MongoDB features for the words collection.

This will work much better if the words are split out; $slice, the usual way to page an embedded array, is also an in-memory operator and would probably suffer diminished performance here.
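
For contrast with the skip()/limit() queries sketched in the question, paging the embedded layout relies on a $slice projection (a hedged sketch; the page window is hypothetical). Note that the projection slices the words of every package at once, and the server still has to load the whole user document:

// embedded layout: page the nested words array via a $slice projection
Users.findOne(
    {_id: userId},
    {'packages.words': {$slice: [100, 50]}},   // words 100-149, applied per package
    function (err, user) { /* ... */ }
);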

And that's a quick reasoned response. I'm sure there is more I could explain about my reasoning, but that should be enough.
