MongoDB queries optimisation


Question

I wish to retrieve several pieces of information from my User model, which looks like this:

var userSchema = new mongoose.Schema({
  email: { type: String, unique: true, lowercase: true },
  password: String,

  created_at: Date,
  updated_at: Date,

  genre : { type: String, enum: ['Teacher', 'Student', 'Guest'] },
  role : { type: String, enum: ['user', 'admin'], default: 'user' },
  active : { type: Boolean, default: false },

  profile: {
    name : { type: String, default: '' },
    headline : { type: String, default: '' },
    description : { type: String, default: '' },
    gender : { type: String, default: '' },
    ethnicity : { type: String, default: '' },
    age : { type: String, default: '' }
  },

  contacts : {
    email : { type: String, default: '' },
    phone : { type: String, default: '' },
    website : { type: String, default: '' }
  },

  location : {
    formattedAddress : { type: String, default: '' },
    country : { type: String, default: '' },
    countryCode : { type: String, default: '' },
    state : { type: String, default: '' },
    city : { type: String, default: '' },
    postcode : { type: String, default: '' },
    lat : { type: String, default: '' },
    lng : { type: String, default: '' }
  }
});

On the homepage I have a location filter where you can browse users by country or city.

Each entry also shows the number of users it contains:

United Kingdom
  All Cities (300)
  London (150)
  Liverpool (80)
  Manchester (70)
France
  All Cities (50)
  Paris (30)
  Lille (20)
Nederland
  All Cities (10)
  Amsterdam (10)
Etc...

That is the homepage. I also have Students and Teachers pages, where I want the same information restricted to, for example, how many teachers there are in those countries and cities.

What I'm trying to do is build a MongoDB query that retrieves all of this information in a single query.

Currently the query looks like this:

User.aggregate([
    { 
      $group: { 
        _id: { city: '$location.city', country: '$location.country', genre: '$genre' },
        count: { $sum: 1 }
      }
    },
    {
      $group: { 
        _id: '$_id.country',
        count: { $sum: '$count' },
        cities: { 
          $push: { 
            city: '$_id.city', 
            count: '$count'
          }
        },
        genres: {
          $push: {
            genre: '$_id.genre',
            count: '$count'
          }
        }
      }
    }
  ], function(err, results) {
    // Pass the error on to the error handler instead of silently dropping it
    if (err) return next(err);
    res.json({
      res: results
    });
  });

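The problem is that I don't know how to get all the information I need.

  • I don't know how to get the total number of users in every country.
  • I have the number of users for each country.
  • I have the number of users for each city.
  • I don't know how to get the same counts for a specific genre.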

Is it possible to get all of this information with a single query in Mongo?

Otherwise:

Creating a few promises with 2 or 3 different requests to Mongo, like this:

getSomething
.then(getSomethingElse)
.then(getSomethingElseAgain)
.done

I'm sure it would be easier to request each specific piece of data separately, but is that good for performance when there are more than 5,000 or 10,000 users in the DB?

Sorry, but I'm still learning, and I think these things are crucial to understanding MongoDB performance and optimisation.

Thanks

Recommended answer

What you want is a "faceted search" result, where you hold statistics about the matched terms in the current result set. While some products "appear" to do all of this work in a single response, keep in mind that most generic storage engines need multiple operations to do it.

With MongoDB you can use two queries: one to get the results themselves and another to get the facet information. This gives results similar to the faceted results available from dedicated search engine products like Solr or ElasticSearch.

To do this efficiently, though, the facet data should be stored in your documents in a form that is easy to query. A very effective form for what you want is an array of tokenized data:

 {
     "otherData": "something",
     "facets": [
         "country:UK",
         "city:London-UK",
         "genre:Student"
     ]
 }
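
As a rough illustration of how such tokens might be produced, here is a minimal sketch that derives them from the fields in the question's schema. The helper name buildFacets is hypothetical, and using location.countryCode for the country part of each token is an assumption; only the token format comes from the example above.

```javascript
// Hypothetical helper: derives facet tokens from a user document.
// Token format ("country:UK", "city:London-UK", "genre:Student") follows the example above.
function buildFacets(user) {
  const facets = [];
  const location = user.location || {};

  if (location.countryCode) {
    facets.push('country:' + location.countryCode);
    if (location.city) {
      // City tokens carry the country code after the hyphen
      facets.push('city:' + location.city + '-' + location.countryCode);
    }
  }
  if (user.genre) {
    facets.push('genre:' + user.genre);
  }
  return facets;
}

const example = buildFacets({
  genre: 'Student',
  location: { city: 'London', countryCode: 'UK' }
});
console.log(example);
```

A helper like this could be called whenever a user is created or updated, so that the stored array stays in step with the location and genre fields.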

So "facets" is a single field in your document rather than data spread over multiple locations. This makes it very easy to index and query. You can then aggregate across your results and get the totals for each facet:

User.aggregate(
    [
        { "$unwind": "$facets" },
        { "$group": {
            "_id": "$facets",
            "count": { "$sum": 1 }
        }}
    ],
    function(err,results) {

    }
);

Or, more ideally, with some criteria in $match:

User.aggregate(
    [
        { "$match": { "facets": { "$in": ["genre:Student"] } } },
        { "$unwind": "$facets" },
        { "$group": {
            "_id": "$facets",
            "count": { "$sum": 1 }
        }}
    ],
    function(err,results) {

    }
);

Ultimately giving a response like:

{ "_id": "country:FR", "count": 50 },
{ "_id": "country:UK", "count": 300 },
{ "_id": "city:London-UK", "count": 150 },
{ "_id": "genre:Student", "count": 500 }

Such a structure is easy to traverse and inspect for things like the discrete "country" and the "city" that belongs to a "country", since that data is consistently separated by a hyphen "-".
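
To make that traversal concrete, here is a minimal sketch that regroups the flat facet counts into a country and city structure on the application side. The function name groupFacets and the shape of the returned object are assumptions for illustration. It splits city tokens on the last hyphen so that city names which themselves contain hyphens stay intact.

```javascript
// Hypothetical helper: turns [{ _id: "city:London-UK", count: 150 }, ...]
// into { countries: { UK: { count, cities: { London: 150 } } }, genres: { ... } }.
function groupFacets(results) {
  const countries = {};
  const genres = {};
  // Returns the entry for a country code, creating it on first use
  const countryEntry = (code) =>
    countries[code] || (countries[code] = { count: 0, cities: {} });

  for (const { _id, count } of results) {
    const sep = _id.indexOf(':');
    const type = _id.slice(0, sep);
    const value = _id.slice(sep + 1);

    if (type === 'country') {
      countryEntry(value).count = count;
    } else if (type === 'city') {
      // Split on the last hyphen: "London-UK" -> city "London", country "UK"
      const dash = value.lastIndexOf('-');
      countryEntry(value.slice(dash + 1)).cities[value.slice(0, dash)] = count;
    } else if (type === 'genre') {
      genres[value] = count;
    }
  }
  return { countries, genres };
}
```

The country count then serves as the "All Cities" figure in the filter from the question, with the per-city counts listed beneath it.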

Trying to mash up documents within arrays is a bad idea. There is also a BSON size limit of 16MB to respect, and mashing results together (especially if you are trying to keep document content) will almost certainly end up exceeding it in the response.

For something as simple as getting the "overall count" of results from such a query, just sum up the elements of a particular facet type. Or issue the same query arguments to a .count() operation:

User.count({ "facets": { "$in": ["genre:Student"] } },function(err,count) {

});
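
The first option, summing the elements of one facet type, might look like the sketch below. The helper name totalForFacetType is hypothetical, and the sum only equals the overall result count under the assumption that every matched document carries exactly one token of that type.

```javascript
// Hypothetical helper: adds up the counts of all facets of one type,
// e.g. every "country:*" entry in the aggregation output.
function totalForFacetType(results, type) {
  const prefix = type + ':';
  return results
    .filter((r) => r._id.startsWith(prefix))
    .reduce((sum, r) => sum + r.count, 0);
}

const sample = [
  { _id: 'country:FR', count: 50 },
  { _id: 'country:UK', count: 300 },
  { _id: 'city:London-UK', count: 150 }
];
console.log(totalForFacetType(sample, 'country'));
```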

As said here, particularly when implementing "paging" of results, the roles of getting the "Result Count", the "Facet Counts" and the actual "Page of Results" are all delegated to "separate" queries to the server.

There is nothing wrong with submitting each of those queries to the server in parallel and then combining them into a structure to feed to your template or application, one that looks much like the faceted search result from a search engine product that offers this kind of response.
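
A sketch of that parallel submission, using promises as the question suggests, could look like the following. The names facetPipeline and facetedPage, the page and pageSize parameters, and the sort on _id are all assumptions for illustration. Model is assumed to be a Mongoose model such as User, and countDocuments() is used here because newer Mongoose versions favour it over count().

```javascript
// Builds the facet-count pipeline for a filter (same stages as the aggregate above).
function facetPipeline(filter) {
  return [
    { $match: filter },
    { $unwind: '$facets' },
    { $group: { _id: '$facets', count: { $sum: 1 } } }
  ];
}

// Hypothetical helper: issues the three queries in parallel and combines
// them into one structure for the template or API response.
function facetedPage(Model, filter, page, pageSize) {
  return Promise.all([
    Model.countDocuments(filter).exec(),            // "Result Count"
    Model.aggregate(facetPipeline(filter)).exec(),  // "Facet Counts"
    Model.find(filter)                              // "Page of Results"
      .sort({ _id: 1 })                             // stable order for paging
      .skip((page - 1) * pageSize)
      .limit(pageSize)
      .exec()
  ]).then(([total, facets, items]) => ({ total, facets, items }));
}
```

In a route handler this might be called as facetedPage(User, { facets: { $in: ['genre:Student'] } }, 1, 20), with any rejection passed on to next.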

So put something in your document to mark the facets in a single place. An array of tokenized strings works well for this purpose. It also works well with query forms such as $in and $all for either "or" or "and" conditions on facet selection combinations.
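
As a small illustration of those two query forms, the sketch below builds the filter for a set of selected facet tokens. The helper name facetFilter and its mode argument are hypothetical; $in and $all are the standard MongoDB operators named above.

```javascript
// Hypothetical helper: builds a filter on the facets array.
// mode "or"  -> $in  (document has at least one of the tokens)
// mode "and" -> $all (document has every one of the tokens)
function facetFilter(tokens, mode) {
  const operator = mode === 'and' ? '$all' : '$in';
  return { facets: { [operator]: tokens } };
}

// Students located in London:
console.log(facetFilter(['genre:Student', 'city:London-UK'], 'and'));
// Users in either London or Paris:
console.log(facetFilter(['city:London-UK', 'city:Paris-FR'], 'or'));
```

The resulting object can be passed unchanged as the $match criteria of the aggregation and as the argument to the count and find queries, so all three stay consistent.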

Don't try to mash results together or nest additions just to match some perceived hierarchical structure. Instead, traverse the results you receive and use simple patterns in the tokens.

Run paged queries for the content as separate queries from those for facets or overall counts. Pushing all content into arrays and then limiting the output just to get counts does not make sense. The same applies to an RDBMS solution, where the result count for paging and the current page are separate query operations.

There is more information on the MongoDB Blog about Faceted Search with MongoDB, which also explains some other options. There are also articles on integrating with external search solutions using mongoconnector or other approaches.
