MongoDB query optimisation


Question

I want to retrieve several pieces of information from my User model, which looks like this:

var userSchema = new mongoose.Schema({
  email: { type: String, unique: true, lowercase: true },
  password: String,

  created_at: Date,
  updated_at: Date,

  genre : { type: String, enum: ['Teacher', 'Student', 'Guest'] },
  role : { type: String, enum: ['user', 'admin'], default: 'user' },
  active : { type: Boolean, default: false },

  profile: {
    name : { type: String, default: '' },
    headline : { type: String, default: '' },
    description : { type: String, default: '' },
    gender : { type: String, default: '' },
    ethnicity : { type: String, default: '' },
    age : { type: String, default: '' }
  },

  contacts : {
    email : { type: String, default: '' },
    phone : { type: String, default: '' },
    website : { type: String, default: '' }
  },

  location : {
    formattedAddress : { type: String, default: '' },
    country : { type: String, default: '' },
    countryCode : { type: String, default: '' },
    state : { type: String, default: '' },
    city : { type: String, default: '' },
    postcode : { type: String, default: '' },
    lat : { type: String, default: '' },
    lng : { type: String, default: '' }
  }
});

On the homepage I have a location filter that lets you browse users by country or city.

Each entry also shows the number of users in it:

United Kingdom
  All Cities (300)
  London (150)
  Liverpool (80)
  Manchester (70)
France
  All Cities (50)
  Paris (30)
  Lille (20)
Nederland
  All Cities (10)
  Amsterdam (10)
Etc...

That is the homepage. I also have Students and Teachers pages, where I want the same information restricted to, for example, how many teachers there are in those countries and cities.

What I'm trying to do is build a MongoDB query that retrieves all of this information in a single query.

At the moment the query looks like this:

User.aggregate([
    { 
      $group: { 
        _id: { city: '$location.city', country: '$location.country', genre: '$genre' },
        count: { $sum: 1 }
      }
    },
    {
      $group: { 
        _id: '$_id.country',
        count: { $sum: '$count' },
        cities: { 
          $push: { 
            city: '$_id.city', 
            count: '$count'
          }
        },
        genres: {
          $push: {
            genre: '$_id.genre',
            count: '$count'
          }
        }
      }
    }
  ], function(err, results) {
    if (err) return next(err);
    res.json({ 
        res: results
    });
  });

The problem is that I don't know how to get all the information I need.

  • I don't know how to get the total number of users in every country.
  • I have the user count for each country.
  • I have the user count for each city.
  • I don't know how to get the same counts for a specific genre.

Is it possible to get all of this information from Mongo with a single query?

Otherwise:

Creating a few promises with 2 or 3 different requests to Mongo, like this:

getSomething
.then(getSomethingElse)
.then(getSomethingElseAgain)
.done

I'm sure it would be easier to store the specific data each time, but is that good for performance once there are more than 5,000 or 10,000 users in the DB?

Sorry, I'm still learning, and I think these things are crucial to understanding MongoDB performance and optimisation.

Thanks

Recommended answer

What you want is a "faceted search" result, which holds statistics about the matched terms in the current result set. While there are products that appear to do all the work in a single response, bear in mind that most generic storage engines are going to need multiple operations.

With MongoDB you can use two queries: one to get the results themselves and another to get the facet information. This gives results similar to the faceted results available from dedicated search engine products such as Solr or ElasticSearch.

But to do this effectively, you want to include the facet data in your document in a form that can be queried efficiently. A very effective form for what you want is an array of tokenized data:

 {
     "otherData": "something",
     "facets": [
         "country:UK",
         "city:London-UK",
         "genre:Student"
     ]
 }
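The tokens in such an array could be derived from fields the question's schema already has. The following is only a minimal sketch: `buildFacets` is a hypothetical helper name, and it assumes `location.countryCode`, `location.city` and `genre` are populated as in the User schema above.

```javascript
// Hypothetical helper: derives the "facets" tokens from fields already
// present on the question's User schema. The token format mirrors the
// example above ("country:UK", "city:London-UK", "genre:Student").
function buildFacets(user) {
  const facets = [];
  const location = user.location || {};
  const { countryCode, city } = location;

  if (countryCode) {
    facets.push(`country:${countryCode}`);
    // Suffix the city with its country code so that same-named cities
    // in different countries stay distinct.
    if (city) facets.push(`city:${city}-${countryCode}`);
  }
  if (user.genre) facets.push(`genre:${user.genre}`);

  return facets;
}

console.log(buildFacets({
  genre: 'Student',
  location: { city: 'London', countryCode: 'UK' }
}));
```

One way to keep the array current would be to add a `facets: [String]` field to the schema and assign `this.facets = buildFacets(this)` in a Mongoose `pre('save')` hook, though that wiring is an assumption rather than part of the original answer.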

So "facets" is a single field in your document rather than data spread across multiple locations. This makes it very easy to index and query. You can then aggregate across your results and get the totals for each facet:

User.aggregate(
    [
        { "$unwind": "$facets" },
        { "$group": {
            "_id": "$facets",
            "count": { "$sum": 1 }
        }}
    ],
    function(err,results) {

    }
);

Or, more ideally, with some criteria in $match:

User.aggregate(
    [
        { "$match": { "facets": { "$in": ["genre:Student"] } } },
        { "$unwind": "$facets" },
        { "$group": {
            "_id": "$facets",
            "count": { "$sum": 1 }
        }}
    ],
    function(err,results) {

    }
);

Ultimately giving a response like this:

{ "_id": "country:FR", "count": 50 },
{ "_id": "country:UK", "count": 300 },
{ "_id": "city:London-UK", "count": 150 },
{ "_id": "genre:Student", "count": 500 }

Such a structure is easy to traverse and inspect for things like the discrete "country" and the "city" that belongs to a "country", since that data is consistently separated by a hyphen "-".
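As a rough illustration of that traversal, the flat facet rows can be folded into the country and city tree the question's homepage filter needs. This is a sketch only: `groupFacets` is a hypothetical helper, and the sample rows reuse the counts from the question's example list.

```javascript
// Sample rows in the shape returned by the $unwind/$group pipeline.
const sampleResults = [
  { _id: 'country:FR', count: 50 },
  { _id: 'country:UK', count: 300 },
  { _id: 'city:London-UK', count: 150 },
  { _id: 'city:Paris-FR', count: 30 },
  { _id: 'genre:Student', count: 500 }
];

// Hypothetical helper: folds the flat facet rows into
// { countries: { UK: { count, cities: { London: 150 } } }, genres: {...} }.
function groupFacets(results) {
  const countries = {};
  const genres = {};
  const countryEntry = (code) =>
    countries[code] || (countries[code] = { count: 0, cities: {} });

  for (const { _id, count } of results) {
    const sep = _id.indexOf(':');
    const type = _id.slice(0, sep);
    const value = _id.slice(sep + 1);

    if (type === 'country') {
      countryEntry(value).count = count;
    } else if (type === 'city') {
      // The country code follows the last hyphen, so hyphenated city
      // names are still split correctly.
      const dash = value.lastIndexOf('-');
      const code = value.slice(dash + 1);
      countryEntry(code).cities[value.slice(0, dash)] = count;
    } else if (type === 'genre') {
      genres[value] = count;
    }
  }
  return { countries, genres };
}

console.log(JSON.stringify(groupFacets(sampleResults), null, 2));
```

Splitting on the last hyphen is a design choice that assumes country codes never contain a hyphen themselves.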

Trying to mash documents together inside arrays is a bad idea. There is also the 16MB BSON size limit to respect, and a response that mashes results together (especially if you are trying to keep document content) will almost certainly end up exceeding it.

For something as simple as getting the overall count of results from such a query, just sum up the elements of a particular facet type. Or just issue the same query arguments to a .count() operation:

User.count({ "facets": { "$in": ["genre:Student"] } },function(err,count) {

});
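The first option, summing the elements of one facet type, might look like the sketch below. `sumFacetType` is a hypothetical helper, the counts are illustrative sample values, and the total only equals the number of matched documents if every document carries exactly one token of that type.

```javascript
// Hypothetical helper: adds up the counts of all facet rows of one type,
// e.g. every "genre:*" row.
function sumFacetType(results, type) {
  const prefix = `${type}:`;
  return results
    .filter((row) => row._id.startsWith(prefix))
    .reduce((total, row) => total + row.count, 0);
}

// Illustrative sample rows, not real data.
const facetRows = [
  { _id: 'country:FR', count: 50 },
  { _id: 'country:UK', count: 300 },
  { _id: 'genre:Student', count: 250 },
  { _id: 'genre:Teacher', count: 100 }
];

console.log(sumFacetType(facetRows, 'genre'));   // 250 + 100
console.log(sumFacetType(facetRows, 'country')); // 50 + 300
```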

As noted earlier, particularly when implementing paging of results, getting the result count, the facet counts and the actual page of results are each delegated to separate queries to the server.

There is nothing wrong with submitting each of those queries to the server in parallel and then combining the responses into a structure to feed to your template or application, which then looks much like the faceted search result from one of the search engine products that offer this kind of response.
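A parallel version of those three queries could be sketched with `Promise.all`. The function name `fetchFacetedPage` and its parameters are assumptions made for illustration; `User` is expected to be a Mongoose model whose documents carry the `facets` array, and `countDocuments()` is used as the current Mongoose replacement for the `count()` call shown in this answer.

```javascript
// Sketch: issue the count, facet and page queries together and combine
// the three responses into one object for the template.
function fetchFacetedPage(User, filter, page, pageSize) {
  return Promise.all([
    // 1. Overall number of matching documents.
    User.countDocuments(filter).exec(),
    // 2. Facet counts for the same match criteria.
    User.aggregate([
      { $match: filter },
      { $unwind: '$facets' },
      { $group: { _id: '$facets', count: { $sum: 1 } } }
    ]).exec(),
    // 3. The current page of results, sorted so that paging is stable.
    User.find(filter)
      .sort({ _id: 1 })
      .skip((page - 1) * pageSize)
      .limit(pageSize)
      .exec()
  ]).then(([total, facets, users]) => ({ total, facets, users }));
}
```

A route handler might call `fetchFacetedPage(User, { facets: 'genre:Teacher' }, 1, 20)` and pass the resolved object straight to `res.json`.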

So put something in your document to mark the facets in a single place. An array of tokenized strings works well for this purpose. It also works well with query operators such as $in and $all, for "or" and "and" conditions respectively on combinations of facet selections.
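The two operators map onto facet selections roughly as follows. `facetFilter` is a hypothetical helper that only builds the query object; the result would be passed to `find()`, `count()` or a `$match` stage.

```javascript
// Hypothetical helper: builds the query document for a set of selected
// facet tokens. "or" matches documents with ANY of the tokens ($in),
// "and" matches documents that carry ALL of them ($all).
function facetFilter(tokens, mode) {
  const operator = mode === 'and' ? '$all' : '$in';
  return { facets: { [operator]: tokens } };
}

// Users in London OR Paris.
console.log(facetFilter(['city:London-UK', 'city:Paris-FR'], 'or'));

// Teachers AND located in the UK.
console.log(facetFilter(['country:UK', 'genre:Teacher'], 'and'));
```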

Don't try to mash results together or nest additions just to match some perceived hierarchical structure; instead, traverse the results you receive and use simple patterns in the tokens. It's very simple to run paged queries for the content as queries separate from either the facets or the overall counts. Trying to push all content into arrays and then limit it just to get counts does not make sense. The same applies to an RDBMS solution doing the same thing, where paging result counts and the current page are separate query operations.

There is more information on the MongoDB Blog about Faceted Search with MongoDB, which also explains some other options. There are also articles on integrating with external search solutions using mongo-connector or other approaches.
