Efficient paging in MongoDB using mgo


Question


I searched and found no Go solution to this problem, with or without mgo.v2, on Stack Overflow or on any other site. This Q&A is in the spirit of knowledge sharing and documenting.


Let's say we have a users collection in MongoDB modeled with this Go struct:

type User struct {
    ID      bson.ObjectId `bson:"_id"`
    Name    string        `bson:"name"`
    Country string        `bson:"country"`
}

We want to sort and list users based on some criteria, with paging, because the result list is expected to be long.

To achieve paging of the results of some query, MongoDB and the mgo.v2 driver package have built-in support in the form of Query.Skip() and Query.Limit(), e.g.:

session, err := mgo.Dial(url) // Acquire Mongo session, handle error!

c := session.DB("").C("users")
q := c.Find(bson.M{"country" : "USA"}).Sort("name", "_id").Limit(10)

// To get the nth page:
q = q.Skip((n-1)*10)

var users []*User
err = q.All(&users)

This, however, becomes slow as the page number increases: MongoDB can't just "magically" jump to the xth document in the result. It has to iterate over all the result documents and omit (not return) the first x that need to be skipped.
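To make the cost difference concrete, here is a rough in-memory model of the two approaches. It is only a sketch, not MongoDB itself: the sorted slice stands in for an index, and the names pageWithSkip, pageFromKey and buildIndex are made up for this illustration. It counts how many entries have to be walked to produce the same page.

```go
package main

import (
	"fmt"
	"sort"
)

// pageWithSkip models Skip/Limit paging: every skipped entry is still
// visited before the first entry of the page is reached.
func pageWithSkip(index []string, skip, limit int) (page []string, touched int) {
	for _, entry := range index {
		if len(page) == limit {
			break
		}
		touched++
		if touched <= skip {
			continue // visited, but not returned
		}
		page = append(page, entry)
	}
	return page, touched
}

// pageFromKey models seeking: a binary search locates the cursor entry,
// then only the entries of the page are read. The few comparisons done
// by the binary search are not counted in touched.
func pageFromKey(index []string, after string, limit int) (page []string, touched int) {
	start := sort.SearchStrings(index, after) // first i with index[i] >= after
	if start < len(index) && index[start] == after {
		start++ // begin strictly after the cursor entry
	}
	for i := start; i < len(index) && len(page) < limit; i++ {
		touched++
		page = append(page, index[i])
	}
	return page, touched
}

// buildIndex returns n sorted, zero-padded keys: user00000, user00001, ...
func buildIndex(n int) []string {
	index := make([]string, n)
	for i := range index {
		index[i] = fmt.Sprintf("user%05d", i)
	}
	return index
}

func main() {
	index := buildIndex(10000)
	p1, t1 := pageWithSkip(index, 9000, 10)
	p2, t2 := pageFromKey(index, "user08999", 10)
	fmt.Println("skip:", p1[0], "entries walked:", t1)
	fmt.Println("seek:", p2[0], "entries walked:", t2)
}
```

Both calls return the same page, but the skip variant walks every entry before it, so its cost grows with the page number while the seek variant's does not.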

MongoDB provides the right solution: If the query operates on an index (it has to work on an index), cursor.min() can be used to specify the first index entry to start listing results from.

This Stack Overflow answer shows how it can be done using a mongo client: How to do pagination using range queries in MongoDB?

Note: the required index for the above query would be:

db.users.createIndex(
    {
        country: 1,
        name: 1,
        _id: 1
    }
)

There is one problem though: the mgo.v2 package has no support for specifying this min().

How can we achieve efficient paging that uses MongoDB's cursor.min() feature using the mgo.v2 driver?

Solution

Unfortunately the mgo.v2 driver does not provide API calls to specify cursor.min().

But there is a solution. The mgo.Database type provides a Database.Run() method to run any MongoDB command. The available commands and their documentation can be found here: Database commands

Starting with MongoDB 3.2, a new find command is available which can be used to execute queries, and it supports specifying the min argument that denotes the first index entry to start listing results from.

Good. After each batch (the documents of one page), we generate the min document from the last document of the query result. It must contain the values of the index entry that was used to execute the query. The next batch (the documents of the next page) can then be acquired by setting this min index entry before executing the query.

This index entry (let's call it the cursor from now on) may be encoded to a string and sent to the client along with the results. When the client wants the next page, it sends back the cursor, saying it wants results starting after this cursor.
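The round trip with the client could look roughly like the sketch below. Everything in it is an assumption made for illustration: the "cursor" query parameter, the JSON response shape, and the fetchFunc, usersHandler, get and fakeFetch names are hypothetical, and fakeFetch is a hard-coded stand-in for the real query layer.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fetchFunc is a hypothetical stand-in for the query layer: it takes the
// cursor sent by the client ("" for the first page) and returns one page
// plus the cursor for the page after it.
type fetchFunc func(cursor string) (users []string, next string, err error)

// pageResponse is what the client receives: the page and the new cursor.
type pageResponse struct {
	Users  []string `json:"users"`
	Cursor string   `json:"cursor"`
}

// usersHandler reads the cursor from the URL query and returns the page
// together with the cursor for the following page.
func usersHandler(fetch fetchFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		users, next, err := fetch(r.URL.Query().Get("cursor"))
		if err != nil {
			// The cursor comes from the client, so treat it as untrusted.
			http.Error(w, "invalid cursor", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(pageResponse{Users: users, Cursor: next})
	}
}

// fakeFetch serves two hard-coded pages and rejects unknown cursors.
func fakeFetch(cursor string) ([]string, string, error) {
	switch cursor {
	case "":
		return []string{"Alice", "Bob"}, "c1", nil
	case "c1":
		return []string{"Carol"}, "", nil
	}
	return nil, "", errors.New("unknown cursor")
}

// get runs one request against the handler and decodes the JSON page.
func get(h http.Handler, url string) (pageResponse, int) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", url, nil))
	var page pageResponse
	_ = json.NewDecoder(rec.Body).Decode(&page) // not JSON on error responses
	return page, rec.Code
}

func main() {
	h := usersHandler(fakeFetch)
	first, _ := get(h, "/users")
	second, _ := get(h, "/users?cursor="+first.Cursor)
	fmt.Println(first.Users, second.Users)
}
```

The client never needs to understand the cursor; it only echoes back the opaque string it received with the previous page.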

Doing it manually (the "hard" way)

The command to be executed can be in different forms, but the command name (find) must be first in the marshaled result, so we'll use bson.D (which preserves order in contrast to bson.M):

limit := 10
cmd := bson.D{
    {Name: "find", Value: "users"},
    {Name: "filter", Value: bson.M{"country": "USA"}},
    {Name: "sort", Value: bson.D{
        {Name: "name", Value: 1},
        {Name: "_id", Value: 1},
    }},
    {Name: "limit", Value: limit},
    {Name: "batchSize", Value: limit},
    {Name: "singleBatch", Value: true},
}
// min is a bson.D holding the cursor's index entry (nil for the first page)
if min != nil {
    // min is inclusive, so skip the first result (the last document of the previous page)
    cmd = append(cmd,
        bson.DocElem{Name: "skip", Value: 1},
        bson.DocElem{Name: "min", Value: min},
    )
}
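The reason for pairing skip: 1 with min can be seen in a small in-memory model. This is a sketch under stated assumptions, not the server's implementation: the sorted slice stands in for the {country, name, _id} index, and the entry, less and find names are invented for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// entry models one entry of the {country, name, _id} index.
type entry struct{ Country, Name, ID string }

// less orders entries the way a compound index does: field by field.
func less(a, b entry) bool {
	if a.Country != b.Country {
		return a.Country < b.Country
	}
	if a.Name != b.Name {
		return a.Name < b.Name
	}
	return a.ID < b.ID
}

// find models a query with an inclusive lower bound (minKey), followed by
// skip and limit, over an index that is already sorted by less.
func find(index []entry, minKey *entry, skip, limit int) []entry {
	start := 0
	if minKey != nil {
		// First position whose entry is >= minKey (inclusive bound).
		start = sort.Search(len(index), func(i int) bool {
			return !less(index[i], *minKey)
		})
	}
	start += skip
	if start > len(index) {
		start = len(index)
	}
	end := start + limit
	if end > len(index) {
		end = len(index)
	}
	return index[start:end]
}

func main() {
	index := []entry{
		{"USA", "Alice", "1"}, {"USA", "Bob", "2"}, {"USA", "Bob", "3"},
		{"USA", "Carol", "4"}, {"USA", "Dave", "5"},
	}
	page1 := find(index, nil, 0, 2)
	last := page1[len(page1)-1]

	fmt.Println(find(index, &last, 0, 2)) // begins with the last entry of page 1 again
	fmt.Println(find(index, &last, 1, 2)) // begins right after it
}
```

Including _id in the key is what keeps the two users named "Bob" apart: without a unique last field, the bound could not tell which of them ended the previous page.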

The result of executing a MongoDB find command with Database.Run() can be captured with the following type:

var res struct {
    OK       int `bson:"ok"`
    WaitedMS int `bson:"waitedMS"`
    Cursor   struct {
        ID         interface{} `bson:"id"`
        NS         string      `bson:"ns"`
        FirstBatch []bson.Raw  `bson:"firstBatch"`
    } `bson:"cursor"`
}

db := session.DB("")
if err := db.Run(cmd, &res); err != nil {
    // Handle error (abort)
}

We now have the results, but in a slice of type []bson.Raw, while we want them in a slice of type []*User. This is where Collection.NewIter() comes in handy. It can transform (unmarshal) a value of type []bson.Raw into any type we usually pass to Query.All() or Iter.All(). Good. Let's see it:

firstBatch := res.Cursor.FirstBatch
var users []*User
err = db.C("users").NewIter(nil, firstBatch, 0, nil).All(&users)

We now have the users of the next page. Only one thing left: generating the cursor to be used to get the subsequent page should we ever need it:

if len(users) > 0 {
    lastUser := users[len(users)-1]
    cursorData := bson.D{
        {Name: "country", Value: lastUser.Country},
        {Name: "name", Value: lastUser.Name},
        {Name: "_id", Value: lastUser.ID},
    }
} else {
    // No more users found, use the last cursor
}

This is all good, but how do we convert a cursorData to string and vice versa? We may use bson.Marshal() and bson.Unmarshal() combined with base64 encoding; the use of base64.RawURLEncoding will give us a web-safe cursor string, one that can be added to URL queries without escaping.

Here's an example implementation:

// CreateCursor returns a web-safe cursor string from the specified fields.
// The returned cursor string is safe to include in URL queries without escaping.
func CreateCursor(cursorData bson.D) (string, error) {
    // bson.Marshal() is not expected to fail for plain cursor values, so the
    // check and early return are skipped (the error is still returned if it ever happens)
    data, err := bson.Marshal(cursorData)
    return base64.RawURLEncoding.EncodeToString(data), err
}

// ParseCursor parses the cursor string and returns the cursor data.
func ParseCursor(c string) (cursorData bson.D, err error) {
    var data []byte
    if data, err = base64.RawURLEncoding.DecodeString(c); err != nil {
        return
    }

    err = bson.Unmarshal(data, &cursorData)
    return
}
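For trying the round trip without mgo installed, the same idea can be sketched with the standard library only. Note the swap: this version uses encoding/json in place of bson, and the field, encodeCursor and decodeCursor names are hypothetical. It also shows that a tampered cursor string surfaces as an error, which matters because the cursor comes back from the client.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// field is one ordered cursor component, playing the role of bson.DocElem.
type field struct {
	Name  string `json:"n"`
	Value string `json:"v"`
}

// encodeCursor serializes the ordered fields and makes them URL-safe.
func encodeCursor(fields []field) (string, error) {
	data, err := json.Marshal(fields)
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(data), nil
}

// decodeCursor reverses encodeCursor and rejects malformed input.
func decodeCursor(c string) ([]field, error) {
	data, err := base64.RawURLEncoding.DecodeString(c)
	if err != nil {
		return nil, err
	}
	var fields []field
	if err := json.Unmarshal(data, &fields); err != nil {
		return nil, err
	}
	return fields, nil
}

func main() {
	c, err := encodeCursor([]field{{"country", "USA"}, {"name", "Bob"}, {"_id", "42"}})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fields, err := decodeCursor(c)
	fmt.Println(c, fields, err)

	_, err = decodeCursor("not a cursor!")
	fmt.Println("tampered cursor rejected:", err != nil)
}
```

A slice is used rather than a map so that the field order survives the round trip, for the same reason the original code uses bson.D rather than bson.M.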

And we finally have our efficient, but not so short MongoDB mgo paging functionality. Read on...

Using github.com/icza/minquery (the "easy" way)

The manual way is quite lengthy; it can be made general and automated. This is where github.com/icza/minquery comes into the picture (disclosure: I'm the author). It provides a wrapper to configure and execute a MongoDB find command, allowing you to specify a cursor, and after executing the query, it gives you back the new cursor to be used to query the next batch of results. The wrapper is the MinQuery type which is very similar to mgo.Query but it supports specifying MongoDB's min via the MinQuery.Cursor() method.

The above solution using minquery looks like this:

q := minquery.New(session.DB(""), "users", bson.M{"country" : "USA"}).
    Sort("name", "_id").Limit(10)
// If this is not the first page, set cursor:
// getLastCursor() represents your logic how you acquire the last cursor.
if cursor := getLastCursor(); cursor != "" {
    q = q.Cursor(cursor)
}

var users []*User
newCursor, err := q.All(&users, "country", "name", "_id")

And that's all. newCursor is the cursor to be used to fetch the next batch.

Note #1: When calling MinQuery.All(), you have to provide the names of the cursor fields. These are used to build the cursor data (and ultimately the cursor string).

Note #2: If you're retrieving partial results (by using MinQuery.Select()), you have to include all the fields that are part of the cursor (the index entry) even if you don't intend to use them directly, else MinQuery.All() will not have all the values of the cursor fields, and so it will not be able to create the proper cursor value.

Check out the package doc of minquery here: https://godoc.org/github.com/icza/minquery. It is rather short and hopefully clean.
