MongoDB: returning rows sequentially before and after a given row?

Question

In MongoDB, given a find() operator that returns a cursor for a set of rows, what is an idiomatic and time-efficient way to return "context" rows, i.e. the rows that come sequentially before and/or after each row in the set?

For me the easiest way to explain this concept is with ack, which supports context searching. Given a file:

line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8

Here is ack's output:

C:\temp>ack.pl -C 2 "line 4" test.txt
line 2
line 3
line 4
line 5
line 6

I am storing log data in a MongoDB collection, one document per row. Each log line is tokenized into keywords and these keywords are indexed, which gives me cheap-ish full-text searching.
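As a rough illustration of the storage scheme just described, each log line could be turned into a document along these lines. The tokenizer and the `to_document` helper are hypothetical and only show the shape of the data; the field names `contents`, `keywords` and `datetime` are the ones used elsewhere in the question.

```python
import re
from datetime import datetime, timezone


def tokenize(line):
    """Lower-case the line and return its distinct alphanumeric tokens, sorted."""
    return sorted(set(re.findall(r"[a-z0-9]+", line.lower())))


def to_document(line, when=None):
    """Build one document per log line (hypothetical helper, not the asker's code)."""
    return {
        "contents": line,
        "keywords": tokenize(line),  # array field, suitable for a multikey index
        "datetime": when or datetime.now(timezone.utc),
    }


doc = to_document("ERROR disk full on /dev/sda1")
print(doc["keywords"])
```

With pymongo, an index on the array field (`collection.create_index("keywords")`) becomes a multikey index, which is what lets a keyword query such as `$all` avoid a full collection scan.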

I run a bog-standard query:

collection.find({keywords: {'$all': ['key1', 'key2']}}, {}).sort({datetime: -1});

and get a cursor. At this stage, without adding any additional fields, what is the approach for getting context? I think the flow is something like:

  • For each row in the cursor:
    • Get the _id field, store into x.
    • Execute: collection.find({_id: {'$gt': x}}).limit(N)
  • Get the results from each of these cursors.

For a result set with R rows this requires 2R+1 queries (one for the initial search, plus a "before" and an "after" context query for each row).
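The 2R+1 flow can be sketched as follows. This is a minimal sketch, not the asker's code: `collection` is assumed to be a pymongo-style collection, the function names are hypothetical, and it assumes that `_id` order matches log order, which default ObjectIds only approximate when several processes insert concurrently.

```python
def context_for(collection, row_id, n):
    """Return (before, after): up to n documents on each side of row_id, in _id order."""
    # Rows before: walk backwards from row_id, take n, then restore ascending order.
    before = list(collection.find({"_id": {"$lt": row_id}}).sort("_id", -1).limit(n))
    before.reverse()
    # Rows after: walk forwards from row_id and take n.
    after = list(collection.find({"_id": {"$gt": row_id}}).sort("_id", 1).limit(n))
    return before, after


def search_with_context(collection, keywords, n):
    """One search query plus two context queries per match: 2R + 1 in total."""
    matches = collection.find({"keywords": {"$all": keywords}}).sort("datetime", -1)
    results = []
    for row in matches:
        before, after = context_for(collection, row["_id"], n)
        results.append(before + [row] + after)
    return results
```

Each match costs two extra round trips to the server, which is the cost the rest of the question tries to avoid.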

However, I think I can trade off space for time. Is it a feasible alternative to update each row with its context _ids in the background? For a given row that currently has the fields:

          _id, contents, keywords
          

I would add an additional field:

          _id, contents, keywords, context_ids
          

and then in a subsequent search I could, somehow, use these context_ids, I think? I'm not at all familiar with MongoDB MapReduce yet, but could that come into the picture as well?

I think the most direct approach is to store the full text of the actual context rows in each row, but this seems a bit crude to me. The clear advantage is that a single query could return the context I need.

I appreciate any and all answers that accept the scope of the question. I realise I could use Lucene or a real full-text search engine out-of-band, but I'm trying to feel out the edges and capabilities of MongoDB, so I'd appreciate MongoDB-specific answers. Thanks!

Recommended answer

I think your approach of storing context_ids, or something like it, might be the best option. If you are able to store the context_ids of all the rows of context you will need (this assumes a fixed amount of context, say 5 lines before and after), then you can query for all the lines of context using $in:

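          # Python (pymongo) sketch; db.logs and matching_rows are assumed names
          for row in matching_rows:
              context_rows = list(db.logs.find({"_id": {"$in": row["context_ids"]}}).sort("_id", 1))
              before = [r for r in context_rows if r["_id"] < row["_id"]]
              after = [r for r in context_rows if r["_id"] > row["_id"]]
              row_with_context = before + [row] + after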
          

I imagine that knowing the set of context rows, particularly the rows after the row you're considering, can be difficult, since the rows after any given row won't necessarily exist yet.

An alternative that avoids this problem (but still requires a fixed, known-ahead-of-time amount of context) is to store just the _id of the first line of context before the line in question. When inserting, you can buffer the previous N lines, where N is the amount of context. Call this first_context_id, and then query like:

          # Python (pymongo) sketch; db.logs, matching_rows and N are assumed names
          for row in matching_rows:
              rows_with_context = list(db.logs.find({"_id": {"$gte": row["first_context_id"]}}).sort("_id", 1).limit(N * 2 + 1))
          

This may also simplify your application logic: you don't need to reassemble the context with the row in question, because this query returns both the matched row and its rows of context.
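The insert-time buffering could look roughly like this. It is a sketch under several assumptions: the application assigns each `_id` itself before inserting, `_id` order matches log order, and the names `attach_first_context_ids` and `rows_with_context` are hypothetical. `collection` stands for a pymongo-style collection.

```python
from collections import deque


def attach_first_context_ids(docs, n):
    """Yield docs in order, each tagged with the _id of the row n places earlier.

    Near the start of the log, where fewer than n earlier rows exist, the oldest
    available _id (or the row's own _id) is used instead.
    """
    recent = deque(maxlen=n)  # _ids of the previous n rows, oldest first
    for doc in docs:
        doc = dict(doc)  # copy so the caller's dict is not mutated
        doc["first_context_id"] = recent[0] if recent else doc["_id"]
        recent.append(doc["_id"])
        yield doc


def rows_with_context(collection, row, n):
    """One query per match: the n rows before, the match itself, and the n after."""
    cursor = collection.find({"_id": {"$gte": row["first_context_id"]}})
    return list(cursor.sort("_id", 1).limit(2 * n + 1))
```

For rows within the first N lines of the log the window is not centred on the match, so the fixed limit returns more than N trailing rows; trimming those would be left to the application.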
