Elasticsearch vs. MongoDB for a filtering application

Question

This question is about making an architectural choice before delving into the details of experimentation and implementation. It's about the suitability, in terms of scalability and performance, of Elasticsearch vs. MongoDB for a somewhat specific purpose.

Hypothetically, both store data objects that have fields and values, and both allow querying that body of objects. So presumably, filtering out subsets of the objects according to fields selected ad hoc is something both are fit for.

My application will revolve around selecting objects according to criteria. It would select objects by filtering on several fields at once; put differently, its query filtering criteria would typically comprise anywhere between 1 and 5 fields, maybe more in some cases. The fields chosen as filters would be a subset of a much larger set of fields. Picture some 20 field names existing, where each query attempts to filter the objects by a few of those 20 fields. (There may be fewer or more than 20 field names overall; I just used this number to illustrate the ratio of existing fields to fields used as filters in any single query.) The filtering can be by the existence of the chosen fields as well as by their values, e.g. selecting objects that have field A, whose field B is between x and y, and whose field C is equal to w.
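
To make the example filter concrete, here is a minimal sketch of how it might be written for each store, built as plain Python dictionaries so it runs without a database. The field names `A`, `B`, `C` and the values `x`, `y`, `w` come from the example above; the function names are hypothetical, the range bounds are assumed to be inclusive, and the Elasticsearch body assumes the `bool`/`filter` syntax of more recent releases.

```python
from typing import Any


def mongo_filter(x: float, y: float, w: Any) -> dict:
    """MongoDB query document: A exists, x <= B <= y, C == w."""
    return {
        "A": {"$exists": True},
        "B": {"$gte": x, "$lte": y},  # inclusive bounds (an assumption)
        "C": w,
    }


def elasticsearch_filter(x: float, y: float, w: Any) -> dict:
    """Elasticsearch request body with the same three conditions.

    Clauses under bool.filter are applied as yes/no filters and do not
    contribute to relevance scoring.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"exists": {"field": "A"}},
                    {"range": {"B": {"gte": x, "lte": y}}},
                    {"term": {"C": w}},
                ]
            }
        }
    }
```

With a client library, the first dictionary would typically be passed to a `find` call and the second sent as the body of a search request; both calls are left out here to keep the sketch self-contained.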

My application will be doing this sort of filtering continuously, and there would be little or nothing constant about which fields are used for the filtering at any given moment. Perhaps in Elasticsearch indexes need to be defined, but maybe even without indexes its speed is on par with that of MongoDB.

As for the data getting into the store, there is nothing special about it: the objects would almost never change after being inserted. Perhaps old objects would need to be dropped; I'd like to assume both data stores support expiring or deleting data, either internally or through a query issued by the application. (Less frequently, objects that match a certain query would need to be dropped as well.)

What do you think? And have you experimented with this aspect?

I am interested in the performance and scalability of each of the two data stores for this kind of task. This is an architectural design question, and details of store-specific options or query cornerstones that would make it well architected are welcome as a demonstration of a fully thought-out suggestion.

Thanks!

Recommended answer

First off, there is an important distinction to make here: MongoDB is a general-purpose database, while Elasticsearch is a distributed text search engine backed by Lucene. People have been talking about using Elasticsearch as a general-purpose database, but be aware that this was not its original design. I think that general-purpose NoSQL databases and search engines are headed for consolidation, but as it stands, the two come from two very different camps.

We are using both MongoDB and Elasticsearch in my company. We store our data in MongoDB and use Elasticsearch exclusively for its full-text search capabilities. We only send to Elastic the subset of the Mongo data fields that we need to query. Our use case differs from yours in that our Mongo data changes all the time: a record, or a subset of the fields of a record, can be updated several times a day, and this can call for re-indexing that record in Elastic. For that reason alone, using Elastic as the sole data store is not a good option for us, as we can't update selected fields; we would need to re-index a document in its entirety. This is not an Elastic limitation; it is how Lucene, the underlying search engine behind Elastic, works. In your case, the fact that records won't be changed once stored saves you from having to make that choice. Having said that, if data safety is a concern, I would think twice about using Elasticsearch as the only storage mechanism for your data. It may get there at some point, but I'm not sure it's there yet.

In terms of speed, not only is Elastic/Lucene on par with the querying speed of Mongo; in your case, where there is "very little constant in terms of which fields are used for the filtering at any moment", it could be orders of magnitude faster, especially as the datasets become larger. The difference lies in the underlying query implementations:


  • Elastic/Lucene uses the Vector Space Model and inverted indexes for Information Retrieval, which are highly efficient ways of comparing record similarity against a query. When you query Elastic/Lucene, it already knows the answer; most of its work lies in ranking the results for you by how likely they are to match your query terms. This is an important point: search engines, as opposed to databases, can't guarantee you exact results; they rank results by how close they get to your query. It just so happens that most of the time, the results are close to exact.
  • Mongo's approach is that of a more general-purpose data store; it compares JSON documents against one another. You can get great performance out of it by all means, but you need to carefully craft your indexes to match the queries you will be running. Specifically, if you have multiple fields by which you will query, you need to carefully craft your compound keys so that they reduce the dataset that will be queried as fast as possible. E.g. your first key should filter down the majority of your dataset, your second should further filter down what's left, and so on. If your queries don't match the keys, and the order of those keys, in the defined indexes, your performance will drop quite a bit. On the other hand, Mongo is a true database, so if accuracy is what you need, the answers it gives will be spot on.
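
The point about key order can be illustrated with a deliberately simplified model of the prefix rule for compound indexes. This is only a sketch, not MongoDB's actual query planner: the function name `usable_prefix` and the index on fields `A`, `B`, `C` are hypothetical, and it ignores details such as equality versus range conditions. It also counts how many distinct field combinations the question's scenario allows (1 to 5 fields out of 20), which suggests why hand-crafting one compound index per combination does not scale.

```python
from math import comb
from typing import List, Sequence, Set


def usable_prefix(index_keys: Sequence[str], query_fields: Set[str]) -> List[str]:
    """Return the leading index keys that the query also filters on.

    Scanning stops at the first index key the query does not use, which
    mirrors the idea that a compound index helps most when the query
    constrains a leading prefix of its keys.
    """
    prefix = []
    for key in index_keys:
        if key not in query_fields:
            break
        prefix.append(key)
    return prefix


# Hypothetical compound index on (A, B, C), in that order.
index = ["A", "B", "C"]
print(usable_prefix(index, {"A", "B"}))  # ['A', 'B']: two leading keys used
print(usable_prefix(index, {"A", "C"}))  # ['A']: only the first key narrows the scan
print(usable_prefix(index, {"B", "C"}))  # []: the query skips the leading key

# Number of distinct ways to pick 1 to 5 filter fields out of 20.
combinations = sum(comb(20, k) for k in range(1, 6))
print(combinations)  # 21699
```

Under this model, a query on `B` and `C` gets no help from an index that starts with `A`, which is the performance drop described above when queries do not match the keys and their order.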

For expiring old records, Elastic has a built-in TTL feature. Mongo introduced one as of version 2.2, I think.
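
On the Mongo side, expiry is configured through a TTL index on a date field. The sketch below only builds the `createIndexes` command document as a Python dictionary, so it runs without a server; the collection name `events`, the field name `createdAt`, the index name, and the one-hour lifetime are all hypothetical choices made for illustration.

```python
def ttl_index_command(collection: str, date_field: str, ttl_seconds: int) -> dict:
    """Build a createIndexes command document for a TTL index.

    Documents become eligible for removal once the value of date_field
    (which must hold a date) is older than ttl_seconds.
    """
    if ttl_seconds < 0:
        raise ValueError("ttl_seconds must be non-negative")
    return {
        "createIndexes": collection,
        "indexes": [
            {
                "key": {date_field: 1},
                "name": f"{date_field}_ttl",  # hypothetical index name
                "expireAfterSeconds": ttl_seconds,
            }
        ],
    }


# Hypothetical example: expire documents one hour after their createdAt time.
command = ttl_index_command("events", "createdAt", 3600)
print(command)
```

Deleting objects that match an arbitrary query, as the question also mentions, would still be done with an ordinary delete issued by the application.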

Since I don't know your other requirements, such as expected data size, transactions, accuracy, or what your filters will look like, it's hard to make any specific recommendations. Hopefully there is enough here to get you started.
