Best NoSQL for filtering on multiple indexes/fields


Question

Because of the size of the data that needs to be queried and the need to scale across multiple nodes, I am considering using some type of NoSQL database. I have been researching numerous NoSQL offerings but can't yet decide which one would provide the best performance, scalability and features for our data structure.

The data model is a product catalog where each document/set contains certain properties and descriptions for that individual product. Properties vary from product to product, which is why a schema-less offering would work best.

A sample structure would look like this:

[
 {"name": "item name",
  "cost": 563.34,
  "category": "computer",
  "manufacturer": "sony",
.
.
.
 }
]

So the requirement is that I need to be able to filter/query on many different fields/indexes in the record set, filtering on and excluding multiple indexes/fields in the same query. Queries will be mostly reads, and there would not be much need for joins or relationship-style linking.
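To make the requirement concrete, here is a minimal in-memory sketch in Python of "filter on some fields and exclude others in one query". It only illustrates the intended semantics, not how any of the databases mentioned implements it; the helper names (`matches`, `filter_products`) and the sample records are assumptions made up for illustration.

```python
from typing import Any, Dict, List

Product = Dict[str, Any]


def matches(product: Product, include: Dict[str, Any], exclude: Dict[str, Any]) -> bool:
    """True if the product has every `include` value and none of the `exclude` values."""
    has_all = all(product.get(field) == value for field, value in include.items())
    has_none = all(product.get(field) != value for field, value in exclude.items())
    return has_all and has_none


def filter_products(products: List[Product], include: Dict[str, Any],
                    exclude: Dict[str, Any]) -> List[Product]:
    """Linear scan; a real database would answer this from indexes instead."""
    return [p for p in products if matches(p, include, exclude)]


# Hypothetical schema-less records: properties differ from product to product.
catalog: List[Product] = [
    {"name": "laptop a", "cost": 563.34, "category": "computer", "manufacturer": "sony"},
    {"name": "laptop b", "cost": 499.00, "category": "computer", "manufacturer": "acme", "ram_gb": 16},
    {"name": "camera c", "cost": 320.00, "category": "camera", "manufacturer": "sony"},
]

# Computers that are NOT made by sony.
result = filter_products(catalog,
                         include={"category": "computer"},
                         exclude={"manufacturer": "sony"})
print([p["name"] for p in result])
```

A missing field simply never matches an `include` condition and never triggers an `exclude` condition, which is one reasonable way to treat schema-less records.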

I have looked into Elasticsearch, MongoDB, OrientDB, Couchbase and Aerospike.


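  • Elasticsearch seems like an obvious choice, but I am unsure about its performance and stability.
  • Aerospike seems really fast because it is mostly in-memory, but its filtering and search capabilities seem limited.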

What do you think the best option would be for my use case? Or are there any other recommended databases that I should look into?

I know that the best way is to test performance with the actual real-life use case, but I am hoping to narrow it down a little first.

Thanks

Recommended answer

This is a variant of the popular question "what is the best product?" :)

As always, this depends on your specific use case and goals. Database products (like all products) are always the result of trade-offs, so there is NO single product offering the best performance, scalability and features. However, there are many very good products for your use case.

Because your question is about product data, and I have been working with product data for more than 15 years, I will try to answer it.


  • A document model is a perfect fit for product data. So for all use cases other than simple lookup I would recommend a document store.
  • If your use case concerns a single application and you are using the Java platform, I would recommend an embedded database. This makes things simpler and has a big performance advantage.
  • If you need faceted search or other advanced product search, I recommend Solr or Elasticsearch.
  • If you need a distributed system, I recommend Elasticsearch over Solr.
  • If you need product recommendations based on reviews or other graph-oriented algorithms, I recommend OrientDB or ArangoDB (or Neo4j, but in this case it would be my second choice).
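For readers unfamiliar with the term, faceted search means returning, alongside the matching products, a count of matches per value of selected fields (for example, how many results each manufacturer has). A minimal in-memory sketch in Python follows; the `facet_counts` helper and the sample records are hypothetical, and a search server computes the same kind of counts from its index rather than by scanning.

```python
from collections import Counter
from typing import Any, Dict, List

Product = Dict[str, Any]


def facet_counts(products: List[Product], field: str) -> Counter:
    """Count how many products carry each value of `field`; products lacking it are skipped."""
    return Counter(p[field] for p in products if field in p)


# Hypothetical result set of a product search.
hits: List[Product] = [
    {"name": "laptop a", "category": "computer", "manufacturer": "sony"},
    {"name": "laptop b", "category": "computer", "manufacturer": "acme"},
    {"name": "camera c", "category": "camera", "manufacturer": "sony"},
    {"name": "cable d", "category": "accessory"},  # no manufacturer property
]

by_manufacturer = facet_counts(hits, "manufacturer")
by_category = facet_counts(hits, "category")
print(dict(by_manufacturer))
print(dict(by_category))
```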

Products we use in production or have evaluated in depth for the use case you describe:


  • Solr and ES. Both are extremely well-engineered products. Both (ES too) are mature and stable.
  • Neo4j. The most mature graph database. The big advantage, in my opinion, is the awesome query language it uses. It has an integrated Lucene engine. A very mature and well-engineered product. The disadvantage is that it is not a document graph but a property (key-value) graph. It can also be expensive.
  • MongoDB. Our first experience with a document store. A very good product. Big advantages: excellent documentation and (by far) the most popular NoSQL database.
  • OrientDB and ArangoDB. Both support the graph/document paradigm. These are less well-known products, but very powerful. Because we are a Java-based shop, our preference goes to OrientDB. OrientDB has an integrated Lucene engine (although the implementation is quite simple). ArangoDB, on the other hand, has very good documentation and a very smart and efficient storage format, and AQL is also very nice!
  • Performance (tested with 11.43 million articles and 2.3 million products): all products are very fast, especially Solr and ES in this use case. Embedded OrientDB is also mind-blowingly fast for import and simple queries. For faceted search, only the search servers provide really fast performance!
  • Bottom line: I would go for a graph/document store and/or a search server (Solr or ES). Because you mentioned "filtering" (I assume faceted search), the search server is the obvious first choice.
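As a rough illustration of why a search server fits the "filter on some fields, exclude others" requirement, the sketch below builds an Elasticsearch query DSL request body in Python: a `bool` query with `filter` and `must_not` clauses, plus a `terms` aggregation per facet field. The `build_search_body` helper and the field names are hypothetical, and the sketch assumes the fields are mapped as `keyword` so that exact `term` matching applies. Nothing is sent to a server here; such a body would typically be posted to an index's `_search` endpoint.

```python
import json
from typing import Any, Dict, List


def build_search_body(include: Dict[str, Any], exclude: Dict[str, Any],
                      facet_fields: List[str], size: int = 10) -> Dict[str, Any]:
    """Build a request body: exact-match filters, exclusions, and facet aggregations."""
    return {
        "size": size,
        "query": {
            "bool": {
                # filter clauses must match but do not affect relevance scoring
                "filter": [{"term": {field: value}} for field, value in include.items()],
                # must_not clauses exclude matching documents
                "must_not": [{"term": {field: value}} for field, value in exclude.items()],
            }
        },
        # one terms aggregation per facet field (aggregation names are arbitrary)
        "aggs": {f"by_{field}": {"terms": {"field": field}} for field in facet_fields},
    }


# Computers not made by sony, with result counts per manufacturer.
body = build_search_body(
    include={"category": "computer"},
    exclude={"manufacturer": "sony"},
    facet_fields=["manufacturer"],
)
print(json.dumps(body, indent=2))
```

Whether this beats a document store with secondary indexes for a given catalog still has to be measured against the real data and query mix.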
