ElasticSearch vs. ElasticSearch + Cassandra

Question

My main question is what is the benefit of integrating Cassandra and Elasticsearch versus using only Elasticsearch?

In fact, similar questions have already been answered on StackOverflow (e.g., here and here). But a few points are worth raising:


  • A lot of answers are old. Much may have changed in these years.
  • One point that is mentioned is that "Sometimes ElasticSearch loses writes". However, it is conceivable that those alleged losses may have been due to some bugs that have been solved over the years. Cassandra, too, may well have bugs that cause data loss. Are there any fundamental differences between Cassandra and Elasticsearch that cause Elasticsearch to lose data but not Cassandra?
  • It is mentioned that "Schema changes are difficult to do in ElasticSearch without blowing everything away and reloading." This may not be a major problem for us, assuming that our data model is relatively stable or at least backward-compatible. Also, thanks to dynamic mapping, Elasticsearch may adapt itself to new requirements (e.g., extra fields).
  • With respect to the indexing delay in Elasticsearch, Cassandra does not guarantee immediate consistency either (it is eventually consistent by default). So, in Cassandra you may also face delays before written data can be read.

Overall, what extra features does Cassandra offer when used in conjunction with Elasticsearch?

P.S. It may be better if the question is answered in general. But, if it is necessary, assume that we only append rows to the database and never delete or update anything. We want to be able to do full-text search in the data.

Recommended answer

So as the author of one of the linked answers (Elasticsearch vs Cassandra vs Elasticsearch with Cassandra), I suppose that I should weigh in here.


those alleged losses may have been due to some bugs that have been solved over the years.

This is an absolutely true statement. The answer I wrote is almost six years old, and ElasticSearch has grown to be a much more reliable product in that time. That being said, there are some things which Cassandra can do that ElasticSearch just wasn't designed to do (and vice-versa).


what extra features does Cassandra offer...

I can think of a few, which I'll summarize here:


  • Write throughput/performance/latency

ElasticSearch is a search engine based on the Lucene project. Handling large amounts of write throughput at low latencies is just not something that it was designed to do; at least not "out of the box." There are ways to configure ElasticSearch to be better at this, as described here: Techniques to Achieve High Write Throughput With ElasticSearch. But in terms of building a new cluster with minimal config, you'll spend less time engineering Cassandra to accomplish this.
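The linked article isn't reproduced here, but two techniques commonly recommended for heavy indexing are batching writes through the `_bulk` API and temporarily relaxing `refresh_interval` and `number_of_replicas` while loading. The following is a minimal sketch in plain Python, assuming no client library: it only builds the request bodies (the newline-delimited JSON that `POST /_bulk` expects, and a settings body for `PUT /<index>/_settings`). Sending them is left out, and the index name `logs` and the helper names are hypothetical.

```python
import json

def bulk_body(index, docs):
    """Build an NDJSON payload for Elasticsearch's _bulk endpoint.

    Each document becomes two lines: an action/metadata line, then the
    document source itself.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""

# Settings often applied during a heavy load phase: "-1" disables periodic
# refreshes, and 0 replicas avoids duplicating every write while loading.
BULK_LOAD_SETTINGS = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
# Elasticsearch's defaults, to restore once the load is finished.
NORMAL_SETTINGS = {"index": {"refresh_interval": "1s", "number_of_replicas": 1}}

if __name__ == "__main__":
    print(bulk_body("logs", [{"msg": "a"}, {"msg": "b"}]), end="")
    print(json.dumps(BULK_LOAD_SETTINGS))
```

Dropping replicas to 0 trades durability for speed during the load, which is directly relevant to the lost-writes concern below, so the normal settings should be restored as soon as the load completes.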

"Sometimes ElasticSearch loses writes"

Yes, I wrote that. Again, ElasticSearch has improved. A lot. But I still see this happen under high write throughput conditions. When a cluster is engineered for a certain level of throughput and an application exceeds those tolerances, causing a node to become overwhelmed by write back-pressure, writes will be lost.

Cassandra is not immune to this problem, either. It just has a higher tolerance for it. If you were to use them both together, architecting something like Kafka to "throttle" the write throughput to each would be a good approach.
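Building a Kafka pipeline is beyond a short example, but the core idea of throttling each sink separately can be sketched with a token bucket. This is a technique swapped in here for illustration, not something taken from the answer. In a Kafka-based design, the consumer feeding each store would apply this kind of pacing and simply leave messages in the topic while a sink is at its limit. `TokenBucket`, `forward`, and the per-sink rates are all hypothetical.

```python
import time

class TokenBucket:
    """Allow at most `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        now = self.clock()
        # Refill tokens for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per sink; hypothetical limits matching what each cluster was sized for.
limits = {
    "cassandra": TokenBucket(rate=5000, capacity=5000),
    "elasticsearch": TokenBucket(rate=1000, capacity=1000),
}

def forward(sink, record, write_fn):
    """Write `record` to `sink` only if its bucket has capacity.

    Returns False when the sink is saturated, so the caller can keep the
    record (e.g. leave the Kafka offset uncommitted) and retry later
    instead of pushing the node into back-pressure.
    """
    if limits[sink].try_acquire():
        write_fn(record)
        return True
    return False
```

Because each sink has its own bucket, a slower Elasticsearch cluster can lag behind Cassandra without either one being overrun.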


  • Multi-Data Center High Availability (MDHA)

With the ability to define logical data centers and availability zones (racks), Cassandra has always been good at replicating a data set over multiple regions. This is problematic for ElasticSearch, as it does not have a concept of a logical data center, and its "master" nodes are not active/active.
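For illustration, per-data-center replication in Cassandra is declared on the keyspace with `NetworkTopologyStrategy`, which takes a separate replication factor for each logical data center. A minimal sketch that just renders the CQL statement, assuming a keyspace named `app` and data centers named `us_east` and `eu_west` (all hypothetical; the data center names must match what the cluster's snitch reports):

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Render a CREATE KEYSPACE statement using NetworkTopologyStrategy,
    which sets an independent replication factor per logical data center."""
    dc_opts = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dc_opts}}};"
    )

if __name__ == "__main__":
    # Three replicas in each of two regions.
    print(create_keyspace_cql("app", {"us_east": 3, "eu_west": 3}))
```

Racks (availability zones) do not appear in the statement: with `GossipingPropertyFileSnitch`, each node declares its `dc` and `rack` in `cassandra-rackdc.properties`, and replicas within a data center are spread across racks.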


  • Peer nodes vs. role-based nodes

As a follow-up to my MDHA point, ElasticSearch now allows nodes to be designated with a "role" in the cluster. You can specify multiple nodes to act in the "master" role, in charge of adding and updating indexes. Any node can direct search traffic to the nodes that work under the "data" role. In fact, one way to improve write throughput (my first talking point) is to designate a node or two with the "ingest" role, which can prevent read and write traffic from interfering with each other.

This deviates from Cassandra's approach, where every node is a peer and can handle reads and writes. Being able to treat all nodes the same simplifies maintenance and administration. And "no," despite popular misconception, a "seed" node is not anything special.
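A sketch of what the role-based ElasticSearch layout looks like in configuration, assuming Elasticsearch 7.9 or later, where each node's roles are listed under `node.roles` in its `elasticsearch.yml` (older releases used boolean settings such as `node.master` and `node.data`). The Python below only renders those settings for a hypothetical layout; the cluster and node names are made up. Three master-eligible nodes are used because electing a master requires a majority of them.

```python
# Hypothetical layout: dedicated master-eligible, data, and ingest nodes.
LAYOUT = {
    "es-master-1": ["master"],
    "es-master-2": ["master"],
    "es-master-3": ["master"],
    "es-data-1": ["data"],
    "es-data-2": ["data"],
    "es-ingest-1": ["ingest"],
}

def render_node_config(cluster_name, node_name, roles):
    """Return the elasticsearch.yml lines that pin a node to the given roles."""
    return "\n".join([
        f"cluster.name: {cluster_name}",
        f"node.name: {node_name}",
        f"node.roles: [ {', '.join(roles)} ]",
    ]) + "\n"

if __name__ == "__main__":
    for name, roles in LAYOUT.items():
        print(f"# {name}")
        print(render_node_config("search-prod", name, roles))
```

By contrast, a Cassandra node has no role list to set: apart from its own addresses and the shared `seeds` list in `cassandra.yaml`, nodes are typically configured identically, which is the peer model described above.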


  • Query vs. Search

To me, this is the fundamental difference between the two. Querying is not the same as searching. They may seem similar, but they are quite different.

Retrieving data by matching a pattern on one or multiple columns/properties is searching. Also with searching, the number of results is more of an unknown beforehand. Sure, Cassandra has added some features in the last few years to allow for pattern matching based on LIKE queries (I don't recommend its use). But when the ability to "search" a data set is required, Cassandra can't compete with ElasticSearch.

Retrieving data by providing a specific value on a specific key (column) is querying. With querying, it is also easier to have accurate expectations on the number of results to be returned. If I were building an app and knew that I'd only ever have to retrieve data based on a static, pre-defined query with a specific key, I'd choose Cassandra every time.
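To make the query/search distinction concrete, here is a toy model in plain Python; it is not how either system is implemented. A dict keyed by a primary key stands in for a Cassandra table read by its key, and a small inverted index (token to set of row ids) stands in for the kind of structure Lucene builds for full-text search. The append-only writes follow the question's P.S.; the function names and the tokenizer are assumptions.

```python
import re
from collections import defaultdict

rows = {}                    # "query" side: exact lookup by primary key
inverted = defaultdict(set)  # "search" side: token -> ids of rows containing it

def tokenize(text):
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def append_row(row_id, text):
    """Append-only write: store the row by key and index each of its tokens."""
    rows[row_id] = text
    for token in tokenize(text):
        inverted[token].add(row_id)

def query(row_id):
    """Query: a specific value on a specific key; at most one result, known up front."""
    return rows.get(row_id)

def search(phrase):
    """Search: ids of rows containing every term; the count isn't known beforehand."""
    terms = tokenize(phrase)
    if not terms:
        return set()
    return set.intersection(*(inverted.get(t, set()) for t in terms))
```

Note the asymmetry: `query` needs the exact key chosen at design time, while `search` can answer questions nobody anticipated when the data was modeled.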

With Cassandra, I can also tune query consistency, requiring operational acknowledgement from more or fewer replicas. Likewise, I can also direct those operations to a specific geographic region, based on the locality of the application.
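The consistency tuning mentioned here comes down to simple arithmetic. A minimal sketch, for illustration only: with replication factor RF, QUORUM waits for floor(RF/2) + 1 replicas, and a read is guaranteed to overlap at least one replica that acknowledged the latest successful write when the read and write replica counts satisfy R + W > RF. LOCAL_QUORUM applies the same formula to the replicas in the coordinator's own data center, which is how requests can be kept within one region. The consistency level names (ONE, QUORUM, ALL, LOCAL_QUORUM) are Cassandra's; the functions are hypothetical helpers, and `reads_see_latest_write` handles only the non-LOCAL levels.

```python
def replicas_required(level, rf, local_rf=None):
    """Replica acknowledgements Cassandra waits for at a given consistency level.

    `rf` is the total replication factor (summed over data centers for QUORUM);
    `local_rf` is the coordinator's data center replication factor (LOCAL_QUORUM).
    """
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    if level == "LOCAL_QUORUM":
        if local_rf is None:
            raise ValueError("LOCAL_QUORUM needs the local data center's replication factor")
        return local_rf // 2 + 1
    raise ValueError(f"unsupported level: {level}")

def reads_see_latest_write(read_level, write_level, rf):
    """True when read and write replica sets must overlap (R + W > RF)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf
```

This also bears on the question's consistency point: with RF=3, ONE/ONE gives no overlap guarantee, whereas QUORUM writes plus QUORUM reads do, at the cost of waiting on more replicas per request.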


...when used in conjunction with Elasticsearch?

They complement each other well. Cassandra is good at some things (detailed above) that ElasticSearch is not, and vice versa (I keep saying that). An application's requirements may call for both searching and querying. Sometimes you've got an app that needs high-speed key lookups, and then: "Oh, and we also want search."

Summary (tl;dr)

So while I've written quite a bit here, the main point that I'll keep coming back to is picking the right tool for the job. When I need to search, I'll pick ElasticSearch. When I need to query in a highly-available, geographically-aware scenario, I'll pick Cassandra. I still see applications use both (in tandem), so both have their merits.
