ElasticSearch vs. ElasticSearch+Cassandra

Question

My main question is what is the benefit of integrating Cassandra and Elasticsearch versus using only Elasticsearch?

In fact, there are answers to similar questions on StackOverflow (e.g., here and here). But there are some points:

  • A lot of answers are old. Much may have changed in these years.
  • One point that is mentioned is that "Sometimes ElasticSearch loses writes". However, those alleged losses may have been caused by bugs that have been fixed over the years. One could equally assume that Cassandra has bugs that cause data loss. Are there any fundamental differences between Cassandra and Elasticsearch that would cause Elasticsearch to lose data but not Cassandra?
  • It is mentioned that "Schema changes are difficult to do in ElasticSearch without blowing everything away and reloading." This may not be a major problem for us, assuming that our data model is relatively stable or at least backward-compatible. Also, thanks to dynamic mapping, Elasticsearch may adapt to new requirements (e.g., extra fields).
  • With respect to the indexing delay in Elasticsearch, Cassandra does not guarantee immediate consistency either. So in Cassandra you may also face delays before written data becomes readable.

Overall, what extra features does Cassandra offer when used in conjunction with Elasticsearch?

P.S. It may be better if the question is answered in general. But, if it is necessary, assume that we only append rows to the database and never delete or update anything. We want to be able to do full-text search in the data.

Answer

So as the author of one of the linked answers (Elasticsearch vs Cassandra vs Elasticsearch with Cassandra), I suppose that I should weigh in here.

"those alleged losses may have been caused by bugs that have been fixed over the years."

This is an absolutely true statement. The answer I wrote is almost six years old, and ElasticSearch has grown to be a much more reliable product in that time. That being said, there are some things which Cassandra can do that ElasticSearch just wasn't designed to do (and vice-versa).

"what extra features does Cassandra offer..."

I can think of a few, which I'll summarize here:

  • Write throughput/performance/latency

ElasticSearch is a search engine based on the Lucene project. Handling large amounts of write throughput at low latencies is just not something that it was designed to do; at least not "out of the box." There are ways to configure ElasticSearch to be better at this, as described here: Techniques to Achieve High Write Throughput With ElasticSearch. But in terms of building a new cluster with minimal config, you'll spend less time engineering Cassandra to accomplish this.
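One widely used technique for raising Elasticsearch write throughput is batching documents through the `_bulk` endpoint instead of indexing them one at a time. The sketch below only builds the newline-delimited JSON body that `_bulk` expects: an action line followed by a source line per document, with a required trailing newline. The index name `events` and the document fields are hypothetical, and no request is sent.

```python
import json
from typing import Iterable, Mapping


def build_bulk_body(index: str, docs: Iterable[tuple[str, Mapping]]) -> str:
    """Return an NDJSON payload for Elasticsearch's _bulk API.

    Each document contributes two lines: an action/metadata line and the
    document source. The _bulk API requires the body to end with a newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


# Hypothetical append-only events, batched into a single request body.
batch = [
    ("1", {"user": "alice", "msg": "login"}),
    ("2", {"user": "bob", "msg": "logout"}),
]
body = build_bulk_body("events", batch)
print(body, end="")
```

In practice the body would be POSTed to `/_bulk` with `Content-Type: application/x-ndjson`. The batch size is one of the knobs you would tune against your own cluster.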

"Sometimes ElasticSearch loses writes"

Yes, I wrote that. Again, ElasticSearch has improved. A lot. But I still see this happen under high write throughput conditions. When a cluster is engineered for a certain level of throughput and an application exceeds those tolerances, a node can become overwhelmed by write back-pressure, and writes will be lost.
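To make the back-pressure point concrete, here is a toy, single-threaded simulation of a node with a bounded write queue. It is not Elasticsearch code. The node drains a fixed number of writes per tick. When arrivals outpace the drain rate, the queue fills and further writes are rejected, and if the client does not retry them, they are lost. In Elasticsearch such rejections typically surface as HTTP 429 responses. The capacities and rates here are made-up parameters.

```python
def simulate(arrivals_per_tick: int, drain_per_tick: int,
             queue_capacity: int, ticks: int) -> tuple[int, int]:
    """Return (accepted, rejected) write counts for a node with a bounded queue."""
    depth = 0
    accepted = rejected = 0
    for _ in range(ticks):
        # Incoming writes fill whatever room is left; the rest are rejected.
        room = queue_capacity - depth
        took = min(arrivals_per_tick, room)
        accepted += took
        rejected += arrivals_per_tick - took
        depth += took
        # The node then processes a fixed number of queued writes.
        depth -= min(drain_per_tick, depth)
    return accepted, rejected


print(simulate(100, 100, 200, 10))  # → (1000, 0)
print(simulate(150, 100, 200, 10))  # → (1100, 400)
```

The queue only absorbs short bursts. Once the arrival rate stays above the drain rate, every extra write is rejected, no matter how large the queue is.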

Cassandra is not immune to this problem, either. It just has a higher tolerance for it. If you were to use them both together, architecting something like Kafka to "throttle" the write throughput to each would be a good approach.
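Kafka itself is not shown here. As an in-process stand-in for the throttling idea, this sketch gives each sink a token-bucket limiter that decides whether a consumer may forward a batch now or should leave it waiting in the log. The per-sink rates (5,000 writes/s for Cassandra, 2,000 for Elasticsearch) and the `forward` helper are hypothetical illustrations, not recommendations.

```python
class TokenBucket:
    """Allow at most `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now: float, n: int = 1) -> bool:
        # `now` is passed in (e.g. from time.monotonic()) to keep this deterministic.
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        # Note: a batch larger than `capacity` can never be allowed.
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


# Hypothetical per-sink limits: each consumer of the shared log drains at its own pace.
limits = {
    "cassandra": TokenBucket(rate=5000, capacity=5000),
    "elasticsearch": TokenBucket(rate=2000, capacity=2000),
}


def forward(sink: str, batch: list, now: float) -> bool:
    """Forward `batch` to `sink` if its budget allows; otherwise keep it in the log."""
    if limits[sink].allow(now, len(batch)):
        # A real consumer would write the batch to the sink here (hypothetical).
        return True
    return False
```

The design point is where the backlog lives. A throttled write waits in a durable log instead of piling up in the database's own queue until the database starts rejecting writes.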

  • Multi Data center High Availability (MDHA)

With the ability to define logical data centers and availability zones (racks), Cassandra has always been good at replicating a data set over multiple regions. This is problematic for ElasticSearch, as it does not have a concept of a logical data center, and its "master" nodes are not active/active.
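In Cassandra, multi-region replication is declared per keyspace with `NetworkTopologyStrategy`, which takes a replication factor for each logical data center. The helper below only renders the CQL statement. The keyspace name `app_data` and the data-center names `us_east`/`eu_west` are hypothetical. Real names must match what the snitch reports; with `GossipingPropertyFileSnitch`, for example, they come from `cassandra-rackdc.properties`, which also defines the rack.

```python
def create_keyspace_cql(keyspace: str, rf_per_dc: dict[str, int]) -> str:
    """Render a CREATE KEYSPACE statement that replicates to several data centers."""
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', {dcs}}};"
    )


print(create_keyspace_cql("app_data", {"us_east": 3, "eu_west": 3}))
# → CREATE KEYSPACE IF NOT EXISTS app_data WITH replication = {'class': 'NetworkTopologyStrategy', 'us_east': 3, 'eu_west': 3};
```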

  • Peer nodes vs. role-based nodes

As a follow-up to my MDHA point, ElasticSearch now allows nodes to be designated with a "role" in the cluster. You can specify multiple nodes to take the "master" role, in charge of adding and updating indexes. Any node can direct search traffic to the nodes which work under the "data" role. In fact, one way to improve write throughput (my first talking point) is to designate a node or two with the "ingest" role, which can prevent read and write traffic from interfering with each other.
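In Elasticsearch 7.9 and later, each node's roles are set with the `node.roles` setting in its `elasticsearch.yml`. The sketch below renders that line for a hypothetical six-node layout that follows the author's description: three master-eligible nodes, two data nodes, and one ingest node. The node names are made up.

```python
def node_roles_yaml(roles: list[str]) -> str:
    """Render the node.roles line for one node's elasticsearch.yml."""
    return f"node.roles: [ {', '.join(roles)} ]"


# Hypothetical cluster layout following the roles described above.
layout = {
    "es-master-1": ["master"],
    "es-master-2": ["master"],
    "es-master-3": ["master"],
    "es-data-1": ["data"],
    "es-data-2": ["data"],
    "es-ingest-1": ["ingest"],
}

for node, roles in layout.items():
    print(f"# {node}: elasticsearch.yml")
    print(node_roles_yaml(roles))
```

Giving each node a different file is exactly the per-node specialization that contrasts with Cassandra's uniform peers.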

This deviates from Cassandra's approach, where every node is a peer and can handle reads and writes. Being able to treat all nodes the same simplifies maintenance and administration. And "no," despite a popular misconception, a "seed" node is not anything special.
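Because every Cassandra node is a peer, any node can coordinate any request. It hashes the partition key to a token and walks the ring to find the replicas. The toy ring below illustrates that idea under simplifying assumptions. It uses MD5 from the standard library instead of Cassandra's Murmur3 partitioner and assigns one made-up token per node (no vnodes). It also ignores racks and data centers.

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring: one token per node, replicas on consecutive nodes clockwise."""

    def __init__(self, node_tokens: dict[str, int], replication_factor: int):
        self.ring = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.ring]
        self.rf = min(replication_factor, len(self.ring))

    @staticmethod
    def token_for(key: str) -> int:
        # Stand-in partitioner: first 8 bytes of MD5 as an unsigned 64-bit integer.
        # (Cassandra's default Murmur3Partitioner produces signed 64-bit tokens.)
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def replicas_for_token(self, token: int) -> list[str]:
        # The owner is the first node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(self.tokens, token) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def replicas(self, key: str) -> list[str]:
        return self.replicas_for_token(self.token_for(key))


# Hypothetical four-node cluster with evenly spaced tokens and RF = 3.
ring = TokenRing({"node-a": 0, "node-b": 2**62, "node-c": 2**63, "node-d": 3 * 2**62}, 3)
print(ring.replicas("user:42"))
```

Placement is a pure function of the key and the ring, so whichever node the client happens to contact computes the same replica set. No node needs a special role to route a request.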

  • Query vs. Search

To me, this is the fundamental difference between the two. Querying is not the same as searching. They may seem similar, but they are quite different.

Retrieving data by matching a pattern on one or multiple columns/properties is searching. Also with searching, the number of results is more of an unknown beforehand. Sure, Cassandra has added some features in the last few years to allow for pattern matching based on LIKE queries (I don't recommend its use). But when the ability to "search" a data set is required, Cassandra can't compete with ElasticSearch.
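Here is a rough way to see why search is a different workload. A search engine such as Lucene builds an inverted index from terms to documents. That lets a search on any word find matches without knowing a key in advance, and the number of hits is unknown until the index is consulted. The toy index below is only an illustration and not how Lucene works internally. It lowercases, splits on whitespace, and applies AND semantics, with no scoring.

```python
from collections import defaultdict


class InvertedIndex:
    def __init__(self):
        # term -> set of document IDs containing it
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        # Naive analysis: lowercase and split on whitespace.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return IDs of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


idx = InvertedIndex()
idx.add("1", "Cassandra handles high write throughput")
idx.add("2", "Elasticsearch handles full-text search")
idx.add("3", "Kafka can throttle write throughput")
print(sorted(idx.search("write throughput")))  # → ['1', '3']
```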

Retrieving data by providing a specific value on a specific key (column) is querying. With querying, it is also easier to have accurate expectations on the number of results to be returned. If I were building an app and knew that I'd only ever have to retrieve data based on a static, pre-defined query with a specific key, I'd choose Cassandra every time.
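By contrast, a "query" in this sense is a lookup on a key that was chosen when the table was designed. The sketch models a hypothetical `events_by_user` table as a dict keyed by partition key, with rows returned newest first by their clustering column. The CQL in the comment shows what such a table and its query might look like. A real table stores rows in clustering order on disk; this toy sorts at read time. The access path, and therefore the cost and result size, is fixed in advance.

```python
from collections import defaultdict

# Hypothetical CQL this models:
#   CREATE TABLE events_by_user (
#       user_id text, ts timestamp, msg text,
#       PRIMARY KEY ((user_id), ts)
#   ) WITH CLUSTERING ORDER BY (ts DESC);
#   SELECT ts, msg FROM events_by_user WHERE user_id = ? LIMIT 10;


class EventsByUser:
    def __init__(self):
        # partition key (user_id) -> list of (ts, msg) rows
        self.partitions: dict[str, list[tuple[int, str]]] = defaultdict(list)

    def insert(self, user_id: str, ts: int, msg: str) -> None:
        self.partitions[user_id].append((ts, msg))

    def select(self, user_id: str, limit: int) -> list[tuple[int, str]]:
        """Read a single partition, newest first, at most `limit` rows."""
        rows = self.partitions.get(user_id, [])
        return sorted(rows, reverse=True)[:limit]


table = EventsByUser()
table.insert("alice", 1, "login")
table.insert("alice", 3, "logout")
table.insert("bob", 2, "login")
print(table.select("alice", 10))  # → [(3, 'logout'), (1, 'login')]
```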

With Cassandra, I can also tune query consistency, requiring operational acknowledgement from more or fewer replicas. Likewise, I can also direct those operations to a specific geographic region, based on the locality of the application.
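Tunable consistency comes down to how many replicas must acknowledge an operation. With replication factor RF, QUORUM needs floor(RF / 2) + 1 replicas. A read is guaranteed to reach at least one replica holding the latest successful write when R + W > RF. This also bears on the question's point about delayed reads: Cassandra only gives read-your-writes behavior when you pick levels that satisfy that inequality. The helper below does the arithmetic for a single data center. LOCAL_QUORUM applies the same formula to the local DC's RF, while QUORUM across several DCs uses the sum of their RFs. In the DataStax Python driver the level is set per statement, e.g. `SimpleStatement(query, consistency_level=ConsistencyLevel.LOCAL_QUORUM)`.

```python
def quorum(rf: int) -> int:
    """Replicas needed for QUORUM (or LOCAL_QUORUM, using the local DC's RF)."""
    return rf // 2 + 1


def replicas_required(level: str, rf: int) -> int:
    """Acknowledgements required for a few common consistency levels."""
    table = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": quorum(rf), "ALL": rf}
    return table[level]


def overlapping(read_level: str, write_level: str, rf: int) -> bool:
    """True if every read must contact a replica that acknowledged the write (R + W > RF)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


for r, w in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(r, w, overlapping(r, w, rf=3))
# → ONE ONE False
# → QUORUM QUORUM True
# → ONE ALL True
```

The trade-off is latency and availability. Each extra acknowledgement means waiting on another replica, and a level like ALL fails if any replica is down.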

"...when used in conjunction with Elasticsearch?"

They complement each other well. Cassandra is good at some things (detailed above) that ElasticSearch is not (and vice-versa...saying that a lot). An application's requirements may call for both searching and querying. Sometimes you've got an app that needs high-speed key lookups, and then: "Oh, and we also want search."
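One common way to run the two in tandem is to treat Cassandra as the system of record for key lookups and Elasticsearch as a secondary index that returns matching keys, whose full rows are then read from Cassandra. The sketch below uses in-memory stand-ins for both stores, not either product's client API, to show the data flow. `DualStoreService` and its methods are hypothetical names. Under the question's append-only assumption, index entries never need updating or deleting.

```python
from collections import defaultdict
from typing import Optional


class DualStoreService:
    """Hypothetical service: a Cassandra-like record store plus an Elasticsearch-like index."""

    def __init__(self):
        self.records: dict[str, dict] = {}                     # stand-in for a Cassandra table
        self.postings: dict[str, set[str]] = defaultdict(set)  # stand-in for an Elasticsearch index

    def append(self, key: str, row: dict) -> None:
        # Write the system of record first, then index the searchable text.
        self.records[key] = row
        for term in row["text"].lower().split():
            self.postings[term].add(key)

    def lookup(self, key: str) -> Optional[dict]:
        # Query path: a direct key lookup that never touches the search index.
        return self.records.get(key)

    def find(self, term: str) -> list[dict]:
        # Search path: the index yields matching keys; full rows come from the record store.
        keys = sorted(self.postings.get(term.lower(), set()))
        return [self.records[k] for k in keys]


svc = DualStoreService()
svc.append("o1", {"text": "Blue running shoes"})
svc.append("o2", {"text": "Red running jacket"})
print(svc.lookup("o1"))
print([row["text"] for row in svc.find("running")])
```

With real systems the two writes are not atomic. If indexing fails after the record write, search results lag behind. A durable log such as Kafka, as suggested earlier in the answer, lets the indexer replay and catch up.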

Summary (tl;dr)

So while I've written quite a bit here, the main point that I'll keep coming back to is picking the right tool for the job. When I need to search, I'll pick ElasticSearch. When I need to query in a highly available, geographically aware scenario, I'll pick Cassandra. I still see applications use both (in tandem), so both have their merits.
