Spark Streaming app gets stuck while writing to and reading from Cassandra simultaneously


Problem description

I was doing some benchmarking that consists of the following data flow:

Kafka --> Spark Streaming --> Cassandra --> Prestodb

Infrastructure: My Spark streaming application runs on 4 executors (2 cores and 4 GB of memory each). Each executor runs on a datanode on which Cassandra is also installed. 4 PrestoDB workers are likewise co-located on the datanodes. My cluster has 5 nodes, each with an Intel Core i5, 32 GB of DDR3 RAM, a 500 GB SSD, and a 1-gigabit network.

Spark streaming application: My Spark streaming batch interval is 10 s, and my Kafka producer produces 5000 events every 3 seconds. My streaming application writes to 2 Cassandra tables.
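As a rough sanity check on the load these numbers imply (pure arithmetic from the setup above, not a measurement):

```python
# Back-of-the-envelope load per micro-batch, from the numbers stated above:
# the producer emits 5000 events every 3 s and the batch interval is 10 s.
events_per_second = 5000 / 3          # ~1667 events/s arriving from Kafka
batch_interval_s = 10
events_per_batch = events_per_second * batch_interval_s
print(round(events_per_batch))        # rows each 10 s batch must write (to each of the 2 tables)
```

If a batch cannot finish writing roughly 17k rows to both tables within the 10 s interval while the queries run, batches queue up and the job appears stuck.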

Context in which everything works fine: Everything runs fine; the streaming application is able to process the events and store them in Cassandra. The batch interval is adequate, and the ingestion rate, scheduling delay, and processing delay stay almost constant over long periods of time.

Context where things get messy and confusing: In my benchmark, every hour I run 6 queries over the Cassandra tables. While these queries are running, the Spark streaming application is no longer able to sustain the write throughput and hangs when writing to Cassandra.

What I've done so far: I searched for this phenomenon in other web posts (including Stack Overflow), but I was not able to find anything similar. The best suggestion I've seen was to increase the amount of memory available to Cassandra. I also saw suggestions related to the connector's fetch size, but I don't know if that is the problem, since the issue only occurs when reading and writing simultaneously.
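For reference, the connector fetch-size and throughput knobs alluded to above map to spark-cassandra-connector settings along these lines (2.x property names; the values are illustrative placeholders, not recommendations):

```properties
# Illustrative spark-cassandra-connector (2.x) settings; tune for your cluster.
spark.cassandra.input.fetch.size_in_rows=1000        # rows per fetch page when reading
spark.cassandra.output.throughput_mb_per_sec=5       # throttle write throughput per core
spark.cassandra.output.concurrent.writes=5           # concurrent write batches per task
spark.cassandra.connection.connections_per_executor_max=10  # cap open connections per executor
```

Throttling the output throughput is the usual lever when streaming writes overwhelm Cassandra under concurrent read load, at the cost of longer batch processing times.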

Question: Cassandra shouldn't lock writes while reading, right? What do you guys think is the source (or sources) of the problem that I need to solve? What configurations should I take into consideration?

I attached a screenshot illustrating the job being stuck in the stage that writes to one of the Cassandra tables while I run the benchmark with the 6 queries, as explained above. If you need more information to trace the problem, please feel free to ask. I appreciate it!

Thank you very much for your support,

Hope I placed the question in a proper manner,

Best regards,

Carlos

Answer

This problem looks to be on the Cassandra/Presto side and not on Spark, for the following reasons/assumptions:

  1. Since the Spark executors are managed by the RM (YARN/Mesos, etc.), your queries cannot affect them directly. While the queries are not running, ingestion proceeds smoothly, as you mentioned.
  2. Resource starvation on the Spark side happens only when you share resources with other components directly. In general, Cassandra and Presto workers/threads are not allocated via the RM, so they share resources at the node level, not from the RM's point of view.

I suspect the reasons for the stalls could be:

  1. During the queries, Cassandra reads a large amount of data, so JVM memory utilization increases and a lot of GC occurs. The GC pauses could be the reason for the pauses/stalls.
  2. The queries use up the available connections to Cassandra (read/write), so the Spark jobs cannot insert data and have to wait in a queue to acquire a connection.
  3. Overall resource utilization on the nodes increases, and one component may have reached its limit (CPU, memory, disk, etc.). In this case, IMO, CPU and disk are worth checking.

Validate these causes by monitoring heap utilization and GC logs, and by checking open connections via JMX for Cassandra; then bump up those values depending on the available resources to resolve the issue, and try to tune the Presto queries as well so they have minimal impact.

Presto tuning can be taken up later, once you have confirmed the Cassandra issue. More Presto tuning options are available at

https://prestodb.io/docs/current/admin/tuning.html or, if the Teradata distribution is used, https://teradata.github.io/presto/docs/current/admin/tuning.html
