Spark Streaming app gets stuck while writing to and reading from Cassandra simultaneously


Problem description

I was doing some benchmarking that consists of the following data flow:

Kafka --> Spark Streaming --> Cassandra --> Prestodb

Infrastructure: My Spark streaming application runs on 4 executors (2 cores and 4 GB of memory each). Each executor runs on a datanode on which Cassandra is also installed. 4 PrestoDB workers are likewise co-located on the datanodes. My cluster has 5 nodes, each with an Intel Core i5, 32 GB of DDR3 RAM, a 500 GB SSD, and a 1-gigabit network.

Spark streaming application: My Spark streaming batch interval is 10 s, and my Kafka producer produces 5000 events every 3 seconds. My streaming application writes to 2 Cassandra tables.
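As a rough sanity check on the load these numbers imply (pure arithmetic from the setup above, not a measurement):

```python
# Back-of-the-envelope load per micro-batch, from the numbers stated above:
# the producer emits 5000 events every 3 s and the batch interval is 10 s.
events_per_second = 5000 / 3          # ~1667 events/s arriving from Kafka
batch_interval_s = 10
events_per_batch = events_per_second * batch_interval_s
print(round(events_per_batch))        # rows each 10 s batch must write (to each of the 2 tables)
```

If a batch cannot finish writing roughly 17k rows to both tables within the 10 s interval while the queries run, batches queue up and the job appears stuck.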

Context in which everything works fine: Everything runs fine; the streaming application is able to process the events and store them in Cassandra. The batch interval is adequate, and the ingestion rate, scheduling delay, and processing delay stay almost constant over long periods of time.

Context where things get messy and confusing: In my benchmark, every hour I run 6 queries over the Cassandra tables. While these queries are running, the Spark streaming application is no longer able to sustain the write throughput and hangs when writing to Cassandra.

What I've done so far: I searched for this phenomenon in other web posts (including Stack Overflow), but I was not able to find anything similar. The best suggestion I've seen was to increase the amount of memory available to Cassandra. I also saw suggestions related to the connector's fetch size, but I don't know if that is the problem, since the issue only occurs when reading and writing simultaneously.
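For reference, the connector fetch-size and throughput knobs alluded to above map to spark-cassandra-connector settings along these lines (2.x property names; the values are illustrative placeholders, not recommendations):

```properties
# Illustrative spark-cassandra-connector (2.x) settings; tune for your cluster.
spark.cassandra.input.fetch.size_in_rows=1000        # rows per fetch page when reading
spark.cassandra.output.throughput_mb_per_sec=5       # throttle write throughput per core
spark.cassandra.output.concurrent.writes=5           # concurrent write batches per task
spark.cassandra.connection.connections_per_executor_max=10  # cap open connections per executor
```

Throttling the output throughput is the usual lever when streaming writes overwhelm Cassandra under concurrent read load, at the cost of longer batch processing times.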

Question: Cassandra shouldn't lock writes while reading, right? What do you guys think is the source (or sources) of the problem that I need to solve? What configurations should I take into consideration?

I attached a screenshot illustrating the job being stuck in the stage that writes to one of the Cassandra tables while I run the benchmark with the 6 queries, as explained above. If you need more information to trace the problem, please feel free to ask. I appreciate it!

Thank you very much for your support,

Hope I placed the question in a proper manner,

Best regards,

Carlos

Answer

This problem looks to be on the Cassandra/Presto side and not on Spark, for the following reasons/assumptions:

  1. Since the Spark executors are managed by the RM (YARN/Mesos, etc.), your queries cannot affect them directly. While the queries are not running, ingestion proceeds smoothly, as you mentioned.
  2. Resource starvation on the Spark side happens only when you share resources with other components directly. In general, Cassandra and Presto workers/threads are not allocated via the RM, so they share resources at the node level, not from the RM's point of view.

I suspect the reasons for the stalls could be:

  1. During the queries, Cassandra reads a large amount of data, so JVM memory utilization increases and a lot of GC occurs. The GC pauses could be the reason for the pauses/stalls.
  2. The queries use up the available connections to Cassandra (read/write), so the Spark jobs cannot insert data and have to wait in a queue to acquire a connection.
  3. Overall resource utilization on the nodes increases, and one component may have reached its limit (CPU, memory, disk, etc.). In this case, IMO, CPU and disk are worth checking.

Validate these causes by monitoring heap utilization and GC logs, and by checking open connections via JMX for Cassandra; then bump up those values depending on the available resources to resolve the issue, and try to tune the Presto queries as well so they have minimal impact.

Presto tuning can be taken up later, once you have confirmed the Cassandra issue. More Presto tuning options are available at

https://prestodb.io/docs/current/admin/tuning.html or, if the Teradata distribution is used, https://teradata.github.io/presto/docs/current/admin/tuning.html
