Why is Cassandra TableWriter writing 0 records, and how do I fix it?

Problem description

I am trying to write an RDD into a Cassandra table. As shown below, TableWriter writes 0 rows several times before it finally writes to Cassandra.

18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.171 s.
18/10/22 07:15:50 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 622 bytes result sent to driver
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.220 s.
18/10/22 07:15:50 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 665 bytes result sent to driver
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.194 s.
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.224 s.
18/10/22 07:15:50 INFO Executor: Finished task 6.0 in stage 0.0 (TID 6). 708 bytes result sent to driver
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.231 s.
18/10/22 07:15:50 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 622 bytes result sent to driver
18/10/22 07:15:50 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 622 bytes result sent to driver
18/10/22 07:15:50 INFO TableWriter: Wrote 0 rows to log_by_date in 0.246 s.
18/10/22 07:15:50 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 708 bytes result sent to driver
18/10/22 07:15:50 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 418 ms on localhost (executor driver) (1/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 433 ms on localhost (executor driver) (2/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 426 ms on localhost (executor driver) (3/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 433 ms on localhost (executor driver) (4/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 456 ms on localhost (executor driver) (5/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 436 ms on localhost (executor driver) (6/8)
18/10/22 07:15:50 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 424 ms on localhost (executor driver) (7/8)
18/10/22 07:15:50 INFO TableWriter: Wrote 1 rows to log_by_date in 0.342 s.

Why does it fail to save several times before succeeding, and how can I tune it for production?

Recommended answer

This is not a failure, as user10465355 noted. When Spark breaks a job into tasks, the work may not be evenly distributed, or there may not be enough work for every task. Some tasks therefore end up empty, so when the Spark Cassandra Connector processes them, they write 0 rows.
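As a rough illustration, here is a plain-Python simulation (not Spark or the connector itself) that spreads a single record across 8 partitions, mirroring the (x/8) tasks in the log above. Every partition other than the one holding the record reports 0 rows. The round-robin placement and the helper names (`split_into_partitions`, `write_partition`, `simulate_write`) are hypothetical simplifications; real Spark placement depends on the data source and any partitioner.

```python
from typing import List


def split_into_partitions(records: List[str], num_partitions: int) -> List[List[str]]:
    """Assign records to partitions round-robin by position.

    Hypothetical simplification: Spark's actual placement depends on the
    source and on any partitioner in use.
    """
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def write_partition(partition: List[str]) -> int:
    """Hypothetical stand-in for a per-task writer: returns rows 'written'."""
    return len(partition)


def simulate_write(records: List[str], num_partitions: int) -> List[int]:
    """Run the stand-in writer once per partition and report each task's count."""
    counts = [write_partition(p) for p in split_into_partitions(records, num_partitions)]
    for task_id, n in enumerate(counts):
        print(f"task {task_id}: wrote {n} rows")
    return counts


if __name__ == "__main__":
    counts = simulate_write(["log line"], num_partitions=8)
    # One task writes 1 row; the other seven write 0 rows.
    print(f"total rows written: {sum(counts)}")
```

The empty tasks still run and still report, which is why the log shows several "Wrote 0 rows" lines alongside the one task that actually wrote data.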

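For example:

  1. You read 100 records into 10 Spark partitions/tasks.
  2. You apply a filter that removes values, so only 30 records remain, all in 5 of the tasks. The other 5 are empty.
  3. When writing, you will see records written by only those 5 tasks, while the other 5 tasks report that they wrote 0 rows.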
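The numbered example can be sketched the same way, again as a plain-Python simulation rather than real Spark code. The key point is that a filter works on each partition separately and keeps the original partition count, so the write still runs 10 tasks. The contiguous read layout and the `keep_record` predicate are hypothetical, chosen only so that exactly 30 records survive in the first 5 partitions.

```python
from typing import Callable, List

Partition = List[int]


def read_into_partitions(num_records: int, num_partitions: int) -> List[Partition]:
    """Split 0..num_records-1 into contiguous partitions (hypothetical layout)."""
    size = -(-num_records // num_partitions)  # ceiling division
    return [
        list(range(k * size, min((k + 1) * size, num_records)))
        for k in range(num_partitions)
    ]


def filter_partitions(partitions: List[Partition], keep: Callable[[int], bool]) -> List[Partition]:
    """Filter each partition independently; the number of partitions is unchanged."""
    return [[x for x in p if keep(x)] for p in partitions]


def keep_record(x: int) -> bool:
    """Hypothetical predicate: keeps 30 of 100 records, all in partitions 0-4."""
    return x < 50 and x % 10 < 6


def write_counts(partitions: List[Partition]) -> List[int]:
    """Rows each task would write (stand-in for the per-task TableWriter count)."""
    return [len(p) for p in partitions]


if __name__ == "__main__":
    parts = read_into_partitions(100, 10)
    counts = write_counts(filter_partitions(parts, keep_record))
    for task_id, n in enumerate(counts):
        print(f"task {task_id}: wrote {n} rows")
    print("total:", sum(counts))  # total: 30
    print("empty tasks:", counts.count(0))  # empty tasks: 5
```

Tasks 0 to 4 each report 6 rows and tasks 5 to 9 report 0 rows, which matches the pattern in step 3.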
