Spark Scala Cassandra connector delete all rows is failing with IllegalArgumentException requirement failed Exception


Problem Description

Create table:

CREATE TABLE test.word_groups (group text, word text, count int,PRIMARY KEY (group,word));

Insert data:

INSERT INTO test.word_groups (group , word , count ) VALUES ( 'A-group', 'raj', 0) ;
INSERT INTO test.word_groups (group , word , count ) VALUES ( 'b-group', 'jaj', 0) ;
INSERT INTO test.word_groups (group , word , count ) VALUES ( 'A-group', 'raff', 3) ;

 SELECT * FROM word_groups ;

 group   | word | count
---------+------+-------
 b-group |  jaj |     0
 A-group | raff |     3
 A-group |  raj |     0

Script:

val cassandraUrl = "org.apache.spark.sql.cassandra"
val wordGroup: Map[String, String] = Map("table" -> "word_groups",
  "keyspace" -> "test", "cluster" -> "test-cluster")
val groupData = spark.read.format(cassandraUrl).options(wordGroup).load()
  .where(col("group") === "b-group")
groupData.rdd.deleteFromCassandra("sunbird_courses", "word_groups")

Exception:

java.lang.IllegalArgumentException: requirement failed: Invalid row size: 3 instead of 2.
    at scala.Predef$.require(Predef.scala:224)
    at com.datastax.spark.connector.writer.SqlRowWriter.readColumnValues(SqlRowWriter.scala:23)
    at com.datastax.spark.connector.writer.SqlRowWriter.readColumnValues(SqlRowWriter.scala:12)
    at com.datastax.spark.connector.writer.BoundStatementBuilder.bind(BoundStatementBuilder.scala:102)
    at com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:105)
    at com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:30)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at com.datastax.spark.connector.writer.GroupingBatchBuilder.foreach(GroupingBatchBuilder.scala:30)
    at com.datastax.spark.connector.writer.TableWriter$$anonfun$writeInternal$1.apply(TableWriter.scala:229)
    at com.datastax.spark.connector.writer.TableWriter$$anonfun$writeInternal$1.apply(TableWriter.scala:198)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:112)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:111)
    at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:129)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
    at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:198)
    at com.datastax.spark.connector.writer.TableWriter.delete(TableWriter.scala:194)
    at com.datastax.spark.connector.RDDFunctions$$anonfun$deleteFromCassandra$1.apply(RDDFunctions.scala:119)
    at com.datastax.spark.connector.RDDFunctions$$anonfun$deleteFromCassandra$1.apply(RDDFunctions.scala:119)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
21/08/11 09:01:24 WARN TaskSetManager: Lost task 0.0 in stage 11.0 (TID 2953, localhost, executor driver): java.lang.IllegalArgumentException: requirement failed: Invalid row size: 3 instead of 2.

Spark version: 2.4.4, Spark Cassandra Connector version: 2.5.0

Spark Cassandra Connector documentation link: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md#deleting-rows-and-columns

I am trying to delete all the records, including the primary key columns.

Is there any solution or workaround?

I need to delete all records for group "A-group" from the word_groups table, including the primary key/partition key.

Recommended Answer

An interesting change in 2.5.x that I hadn't noticed: even when keyColumns is specified, the row size now needs to match. It used to work without that, so this looks like a bug to me.
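
To make the mismatch concrete (an illustrative sketch only, using the DataFrame defined in the question; the schema output shown is approximate):

groupData.printSchema()
// root
//  |-- group: string (nullable = true)
//  |-- word: string (nullable = true)
//  |-- count: integer (nullable = true)
// The DataFrame carries 3 columns, but a whole-row delete binds only the 2
// primary-key columns (group, word), hence "Invalid row size: 3 instead of 2".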

When deleting whole rows you need to keep only the primary key columns, so change the delete to:

groupData.select("group", "word").rdd.deleteFromCassandra("test", "word_groups")
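
A quick way to confirm the delete worked (a minimal sketch reusing the cassandraUrl and wordGroup options defined above):

// Re-read the table after the delete; only the two A-group rows should remain.
spark.read.format(cassandraUrl).options(wordGroup).load().show()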

But in your case it is better to perform the delete by the partition key column; that way only a single tombstone is generated (you still need to select only the necessary column):

import com.datastax.spark.connector._
groupData.select("group").rdd
  .deleteFromCassandra("test", "word_groups", keyColumns = SomeColumns("group"))

You don't even need to read the input data from Cassandra: if you know the values of the partition keys, you can simply create an RDD and delete the data (similar to what is shown in the docs):

case class Key(group: String)
sc.parallelize(Seq(Key("b-group")))
  .deleteFromCassandra("test", "word_groups", keyColumns = SomeColumns("group"))
