Suddenly throwing "This RDD lacks a SparkContext"; it was working before, when all the code was in the main method
Problem description
It was a working piece of code, but it suddenly stopped working after I tried creating the SparkSession from a different Scala object:
val b = a.filter { x => !x._2._1.isEmpty && !x._2._2.isEmpty }
val primary_key_distinct = b.map(rec => rec._1.split(",")(0)).distinct
for (i <- primary_key_distinct) {
  b.foreach(println)
}
Error:
ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 5)
org.apache.spark.SparkException: This RDD lacks a SparkContext. It could happen in the following cases:
(1) RDD transformations and actions are NOT invoked by the driver, but inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
(2) When a Spark Streaming job recovers from checkpoint, this exception will be hit if a reference to an RDD not defined by the streaming job is used in DStream operations. For more information, See SPARK-13758.
It is not working even after I reverted the change, and I'm not using any objects.
Updated code:
import org.apache.spark.sql.SparkSession

object `try` {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("50columns3nodes").getOrCreate()
    var s = spark.read.csv("/home/hadoopuser/Desktop/input/source.csv").rdd.map(_.mkString(","))
    var k = spark.read.csv("/home/hadoopuser/Desktop/input/destination.csv").rdd.map(_.mkString(","))

    val source_primary_key = s.map(rec => (rec.split(",")(0), rec))
    val destination_primary_key = k.map(rec => (rec.split(",")(0), rec))

    val a = source_primary_key.cogroup(destination_primary_key).filter { x => x._2._1 != x._2._2 }
    val b = a.filter { x => !x._2._1.isEmpty && !x._2._2.isEmpty }
    var extra_In_Dest = a.filter(x => x._2._1.isEmpty && !x._2._2.isEmpty).map(rec => rec._2._2.mkString(""))
    var extra_In_Src = a.filter(x => !x._2._1.isEmpty && x._2._2.isEmpty).map(rec => rec._2._1.mkString(""))
    val primary_key_distinct = b.map(rec => rec._1.split(",")(0)).distinct
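    // Note: this for loop desugars to primary_key_distinct.foreach { ... },
    // so the b.foreach(println) below runs inside the closure of another RDD
    // operation on the executors, where b has no SparkContext.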
    for (i <- primary_key_distinct) {
      var lengthofarray = 0
      println(i)
      b.foreach(println)
    }
  }
}
The input data is as follows:
s = 1,david 2,jay 3,jijo 4,abi 5,suranha
k = 1,david 2,jay 3,jijoaa 4,abisdsdd 5,suranha
val a contains {3,(jijo,jijoaa), 4,(abi,abisdsdd)}
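For reference, here is a minimal, self-contained sketch of the same cogroup-based diff, with hardcoded sample rows standing in for the question's CSV files, that reproduces the contents of a:

import org.apache.spark.sql.SparkSession

object CogroupDiffSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("cogroupDiff").getOrCreate()
    val sc = spark.sparkContext

    // Hardcoded sample rows; the real code reads these from CSV files.
    val s = sc.parallelize(Seq("1,david", "3,jijo", "4,abi"))
    val k = sc.parallelize(Seq("1,david", "3,jijoaa", "4,abisdsdd"))

    val source_primary_key = s.map(rec => (rec.split(",")(0), rec))
    val destination_primary_key = k.map(rec => (rec.split(",")(0), rec))

    // cogroup groups rows from both sides by key; the filter keeps only
    // the keys whose value groups differ between source and destination.
    val a = source_primary_key.cogroup(destination_primary_key)
      .filter { x => x._2._1 != x._2._2 }

    a.collect().foreach(println)  // key 1 matches on both sides and is dropped

    spark.stop()
  }
}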
Recommended answer
If you read the first message carefully:
(1) RDD transformations and actions are NOT invoked by the driver, but inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
It clearly states that actions and transformations cannot be performed inside a transformation.
primary_key_distinct is a transformation done on b, and b itself is a transformation done on a. And b.foreach(println) is an action performed inside the transformation of primary_key_distinct.
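In other words, an RDD cannot be referenced from a closure that runs on the executors. Below is a minimal, self-contained sketch of the forbidden pattern and the usual workaround; the rdd1/rdd2 names mirror the error message's example and are hypothetical:

import org.apache.spark.sql.SparkSession

object Spark5063Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("spark5063").getOrCreate()
    val sc = spark.sparkContext

    val rdd1 = sc.parallelize(Seq(1, 2, 3))
    val rdd2 = sc.parallelize(Seq(10, 20))

    // Invalid: rdd2 is captured by a closure that runs on the executors,
    // where no SparkContext exists; this is exactly what throws
    // "This RDD lacks a SparkContext" (SPARK-5063).
    // val bad = rdd1.map(x => rdd2.count() * x)

    // Valid: run the action on the driver first, then close over the result.
    val count = rdd2.count()            // action, invoked by the driver
    val good = rdd1.map(x => count * x) // the closure captures a plain Long
    good.collect().foreach(println)     // prints 2, 4, 6

    spark.stop()
  }
}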
So if you collect b or primary_key_distinct to the driver, then the code should run properly:
val b = a.filter { x => (!x._2._1.isEmpty) && (!x._2._2.isEmpty) }.collect
or
val primary_key_distinct = b.map(rec => (rec._1.split(",")(0))).distinct.collect
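With the second option, for example, the question's loop becomes plain driver-side code (a sketch assuming b is still an RDD):

// primary_key_distinct is now a local Array[String], so this for loop runs
// on the driver, and each b.foreach(println) is an ordinary action submitted
// from the driver rather than from inside another RDD operation's closure.
for (i <- primary_key_distinct) {
  println(i)
  b.foreach(println)
}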
Or, if you don't use an action inside another transformation, the code runs properly too; since 1 to 2 is a plain local Range, the loop body below executes on the driver:
for (i <- 1 to 2) {
  var lengthofarray = 0
  println(i)
  b.foreach(println)
}
I hope the explanation is clear.