每个循环嵌套两个DataFrame [英] Two DataFrame nested for Each Loop
问题描述
foreach
循环嵌套的DataFrams迭代引发NullPointerException:
The foreach
Loop nested iteration of DataFrams throws a NullPointerException:
def nestedDataFrame(leftDF: DataFrame, riteDF: DataFrame): Unit = {
val leftCols: Array[String] = leftDF.columns
val riteCols: Array[String] = riteDF.columns
leftCols.foreach { ltColName =>
leftDF.select(ltColName).foreach { ltRow =>
val leftString = ltRow.apply(0).toString();
// Works ... But Same Kind Of Code Below
riteCols.foreach { rtColName =>
riteDF.select(rtColName).foreach { rtRow => //Exception
val riteString = rtRow.apply(0).toString();
print(leftString.equals(riteString)
}
}
}
}
例外:
java.lang.NullPointerException 在org.apache.spark.sql.Dataset $ .ofRows(Dataset.scala:77) 在org.apache.spark.sql.Dataset.org $ apache $ spark $ sql $ Dataset $$ withPlan(Dataset.scala:3406) 在org.apache.spark.sql.Dataset.select(Dataset.scala:1334) 在org.apache.spark.sql.Dataset.select(Dataset.scala:1352)
java.lang.NullPointerException at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:77) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:3406) at org.apache.spark.sql.Dataset.select(Dataset.scala:1334) at org.apache.spark.sql.Dataset.select(Dataset.scala:1352)
可能出了什么问题以及如何解决?
What could be going wrong and how to fix it?
推荐答案
leftDF.select(ltColName).foreach { ltRow =>
以上行将您的代码带入foreach块中,作为执行程序的任务.现在使用riteDF.select(rtColName).foreach { rtRow =>
,您正在尝试访问执行器中的Spark会话,这是不允许的. Spark会话仅在驱动程序端可用.在ofRow
方法中,它尝试访问sparkSession
The above line brings your code inside the foreach block as a task to executor. Now with riteDF.select(rtColName).foreach { rtRow =>
, you are trying to access the Spark session within the executor which is not allowed. The Spark session is only available on the driver end. In the ofRow
method, it tries to access sparkSession
,
val qe = sparkSession.sessionState.executePlan(logicalPlan)
您不能像常规Java/Scala集合那样使用数据集集合,而应该通过可用的api来使用它们来完成任务,例如,可以加入它们以关联日期.
You can't use dataset collections just like regular Java/Scala collection, you should rather use them by the apis available to accomplish tasks, for example you can join them to correlate date.
在这种情况下,您可以通过多种方式完成比较.您可以加入2个数据集,例如
In this case, you can accomplish the comparison in a number of ways. You can join the 2 datasets, for example,
var joinedDf = leftDF.select(ltColName).join(riteDF.select(rtColName), $"ltColName" === $"rtColName", "inner")
然后分析joinedDf
.您甚至可以intersect()
这两个数据集.
Then analyze the joinedDf
. You can even intersect()
the two datasets.
这篇关于每个循环嵌套两个DataFrame的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!