在UDF函数Spark Scala中使用方法 [英] Use a method inside a UDF function Spark Scala

查看:550
本文介绍了在UDF函数Spark Scala中使用方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想使用一个位于用户设计函数内另一个类中的方法,但是它不起作用.

I want to use a method located in another class inside a user-designed function but it's not working.

我有一个方法:

 def traitementDataFrameEleve(sc:SparkSession, dfRedis:DataFrame, domainMail:String, dir:String):Boolean ={
     def loginUDF = udf((sn: String, givenName:String) => {
            LoginClass.GenerateloginPersone(sn,givenName,dfr)
          })

    dfEleve.withColumn("ENTPersonLogin",loginUDF(dfEleve("sn"),dfEleve("givenName")))
}

LoginClass是一个包含GenerateloginPersone方法的类.

LoginClass is a class that contains the GenerateloginPersone method.

输出错误:

org.apache.spark.SparkException: Failed to execute user defined function(anonfun$loginUDF$1$1: (string, string) => string)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NullPointerException
    at org.apache.spark.sql.Dataset.schema(Dataset.scala:410)
    at org.apache.spark.sql.Dataset.printSchema(Dataset.scala:419)
    at IntegrationDonneesENTLea_V1_AcBordeaux.LoginClass$.GenerateloginPersone(LoginClass.scala:16)
    at IntegrationDonneesENTLea_V1_AcBordeaux.Eleve$$anonfun$loginUDF$1$1.apply(Eleve.scala:25)
    at IntegrationDonneesENTLea_V1_AcBordeaux.Eleve$$anonfun$loginUDF$1$1.apply(Eleve.scala:23)
    ... 16 more

谢谢.

推荐答案

不允许访问:

  • 分布式数据结构(例如DatasetRDD).
  • SparkConext/SparkSession
  • distributed data structures (like Dataset or RDD).
  • SparkConext / SparkSession

来自Spark任务(转换,udf应用程序).这就是为什么您获得NPE的原因.

from Spark task (transformation, udf application). This is why you get a NPE.

这篇关于在UDF函数Spark Scala中使用方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆