Apache Spark Scala Exception Handling
Question
How do I do exception handling in Spark-Scala for invalid records? Here is my code:
val rawData = sc.textFile(file)
val rowRDD = rawData.map(line => Row.fromSeq(line.split(",")))
val rowRDMapped = rowRDD.map { x => (x.get(1), x.get(10)) }
val DF = rowRDMapped.toDF("ID", "name" )
Everything works fine if the input data is fine; if a line doesn't have enough fields, I get an ArrayIndexOutOfBoundsException.
I am trying to put a try-catch around it, but I am not able to skip the records with invalid data via try-catch:
val rowRDMapped = rowRDD.map { x =>
  try {
    (x.get(1), x.get(10))
  } catch {
    case e: Exception =>
      println("Invalid Data")
      // Here it expects a value to be returned, but I am not sure what to do,
      // since I don't want any data to be returned for this record.
  }
}
Please let me know how to solve the issue with try-catch, and if there is a better solution, that would also help a lot.
Answer
Instead of try-catch you can use Try.
The code below filters out lines that don't have enough fields and builds a DataFrame from the remaining ones:
import scala.util.Try

val rawData = sc.textFile(file)
val rowRDD = rawData.map(line => Row.fromSeq(line.split(",")))
val rowRDMapped = rowRDD.flatMap { x => Try((x.getString(1), x.getString(10))).toOption }
val DF = rowRDMapped.toDF("ID", "name")
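The pattern works because Try wraps the out-of-bounds access in a Success or Failure, toOption turns a Failure into None, and flatMap drops the Nones entirely. Here is a runnable sketch of the same idea on a plain Scala collection, with no Spark cluster required (the sample lines are made up for illustration):

```scala
import scala.util.Try

// Extract fields 1 and 10 from a CSV line; None if the line is too short.
def parseLine(line: String): Option[(String, String)] = {
  val fields = line.split(",")
  // Try catches the ArrayIndexOutOfBoundsException; toOption maps
  // Success(v) -> Some(v) and Failure(e) -> None.
  Try((fields(1), fields(10))).toOption
}

// The second line has only two fields, so it is silently skipped.
val lines = Seq("a,b,c,d,e,f,g,h,i,j,k", "x,y", "1,2,3,4,5,6,7,8,9,10,11")
val parsed = lines.flatMap(parseLine)

println(parsed) // List((b,k), (2,11))
```

On an RDD, flatMap behaves the same way, so the invalid records simply never reach the DataFrame.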
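If you would rather keep try-catch, as the question asked, the same skip behaviour falls out of returning an Option from the catch block and using flatMap instead of map. A sketch under the same made-up sample data as above:

```scala
import scala.util.control.NonFatal

// try-catch version: return Some on success, None on any non-fatal error.
def parseLine(line: String): Option[(String, String)] = {
  val fields = line.split(",")
  try {
    Some((fields(1), fields(10)))
  } catch {
    case NonFatal(e) =>
      println(s"Invalid Data: $line")
      None // nothing is emitted for this record
  }
}

val lines = Seq("a,b,c,d,e,f,g,h,i,j,k", "x,y")
val parsed = lines.flatMap(parseLine)
println(parsed) // List((b,k))
```

The key point in both versions is flatMap: map forces you to return exactly one value per record, which is why the original catch block had nothing sensible to return, while flatMap lets a record produce zero results.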