Apache Spark Scala Exception Handling


Question

How do I do exception handling in Spark/Scala for invalid records? Here is my code:

val rawData = sc.textFile(file)
val rowRDD = rawData.map(line => Row.fromSeq(line.split(",")))
val rowRDMapped = rowRDD.map { x => (x.getString(1), x.getString(10)) }
val DF = rowRDMapped.toDF("ID", "name")

Everything works fine if the input data is fine, but if a line doesn't have enough fields I get an ArrayIndexOutOfBoundsException.

I am trying to put a try-catch around it, but I am not able to skip the records with invalid data via try-catch:

val rowRDMapped = rowRDD.map { x =>
    try {
        (x.getString(1), x.getString(10))
    } catch {
        case e: Exception =>
            println("Invalid Data")
            // Here it expects a value to be returned, but I am not sure what to do,
            // since I don't want any data to be returned for this record.
    }
}
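The underlying issue is that in Scala try/catch is an expression: the catch branch must also produce a value of the same type as the try branch, so there is no way to simply "skip" a record from inside map. One workaround is to return an Option and filter out the None cases afterwards. A minimal sketch in plain Scala (no Spark; the object and helper names here are illustrative, not from the original post):

```scala
object TryCatchSketch {
  // Returns Some((field1, field10)) for valid lines, None for short ones.
  // Both branches of try/catch now yield an Option, so the types line up.
  def parse(fields: Array[String]): Option[(String, String)] =
    try {
      Some((fields(1), fields(10)))
    } catch {
      case _: ArrayIndexOutOfBoundsException =>
        println("Invalid Data")
        None
    }

  def main(args: Array[String]): Unit = {
    val good = "a,b,c,d,e,f,g,h,i,j,k".split(",")
    val bad  = "a,b".split(",")
    println(parse(good)) // Some((b,k))
    println(parse(bad))  // None
  }
}
```

With this shape, rowRDD.map(parse).filter(_.isDefined).map(_.get) (or more idiomatically a single flatMap, as in the answer below) drops the invalid records.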

Please let me know how to solve this with try-catch, and if there is a better solution that would also help a lot.

Answer

Instead of try-catch you can use Try.

The code below will filter out data lines that don't have enough fields and build a DataFrame from the remaining ones.

import scala.util.Try

val rawData = sc.textFile(file)
val rowRDD = rawData.map(line => Row.fromSeq(line.split(",")))
val rowRDMapped = rowRDD.flatMap { x => Try((x.getString(1), x.getString(10))).toOption }
val DF = rowRDMapped.toDF("ID", "name")
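The filtering behaviour of flatMap + Try(...).toOption can be seen without a Spark cluster, on a plain Scala collection. This is only a sketch of the same pattern (the object name, helper name, and sample data are made up for illustration; the column indices 1 and 10 match the question):

```scala
import scala.util.Try

object TryFilterSketch {
  // Lines that are too short make the Try fail; toOption turns the
  // failure into None, and flatMap silently drops the Nones - the
  // same thing the Spark version does per record.
  def extract(lines: Seq[String]): Seq[(String, String)] =
    lines.flatMap { line =>
      val fields = line.split(",")
      Try((fields(1), fields(10))).toOption
    }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "1,alice,x,x,x,x,x,x,x,x,engineer", // 11 fields: kept
      "2,bob"                             // too short: dropped
    )
    println(extract(lines)) // List((alice,engineer))
  }
}
```

Note that Try takes a single expression, so the tuple needs its own parentheses: Try((a, b)), not Try(a, b).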

