Spark row-level error handling: how to get the error message at row level


Problem description

I have a CSV file that I am loading through Spark. I want to separate the good records from the bad records, and I also want to know the row-level error for each bad record.

I am specifying a schema and can capture corrupt records like this, but how do I get the error message for each distinct corrupt record?
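The read itself is not shown in the question; a minimal PySpark sketch that captures corrupt records this way might look as follows. The input path and the header option are assumptions, and the column types are guessed from the output below:

from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType,
                               IntegerType, LongType, BooleanType)

spark = SparkSession.builder.appName("corrupt-records").getOrCreate()

# Schema guessed from the output below; _corrupt_record must be declared
# in the schema (as a string) for Spark to populate it.
schema = StructType([
    StructField("service_point_number", LongType(), True),
    StructField("energy_type", StringType(), True),
    StructField("is_enabled", BooleanType(), True),
    StructField("metadata", StringType(), True),
    StructField("testint", IntegerType(), True),
    StructField("_corrupt_record", StringType(), True),
])

df = (spark.read
      .option("header", "true")                               # assumption: file has a header row
      .option("mode", "PERMISSIVE")                           # keep bad rows, null out unparseable fields
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .schema(schema)
      .csv("/path/to/input.csv"))                             # hypothetical path

df.show()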

+--------------------+-----------+----------+--------------------+-------+--------------------+
|service_point_number|energy_type|is_enabled|            metadata|testint|     _corrupt_record|
+--------------------+-----------+----------+--------------------+-------+--------------------+
|            90453512|          E|     false|Address1@420#Addr...|     23|                null|
|            14802348|          G|     false|Address1@420#Addr...|     24|                null|
|                null|       null|      null|                null|   null|99944990,E,12,Add...|
|            78377144|          E|     false|                 123|     26|                null|
|            25506816|          G|     false|Address1@420#Addr...|     27|                null|
|            48789905|          E|      true|Address1@420#Addr...|   null|48789905,E,true,A...|
|            20283032|          E|     false|Address1@420#Addr...|     29|                null|
|            67311231|          G|     false|Address1@420#Addr...|     30|                null|
|            18240558|          G|     false|Address1@420#Addr...|     31|18240558,G,false,...|
|            42631153|          E|     false|Address1@420#Addr...|     32|                null|
+--------------------+-----------+----------+--------------------+-------+--------------------+

Recommended answer

badRecordsPath can work, but perhaps the reason you can't find anything at the specified path is that actual execution does not start until an action is triggered: Spark evaluates reads lazily. Try calling df.show() after your spark.read... code, then check whether the output files now appear in the path.
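A minimal sketch of that suggestion, reusing the schema above; badRecordsPath is a Databricks-specific option, and both paths here are hypothetical:

df = (spark.read
      .option("header", "true")
      .option("badRecordsPath", "/tmp/badRecordsPath")  # bad rows are written here as JSON
      .schema(schema)
      .csv("/path/to/input.csv"))

# Reads are lazy: nothing is written to badRecordsPath until an action runs.
df.show()

On Databricks, the bad-records output lands under a timestamped subdirectory (e.g. /tmp/badRecordsPath/<timestamp>/bad_records/), and each JSON record carries the source path, the offending record, and a reason describing the parse failure.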
