Replace empty values with nulls in a Spark DataFrame

Question

I have a DataFrame with n columns, and I want to replace the empty strings in all of these columns with nulls.

I tried the following two approaches:

val ReadDf = rawDF.na.replace("columnA", Map( "" -> null));

val ReadDf = rawDF.withColumn("columnA", if($"columnA"=="") lit(null) else $"columnA" );

Neither of them works.

Any leads would be highly appreciated. Thanks.

Answer

Your first approach seems to fail because of a bug that prevents replace from replacing values with nulls; see here.

Your second approach fails because you're confusing driver-side Scala code with executor-side DataFrame instructions. Your if-else expression is evaluated only once, on the driver, and not per record; you want to replace it with a call to the when function. Moreover, to compare a column's value you need to use the === operator, not Scala's ==, which just compares the driver-side Column object. A Column never equals a String, so that condition is always false, the else branch is always taken, and columnA comes back unchanged. The corrected version:

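import org.apache.spark.sql.functions._
import spark.implicits._ // enables the $"..." syntax outside spark-shell; assumes a SparkSession named spark

// Per row: "" becomes null, every other value of columnA is kept as-is
rawDF.withColumn("columnA", when($"columnA" === "", lit(null)).otherwise($"columnA"))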
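The question asks about all n columns, while the fix above handles only columnA. A minimal sketch of applying the same when/otherwise expression to every string column follows, assuming the Spark Scala API (2.x or later). The object EmptyToNull and the helper emptyToNull are hypothetical names, not part of Spark. Only StringType columns are rewritten, since an empty string can only occur there.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}
import org.apache.spark.sql.types.{StringType, StructField}

object EmptyToNull {
  // Hypothetical helper: replaces "" with null in every StringType column,
  // leaving all other columns (and the column order) unchanged.
  // Assumes column names contain no dots (col() would treat them as nested fields).
  def emptyToNull(df: DataFrame): DataFrame = {
    val stringCols: Set[String] = df.schema.fields.collect {
      case StructField(name, _: StringType, _, _) => name
    }.toSet

    // Build one projection covering all columns instead of one withColumn per column.
    val projected = df.columns.map { c =>
      if (stringCols.contains(c)) when(col(c) === "", lit(null)).otherwise(col(c)).as(c)
      else col(c)
    }
    df.select(projected: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("empty-to-null").getOrCreate()
    import spark.implicits._

    val rawDF = Seq(("a", "", 1), ("", "b", 2)).toDF("columnA", "columnB", "n")
    emptyToNull(rawDF).show() // the empty strings in columnA and columnB are now null
    spark.stop()
  }
}
```

A single select is used deliberately: the Spark documentation for withColumn warns that calling it repeatedly in a loop builds large plans, and recommends select with all the columns at once instead. A foldLeft over withColumn would produce the same result for a handful of columns.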
