How to remove NULL and empty values for a particular column in a DataFrame?


Question

I would like to remove records from a DataFrame where demo_name is NULL or empty.

demo_name is a column of String type in that DataFrame.

I am trying the code below. I want to apply trim because some records have a demo_name made up of multiple spaces.

   val filterDF = demoDF.filter($"demo_name".isNotNull && $"demo_name".trim != "" )

But I get the error "cannot resolve symbol trim".

Could someone help me fix this issue?

Recommended answer

You are calling trim as if you were acting on a String, but the $ function uses an implicit conversion to turn the column name into a Column instance. The problem is that Column has no trim method.

You need to import the library functions and apply them to your column:

import org.apache.spark.sql.functions._

demoDF.filter($"demo_name".isNotNull && length(trim($"demo_name")) > 0)

Here I use the library functions trim and length: trim strips the spaces, and length verifies that anything is left in the result.
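For anyone who wants to try this outside the original project, here is a minimal self-contained sketch of the same filter. The object name DropBlankDemo, the helper dropBlank, the local SparkSession settings, and the sample rows are all assumptions made for illustration; only the filter expression comes from the answer above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, length, trim}

object DropBlankDemo {
  // Hypothetical helper: keep only rows whose column is non-null
  // and still has at least one character after trimming spaces.
  def dropBlank(df: DataFrame, colName: String): DataFrame =
    df.filter(col(colName).isNotNull && length(trim(col(colName))) > 0)

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("drop-blank-demo")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: a NULL, an empty string, a spaces-only
    // string, and two real names.
    val demoDF = Seq[(Int, Option[String])](
      (1, Some("alice")),
      (2, None),
      (3, Some("")),
      (4, Some("   ")),
      (5, Some("  bob "))
    ).toDF("id", "demo_name")

    // Rows 2, 3 and 4 are filtered out; rows 1 and 5 remain.
    val filterDF = dropBlank(demoDF, "demo_name")
    filterDF.show()

    spark.stop()
  }
}
```

Note that the original attempt had a second problem besides trim: on a Column, plain != is Scala's ordinary inequality and yields a Boolean rather than a Column expression. In Spark 2.0 and later, the Column operator =!= should give an equivalent predicate, for example trim(col(colName)) =!= "".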
