如何将空数组转换为null? [英] How to convert empty arrays to nulls?

查看:125
本文介绍了如何将空数组转换为null?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有下面的数据框,我需要将空数组转换为null.

I have below dataframe and i need to convert empty arrays to null.

+----+---------+-----------+
|  id|count(AS)|count(asdr)|
+----+---------+-----------+
|1110| [12, 45]|   [50, 55]|     
|1111|       []|         []|    
|1112| [45, 46]|   [50, 50]|   
|1113|       []|         []|
+----+---------+-----------+

我尝试了以下不起作用的代码.

i have tried below code which is not working.

df.na.fill("null").show()

预期输出应为

+----+---------+-----------+
|  id|count(AS)|count(asdr)|
+----+---------+-----------+
|1110| [12, 45]|   [50, 55]|     
|1111|     NUll|       NUll|    
|1112| [45, 46]|   [50, 50]|   
|1113|     NUll|       NUll|
+----+---------+-----------+

推荐答案

对于给定的dataframe,您只需执行以下操作

For your given dataframe, you can simply do the following

from pyspark.sql import functions as F
df.withColumn("count(AS)", F.when((F.size(F.col("count(AS)")) == 0), F.lit(None)).otherwise(F.col("count(AS)"))) \
    .withColumn("count(asdr)", F.when((F.size(F.col("count(asdr)")) == 0), F.lit(None)).otherwise(F.col("count(asdr)"))).show()

您应该将dataframe输出为

+----+---------+-----------+
|  id|count(AS)|count(asdr)|
+----+---------+-----------+
|1110| [12, 45]|   [50, 55]|
|1111|     null|       null|
|1112| [45, 46]|   [50, 50]|
|1113|     null|       null|
+----+---------+-----------+

已更新

如果您有两个以上的数组列,并且要动态应用上述逻辑,则可以使用以下逻辑

In case you have more than two array columns and you want to apply the above logic dynamically, you can use the following logic

from pyspark.sql import functions as F
for c in df.dtypes:
    if "array" in c[1]:
        df = df.withColumn(c[0], F.when((F.size(F.col(c[0])) == 0), F.lit(None)).otherwise(F.col(c[0])))
df.show()

在这里,
df.dtypes将为您提供具有列名和数据类型的元组数组.至于问题中的数据框是

Here,
df.dtypes would give you array of tuples with column name and datatype. As for the dataframe in the question it would be

[('id', 'bigint'), ('count(AS)', 'array<bigint>'), ('count(asdr)', 'array<bigint>')]

withColumn仅应用于 array ("array" in c[1]),其中F.size(F.col(c[0])) == 0是对when函数的条件检查,该功能检查数组的大小.如果条件为true,即为空数组,则填充None,否则填充原始值.循环将应用于所有数组列.

withColumn is applied to only array columns ("array" in c[1]) where F.size(F.col(c[0])) == 0 is the condition checking for when function which checks for the size of the array. if the condition is true i.e. empty array then None is populated else original value is populated. The loop is applied to all the array columns.

这篇关于如何将空数组转换为null?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆