How to convert empty arrays to nulls?


Problem description

I have the DataFrame below, and I need to convert the empty arrays to null.

+----+---------+-----------+
|  id|count(AS)|count(asdr)|
+----+---------+-----------+
|1110| [12, 45]|   [50, 55]|     
|1111|       []|         []|    
|1112| [45, 46]|   [50, 50]|   
|1113|       []|         []|
+----+---------+-----------+

I tried the code below, but it does not work.

df.na.fill("null").show()

The expected output is:

+----+---------+-----------+
|  id|count(AS)|count(asdr)|
+----+---------+-----------+
|1110| [12, 45]|   [50, 55]|
|1111|     null|       null|
|1112| [45, 46]|   [50, 50]|
|1113|     null|       null|
+----+---------+-----------+

Recommended answer

For the given DataFrame, you can do the following:

from pyspark.sql import functions as F
df.withColumn("count(AS)", F.when((F.size(F.col("count(AS)")) == 0), F.lit(None)).otherwise(F.col("count(AS)"))) \
    .withColumn("count(asdr)", F.when((F.size(F.col("count(asdr)")) == 0), F.lit(None)).otherwise(F.col("count(asdr)"))).show()

You should get the following output:

+----+---------+-----------+
|  id|count(AS)|count(asdr)|
+----+---------+-----------+
|1110| [12, 45]|   [50, 55]|
|1111|     null|       null|
|1112| [45, 46]|   [50, 50]|
|1113|     null|       null|
+----+---------+-----------+

Update

If you have more than two array columns and want to apply the same logic dynamically, you can loop over the columns:

from pyspark.sql import functions as F
for c in df.dtypes:
    if "array" in c[1]:
        df = df.withColumn(c[0], F.when((F.size(F.col(c[0])) == 0), F.lit(None)).otherwise(F.col(c[0])))
df.show()

Here, df.dtypes gives you a list of tuples, each holding a column name and its data type. For the DataFrame in the question, it would be:

[('id', 'bigint'), ('count(AS)', 'array<bigint>'), ('count(asdr)', 'array<bigint>')]

withColumn is applied only to array columns (those for which "array" in c[1] is true). F.size(F.col(c[0])) == 0 is the condition passed to the when function; it checks whether the array is empty. If the condition is true, i.e. the array is empty, the value is replaced with None (null); otherwise the original value is kept. The loop applies this to every array column.
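One caveat about the substring test: "array" in c[1] is also true for type strings that merely contain an array, such as a struct with an array field, and F.size only accepts array or map columns. The sketch below is one possible stricter variant, not the answer's own code. It selects only columns whose type string starts with "array". The helper names top_level_array_columns and empty_arrays_to_null, and the "meta" column, are hypothetical and used only for illustration.

```python
def top_level_array_columns(dtypes):
    """Return the names of columns whose type string starts with 'array'.

    `dtypes` is a list of (name, type_string) tuples, as given by df.dtypes.
    """
    return [name for name, dtype in dtypes if dtype.startswith("array")]


def empty_arrays_to_null(df):
    """Replace empty arrays with null in every top-level array column of df."""
    # Imported here so the selection helper above can be tried without Spark.
    from pyspark.sql import functions as F

    for name in top_level_array_columns(df.dtypes):
        df = df.withColumn(
            name,
            F.when(F.size(F.col(name)) == 0, F.lit(None)).otherwise(F.col(name)),
        )
    return df


# The dtypes list from the question, plus one hypothetical struct column
# that the plain substring test would also have picked up.
dtypes = [
    ("id", "bigint"),
    ("count(AS)", "array<bigint>"),
    ("count(asdr)", "array<bigint>"),
    ("meta", "struct<tags:array<string>>"),  # hypothetical, not in the question
]
selected = top_level_array_columns(dtypes)
print(selected)
```

With a real DataFrame, empty_arrays_to_null(df).show() would stand in for the loop above. For the question's DataFrame both checks select the same two columns, so the difference only matters once nested types are present.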
