如何将空数组转换为null? [英] How to convert empty arrays to nulls?
问题描述
我有下面的数据框,我需要将空数组转换为null.
I have below dataframe and i need to convert empty arrays to null.
+----+---------+-----------+
| id|count(AS)|count(asdr)|
+----+---------+-----------+
|1110| [12, 45]| [50, 55]|
|1111| []| []|
|1112| [45, 46]| [50, 50]|
|1113| []| []|
+----+---------+-----------+
我尝试了以下不起作用的代码.
i have tried below code which is not working.
df.na.fill("null").show()
预期输出应为
+----+---------+-----------+
| id|count(AS)|count(asdr)|
+----+---------+-----------+
|1110| [12, 45]| [50, 55]|
|1111| NUll| NUll|
|1112| [45, 46]| [50, 50]|
|1113| NUll| NUll|
+----+---------+-----------+
推荐答案
对于给定的dataframe
,您只需执行以下操作
For your given dataframe
, you can simply do the following
from pyspark.sql import functions as F
df.withColumn("count(AS)", F.when((F.size(F.col("count(AS)")) == 0), F.lit(None)).otherwise(F.col("count(AS)"))) \
.withColumn("count(asdr)", F.when((F.size(F.col("count(asdr)")) == 0), F.lit(None)).otherwise(F.col("count(asdr)"))).show()
您应该将dataframe
输出为
+----+---------+-----------+
| id|count(AS)|count(asdr)|
+----+---------+-----------+
|1110| [12, 45]| [50, 55]|
|1111| null| null|
|1112| [45, 46]| [50, 50]|
|1113| null| null|
+----+---------+-----------+
已更新
如果您有两个以上的数组列,并且要动态应用上述逻辑,则可以使用以下逻辑
In case you have more than two array columns and you want to apply the above logic dynamically, you can use the following logic
from pyspark.sql import functions as F
for c in df.dtypes:
if "array" in c[1]:
df = df.withColumn(c[0], F.when((F.size(F.col(c[0])) == 0), F.lit(None)).otherwise(F.col(c[0])))
df.show()
在这里,
df.dtypes
将为您提供具有列名和数据类型的元组数组.至于问题中的数据框是
Here,
df.dtypes
would give you array of tuples with column name and datatype. As for the dataframe in the question it would be
[('id', 'bigint'), ('count(AS)', 'array<bigint>'), ('count(asdr)', 'array<bigint>')]
withColumn
仅应用于 array 列("array" in c[1])
,其中F.size(F.col(c[0])) == 0
是对when
函数的条件检查,该功能检查数组的大小.如果条件为true,即为空数组,则填充None,否则填充原始值.循环将应用于所有数组列.
withColumn
is applied to only array columns ("array" in c[1])
where F.size(F.col(c[0])) == 0
is the condition checking for when
function which checks for the size of the array. if the condition is true i.e. empty array then None is populated else original value is populated. The loop is applied to all the array columns.
这篇关于如何将空数组转换为null?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!