PySpark - Fillna specific rows based on condition
Question
I want to replace null values in a dataframe, but only on rows that match a specific condition.
I have this dataframe:
A|B   |C   |D   |
1|null|null|null|
2|null|null|null|
2|null|null|null|
2|null|null|null|
5|null|null|null|
I want this:
A|B   |C   |D   |
1|null|null|null|
2|x   |x   |x   |
2|x   |x   |x   |
2|x   |x   |x   |
5|null|null|null|
My case
So all the rows that have the number 2 in column A should get replaced.
The columns A, B, C, D are dynamic; they will change in number and name on each run.
I also want to keep all the rows in the result, not only the replaced ones.
What I tried
I tried with df.where and fillna, but it does not keep all the rows.
I also thought about doing it with withColumn, but I only know column A; all the other columns will change on each execution.
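Since only column A is known in advance, the list of columns to fill can be derived from df.columns at runtime. A minimal sketch in plain Python (the column names here are made-up stand-ins for whatever df.columns returns):

```python
# Stand-in for df.columns at runtime; the real names are unknown in advance.
all_columns = ["A", "B", "C", "D"]

# Everything except the known key column "A" should be filled.
cols_to_replace = [c for c in all_columns if c != "A"]
print(cols_to_replace)  # ['B', 'C', 'D']
```

Filtering by name is slightly more robust than slicing by position, since it does not assume "A" is the first column.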
Adapted solution:
df.select(
    "A",
    *[
        when(col("A") == '2',
             coalesce(col(c), lit('0').cast(df.schema[c].dataType))
        ).otherwise(col(c)).alias(c)
        for c in cols_to_replace
    ]
)
Answer
Use pyspark.sql.functions.coalesce:
from pyspark.sql.functions import coalesce, col, lit, when

cols_to_replace = df.columns[1:]
df.select(
    "A",
    *[
        when(col("A") == 2, coalesce(col(c), lit("x"))).otherwise(col(c)).alias(c)
        for c in cols_to_replace
    ]
).show()
#+---+----+----+----+
#| A| B| C| D|
#+---+----+----+----+
#| 1|null|null|null|
#| 2| x| x| x|
#| 2| x| x| x|
#| 2| x| x| x|
#| 5|null|null|null|
#+---+----+----+----+
Inside the list comprehension, you check whether the value of A is 2. If yes, you coalesce the column's value with the literal x, which replaces nulls with x. Otherwise, you keep the same column value.
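Outside Spark, the per-row semantics of when/coalesce can be sketched in plain Python. The row data below is made up to mirror the example; fill_row is a hypothetical helper, not a PySpark API:

```python
rows = [
    {"A": 1, "B": None, "C": None, "D": None},
    {"A": 2, "B": None, "C": None, "D": None},
    {"A": 5, "B": None, "C": None, "D": None},
]
cols_to_replace = ["B", "C", "D"]

def fill_row(row):
    # Mirrors: when(col("A") == 2, coalesce(col(c), lit("x"))).otherwise(col(c))
    if row["A"] == 2:
        # coalesce returns the first non-null argument, so existing
        # values survive and only nulls become "x".
        return {**row, **{c: (row[c] if row[c] is not None else "x")
                          for c in cols_to_replace}}
    return dict(row)  # otherwise: keep the row unchanged

filled = [fill_row(r) for r in rows]
print(filled[1])  # {'A': 2, 'B': 'x', 'C': 'x', 'D': 'x'}
```

Note that coalesce keeps any existing non-null value in the matched rows; only the nulls are filled.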