Remove rows from dataframe based on condition in pyspark

Published: 2020/9/4 8:05:08 | Tags: apache-spark, dataframe, pyspark


I have one dataframe with two columns:

+--------+-----+
|    col1| col2|
+--------+-----+
|22      | 12.2|
|1       |  2.1|
|5       | 52.1|
|2       | 62.9|
|77      | 33.3|
+--------+-----+

I would like to create a new dataframe that keeps only the rows where

"value of col1" > "value of col2"

Note that col1 is of type long and col2 is of type double.

The result should look like this:

+--------+----+
|    col1|col2|
+--------+----+
|22      |12.2|
|77      |33.3|
+--------+----+

Another possible way is to use the where function of the DataFrame.

For example:

output = df.where("col1 > col2")

This gives you the expected result:

+----+----+
|col1|col2|
+----+----+
|  22|12.2|
|  77|33.3|
+----+----+
