Comparison operator in PySpark (not equal/ !=)

Views: 37 Published: 2021/11/14 21:54:28 Tags: sql apache-spark pyspark null apache-spark-sql

Question

I am trying to obtain all rows in a dataframe where two flags are set to '1', and subsequently all those where only one of the two is set to '1' and the other is NOT EQUAL to '1'.

With the following schema (three columns),

df = sqlContext.createDataFrame([('a',1,'null'),('b',1,1),('c',1,'null'),('d','null',1),('e',1,1)], #,('f',1,'NaN'),('g','bla',1)],
                            schema=('id', 'foo', 'bar')
                            )

I obtain the following dataframe:

+---+----+----+
| id| foo| bar|
+---+----+----+
|  a|   1|null|
|  b|   1|   1|
|  c|   1|null|
|  d|null|   1|
|  e|   1|   1|
+---+----+----+

When I apply the desired filters, the first filter (foo=1 AND bar=1) works, but the other (foo=1 AND NOT bar=1) does not.

foobar_df = df.filter( (df.foo==1) & (df.bar==1) )

yields:

+---+---+---+
| id|foo|bar|
+---+---+---+
|  b|  1|  1|
|  e|  1|  1|
+---+---+---+

Here is the non-behaving filter:

foo_df = df.filter( (df.foo==1) & (df.bar!=1) )
foo_df.show()
+---+---+---+
| id|foo|bar|
+---+---+---+
+---+---+---+

Why is it not filtering? How can I get the rows where only foo is equal to '1'?
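One likely explanation is SQL's three-valued logic: a comparison against NULL evaluates to NULL rather than true or false, and a filter keeps only rows whose predicate is true. The sketch below models that rule in plain Python and does not call Spark. The helper names (`sql_eq`, `sql_not`, `sql_and`, `sql_filter`) are hypothetical, and it assumes the `null` cells shown in the table above are real NULLs, represented here as `None`, rather than the string 'null'.

```python
# Plain-Python model of SQL three-valued logic; None stands in for NULL.
# Illustration only: these helpers are hypothetical and are not Spark APIs.

def sql_eq(a, b):
    """a = b under SQL semantics: the result is NULL if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b

def sql_not(p):
    """NOT p: NULL stays NULL."""
    return None if p is None else not p

def sql_and(p, q):
    """p AND q: False dominates, otherwise NULL propagates."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True

def sql_filter(rows, predicate):
    """Keep a row only when the predicate is exactly True, as WHERE does."""
    return [row for row in rows if predicate(row) is True]

# Same rows as the dataframe in the question (assumed to hold real NULLs).
rows = [
    {"id": "a", "foo": 1, "bar": None},
    {"id": "b", "foo": 1, "bar": 1},
    {"id": "c", "foo": 1, "bar": None},
    {"id": "d", "foo": None, "bar": 1},
    {"id": "e", "foo": 1, "bar": 1},
]

# foo = 1 AND bar = 1
both = sql_filter(rows, lambda r: sql_and(sql_eq(r["foo"], 1), sql_eq(r["bar"], 1)))

# foo = 1 AND NOT bar = 1: for rows a and c the predicate is NULL, so they are dropped
naive = sql_filter(
    rows, lambda r: sql_and(sql_eq(r["foo"], 1), sql_not(sql_eq(r["bar"], 1)))
)

# foo = 1 AND bar IS NULL: an explicit NULL test returns a plain boolean
null_aware = sql_filter(rows, lambda r: sql_and(sql_eq(r["foo"], 1), r["bar"] is None))

print([r["id"] for r in both])        # ['b', 'e']
print([r["id"] for r in naive])       # []
print([r["id"] for r in null_aware])  # ['a', 'c']
```

If this model matches what Spark is doing here, the PySpark counterpart of the last filter would test for NULL explicitly, for example with `df.bar.isNull()` in place of `df.bar != 1`.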
