Find and remove matching column values in pyspark

Views: 287 Published: 2020/9/4 19:47:40 Tags: apache-spark pyspark spark-dataframe pyspark-sql

Question

I have a pyspark dataframe in which a column occasionally holds a wrong value that duplicates the value in another column. It looks something like this:

| Date         | Latitude      |
| 2017-01-01   | 43.4553       |
| 2017-01-02   | 42.9399       |
| 2017-01-03   | 43.0091       |
| 2017-01-04   | 2017-01-04    |

Obviously, the last Latitude value is incorrect. I need to remove every row like this. I thought about using .isin(), but I can't get it to work. If I try

df['Date'].isin(['Latitude'])

I get:

Column<(Date IN (Latitude))>

Any suggestions?

Recommended answer

If you're more comfortable with SQL syntax, here is an alternative that uses a pyspark-sql condition inside filter():

df = df.filter("Date NOT IN (Latitude)")

Or equivalently, using pyspark.sql.DataFrame.where():

df = df.where("Date NOT IN (Latitude)")
