Spark - SELECT WHERE or filtering?
Question
What's the difference between selecting with a where clause and filtering in Spark?
Are there any use cases in which one is more appropriate than the other one?
When should I use
DataFrame newdf = df.select(df.col("*")).where(df.col("somecol").leq(10))
and when is
DataFrame newdf = df.select(df.col("*")).filter("somecol <= 10")
more appropriate?
Answer
According to the Spark documentation, "where() is an alias for filter()":
filter(condition)
Filters rows using the given condition. where() is an alias for filter().
Parameters: condition – a Column of types.BooleanType or a string of SQL expression.
>>> df.filter(df.age > 3).collect()
[Row(age=5, name=u'Bob')]
>>> df.where(df.age == 2).collect()
[Row(age=2, name=u'Alice')]
>>> df.filter("age > 3").collect()
[Row(age=5, name=u'Bob')]
>>> df.where("age = 2").collect()
[Row(age=2, name=u'Alice')]
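Since the two methods are aliases, the choice between them is purely stylistic. The alias pattern itself can be sketched in a few lines of plain Python; the MiniFrame class below is a hypothetical illustration, not the real PySpark implementation:

```python
# Minimal sketch of the alias pattern (hypothetical MiniFrame class,
# not the actual PySpark source code).
class MiniFrame:
    def __init__(self, rows):
        self.rows = rows

    def filter(self, predicate):
        """Return a new frame keeping only rows where predicate(row) is True."""
        return MiniFrame([r for r in self.rows if predicate(r)])

    # where() is simply another name bound to the same method object,
    # so both names run exactly the same code.
    where = filter


frame = MiniFrame([{"age": 2, "name": "Alice"}, {"age": 5, "name": "Bob"}])

older_a = frame.filter(lambda r: r["age"] > 3)
older_b = frame.where(lambda r: r["age"] > 3)

print([r["name"] for r in older_a.rows])  # → ['Bob']
print([r["name"] for r in older_b.rows])  # → ['Bob']
```

Because `where` and `filter` resolve to the same underlying method, both calls above produce identical results, which mirrors how the two DataFrame methods behave in Spark.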