Spark SQL case insensitive filter for column conditions
Question
How can I use a Spark SQL filter as a case-insensitive filter?
For example:
dataFrame.filter(dataFrame.col("vendor").equalTo("fortinet"));
only returns rows where the 'vendor' column equals 'fortinet', but I want the rows where the 'vendor' column equals 'fortinet' or 'Fortinet' or 'foRtinet' or ...
Answer
You can either use a case-insensitive regex:
val df = sc.parallelize(Seq(
(1L, "Fortinet"), (2L, "foRtinet"), (3L, "foo")
)).toDF("k", "v")
df.where($"v".rlike("(?i)^fortinet$")).show
// +---+--------+
// | k| v|
// +---+--------+
// | 1|Fortinet|
// | 2|foRtinet|
// +---+--------+
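The `(?i)` prefix in the pattern above is the standard Java regex inline flag for case-insensitive matching, and Spark's `rlike` delegates to Java regex semantics, so the same pattern can be tried with plain `String.matches` (a minimal standalone sketch, no Spark required):

```scala
// (?i) turns on case-insensitive mode for the rest of the pattern;
// ^ and $ anchor the match to the whole string, mirroring an exact
// (but case-insensitive) equality check.
val pattern = "(?i)^fortinet$"

println("Fortinet".matches(pattern)) // true
println("foRtinet".matches(pattern)) // true
println("foo".matches(pattern))      // false
```

Because `rlike` matches anywhere in the string by default, the `^`/`$` anchors matter: without them, `"xfortinetx"` would also match.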
or simple equality combined with lower / upper:
import org.apache.spark.sql.functions.{lower, upper}
df.where(lower($"v") === "fortinet")
// +---+--------+
// | k| v|
// +---+--------+
// | 1|Fortinet|
// | 2|foRtinet|
// +---+--------+
df.where(upper($"v") === "FORTINET")
// +---+--------+
// | k| v|
// +---+--------+
// | 1|Fortinet|
// | 2|foRtinet|
// +---+--------+
For simple filters I would prefer rlike, although performance should be similar; for join conditions, equality is a much better choice. See How can we JOIN two Spark SQL dataframes using a SQL-esque "LIKE" criterion? for details.
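The normalize-then-compare idea behind the lower/upper approach carries over directly to joins: lowercase the key on both sides and join on plain equality (in Spark this would look like `df1.join(df2, lower(df1("v")) === lower(df2("v")))`). As a rough standalone sketch of the same idea with plain Scala collections and hypothetical data, not the Spark API:

```scala
// Hypothetical tables: ids with vendor names, and vendor metadata.
val left  = Seq((1, "Fortinet"), (2, "foo"))
val right = Seq(("foRtinet", "firewall"))

// Case-insensitive equality join: normalize both keys with toLowerCase,
// then compare with ordinary equality.
val joined = for {
  (id, v)     <- left
  (v2, label) <- right
  if v.toLowerCase == v2.toLowerCase
} yield (id, v, label)

println(joined) // List((1,Fortinet,firewall))
```

Normalizing to a single case keeps the join condition an equi-join, which Spark can execute with hash- or sort-merge joins, whereas an `rlike` condition forces a much slower non-equi join.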