Filter spark DataFrame on string contains
Question
I am using Spark 1.3.0 and Spark Avro 1.0.0. I am working from the example on the repository page. The following code works well:
val df = sqlContext.read.avro("src/test/resources/episodes.avro")
df.filter("doctor > 5").write.avro("/tmp/output")
But what if I needed to check whether the doctor string contains a substring? Since the expression is written inside a string, how do I express a "contains"?
Recommended answer
You can use contains (this works with an arbitrary sequence):
df.filter($"foo".contains("bar"))
like (SQL LIKE with SQL simple regular expressions, where _ matches an arbitrary character and % matches an arbitrary sequence):
df.filter($"foo".like("bar"))
or rlike (like with Java regular expressions):
df.filter($"foo".rlike("bar"))
depending on your requirements. LIKE and RLIKE should work with SQL expressions as well.
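To make the comparison concrete, here is a minimal sketch that puts the column methods and the SQL expression strings side by side. It rests on assumptions that are not part of the question: it uses a SparkSession in local mode (Spark 2.x or later; on 1.x, sqlContext and import sqlContext.implicits._ should play the same role), and the column foo, the sample values, and the object name StringFilterSketch are made up for illustration. For the question's data, the column name would be doctor instead of foo.

```scala
import org.apache.spark.sql.SparkSession

object StringFilterSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x+ in local mode; the app name is arbitrary.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("string-filter-sketch")
      .getOrCreate()
    import spark.implicits._ // enables the $"col" syntax and toDF

    // Hypothetical data: a single string column named "foo".
    val df = Seq("bar", "foobar", "barn", "baz").toDF("foo")

    // Column methods, as in the answer above.
    df.filter($"foo".contains("bar")).show() // plain substring test
    df.filter($"foo".like("%bar%")).show()   // SQL LIKE; % matches any sequence
    df.filter($"foo".rlike("^bar")).show()   // Java regex, anchored at the start

    // The same kind of predicate written inside an expression string,
    // which is the form the question asks about.
    df.filter("foo LIKE '%bar%'").show()
    df.filter("foo RLIKE '^bar'").show()

    spark.stop()
  }
}
```

Note that like("bar") without any wildcard only matches values equal to bar, so a "contains" written with LIKE needs the surrounding % signs.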