Filter Spark DataFrame on string contains
Question
I am using Spark 1.3.0 and Spark Avro 1.0.0
I read the example posted here:
https://github.com/databricks/spark-avro
This code works great:
import com.databricks.spark.avro._
val df = sqlContext.read.avro("src/test/resources/episodes.avro")
df.filter("doctor > 5").write.avro("/tmp/output")
But what if I need to check whether the doctor string contains a substring? Since the expression is written inside a string, how do I express a "contains"?
Answer
You can use contains, which matches an arbitrary literal substring (no wildcards):
df.filter($"foo".contains("bar"))
like (SQL LIKE with SQL simple patterns, where _ matches exactly one character and % matches any sequence of characters):
df.filter($"foo".like("%bar%")) // % on both sides turns LIKE into a substring match
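Because LIKE must match the whole value, a pattern without % only matches the exact string. The sketch below is a simplified model written for illustration, not Spark's own code: the object LikeSketch and its functions likeToRegex and likeMatches are hypothetical, and it assumes no ESCAPE clause, so % and _ are always wildcards.

```scala
import java.util.regex.Pattern

// Hypothetical helper, not Spark's implementation: a simplified model of SQL LIKE.
// Assumption: no ESCAPE support, so '%' and '_' are always treated as wildcards.
object LikeSketch {
  // Translate a LIKE pattern into a Java regex, character by character.
  def likeToRegex(pattern: String): String =
    pattern.iterator.map {
      case '%' => ".*"                      // any sequence of characters, possibly empty
      case '_' => "."                       // exactly one character
      case c   => Pattern.quote(c.toString) // every other character is literal
    }.mkString

  // LIKE has to match the whole value, hence matches() rather than find().
  // DOTALL lets '%' and '_' also match line breaks.
  def likeMatches(value: String, pattern: String): Boolean =
    Pattern.compile(likeToRegex(pattern), Pattern.DOTALL).matcher(value).matches()
}
```

With this model, likeMatches("foobarbaz", "bar") is false while likeMatches("foobarbaz", "%bar%") is true, which is why a LIKE-based "contains" needs % on both sides.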
or rlike (like, but with Java regular expressions):
df.filter($"foo".rlike("bar"))
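As far as I know, RLIKE returns true when the regex matches anywhere in the value (a partial "find"), unlike LIKE, which must match the whole value. That is why rlike("bar") already behaves like a contains check. The object RlikeSketch below is a hypothetical illustration of that difference using plain java.util.regex, not Spark's own code.

```scala
import java.util.regex.Pattern

// Hypothetical sketch contrasting a partial match with a full match.
// Assumption: RLIKE is modelled as find(), i.e. "matches somewhere in the value".
object RlikeSketch {
  // True if the regex matches any substring of the value.
  def rlikeMatches(value: String, regex: String): Boolean =
    Pattern.compile(regex).matcher(value).find()

  // True only if the regex matches the entire value.
  def fullMatch(value: String, regex: String): Boolean =
    Pattern.compile(regex).matcher(value).matches()
}
```

Under this model, "bar" acts as a substring test, and anchors such as ^ or $ are needed to pin the match to the start or end of the value.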
Pick one depending on your requirements. LIKE and RLIKE also work inside SQL expression strings, for example df.filter("foo LIKE '%bar%'").