Filter Spark DataFrame on string contains

Views: 36 Published: 2021/11/12 5:42:53 Tags: scala apache-spark dataframe apache-spark-sql


Question

I am using Spark 1.3.0 and Spark Avro 1.0.0, working from the example on the repository page. The following code works well:

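import com.databricks.spark.avro._  // spark-avro implicits that provide the avro methods used below

val df = sqlContext.read.avro("src/test/resources/episodes.avro")
df.filter("doctor > 5").write.avro("/tmp/output")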

But what if I need to check whether the doctor string contains a substring? Since the filter expression is written inside a string, how do I express a "contains"?

Recommended answer

You can use contains (this works with an arbitrary sequence):

df.filter($"foo".contains("bar"))

like (SQL LIKE with a simple SQL pattern, where _ matches any single character and % matches any sequence of characters):

df.filter($"foo".like("bar"))

or rlike (like, but with Java regular expressions):

df.filter($"foo".rlike("bar"))

depending on your requirements. LIKE and RLIKE should work in SQL expression strings as well.
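
To compare the three options side by side, here is a minimal sketch. It makes some assumptions that are not in the question: Spark 2.x or later with a local SparkSession (on Spark 1.3.0 you would use sqlContext and its implicits instead), a hypothetical object named ContainsFilterSketch, and made-up sample rows in a column called foo, as in the snippets above. Note that like("bar") with no wildcards only matches the exact value "bar", so a substring test with LIKE needs "%bar%".

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, for illustration only.
object ContainsFilterSketch {

  // Runs each filter variant on sample data and returns the matching values by variant name.
  def run(spark: SparkSession): Map[String, Seq[String]] = {
    import spark.implicits._ // enables the $"col" syntax and toDF

    // Made-up sample data; the column name "foo" follows the answer's snippets.
    val df = Seq("bar", "foobar", "barbaz", "qux").toDF("foo")

    val variants = Map(
      "contains"  -> df.filter($"foo".contains("bar")), // substring match
      "likeExact" -> df.filter($"foo".like("bar")),     // no wildcards: whole value must be "bar"
      "like"      -> df.filter($"foo".like("%bar%")),   // % on both sides: substring match
      "rlike"     -> df.filter($"foo".rlike("^bar")),   // Java regex; ^ anchors to the start
      "sqlString" -> df.filter("foo LIKE '%bar%'")      // same LIKE pattern as a SQL expression string
    )

    // Collect the single string column of each result to the driver.
    variants.map { case (name, filtered) => name -> filtered.as[String].collect().toSeq }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, purely for trying the filters out.
    val spark = SparkSession.builder().master("local[*]").appName("contains-sketch").getOrCreate()
    run(spark).foreach { case (name, values) => println(s"$name: ${values.mkString(", ")}") }
    spark.stop()
  }
}
```

The last variant is probably the closest to the question as asked, since it keeps the condition inside a string just like "doctor > 5" does.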
