Spark dataframe filter


Problem description

val df = sc.parallelize(Seq((1,"Emailab"), (2,"Phoneab"), (3, "Faxab"),(4,"Mail"),(5,"Other"),(6,"MSL12"),(7,"MSL"),(8,"HCP"),(9,"HCP12"))).toDF("c1","c2")

+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
|  4|   Mail|
|  5|  Other|
|  6|  MSL12|
|  7|    MSL|
|  8|    HCP|
|  9|  HCP12|
+---+-------+

I want to filter out records whose column `c2` starts with either `MSL` or `HCP` as its first 3 characters.

So the output should look like this:

+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
|  4|   Mail|
|  5|  Other|
+---+-------+

Can someone please help?

I know that df.filter($"c2".rlike("MSL")) selects matching records, but how do I exclude records?

Version: Spark 1.6.2, Scala 2.10

Recommended answer

import org.apache.spark.sql.functions.{col, not, substring}

// Keep rows whose first 3 characters of c2 are neither "MSL" nor "HCP".
// Note: substring positions are 1-based in Spark SQL.
df.filter(not(substring(col("c2"), 1, 3).isin("MSL", "HCP")))
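Since the question mentions rlike, the same exclusion can also be expressed by negating a regex match with the Column `!` operator; this is a sketch, and the anchored pattern `^(MSL|HCP)` is an assumption about the intended prefixes:

```scala
// Alternative sketch: negate an anchored regex match on c2.
// "^(MSL|HCP)" matches rows whose c2 begins with MSL or HCP,
// and the unary ! keeps only the rows that do NOT match.
df.filter(!$"c2".rlike("^(MSL|HCP)"))
```

Both versions return rows 1 through 5 of the example DataFrame; the substring/isin form avoids regex entirely, while the rlike form is convenient when the prefix list is easier to state as a pattern.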
