Spark Regexp: Split column based on date


Problem description

I have a column, called "data", in my dataframe that looks like this:

{"blah:" blah," blah:" blah""10/7/17service

I would like to separate this into three different columns that look like:

col1: {"blah:" blah," blah:" blah"
col2: 10/7/17
col3: service

I have tried this:

val split = df
  .withColumn("col1", regexp_extract($"data", /(0[1-9]|1[012])[-\/.](0[1-9]|[12][0-9]|3[01])[-\/.](19|20)\d\d/, 1))
  .withColumn("col2", regexp_extract($"data", /(0[1-9]|1[012])[-\/.](0[1-9]|[12][0-9]|3[01])[-\/.](19|20)\d\d/, 2))

But this regex doesn't really get me through the door. I feel like I'm missing something about how the regex operator works in Spark. Any ideas?

Thanks so much!! :)

Edit, with rules for the columns:

  • col1: everything before the date value
  • col2: the date value
  • col3: everything after the date value

Answer

OK, so you want:

  • col1: match until it finds the last "
  • col2: match the date
  • col3: the rest of the string

The regex you need is:

    /(.+")(\d{1,2}\/\d{1,2}\/\d{1,2})(.+)/
    

However, when you use it in the regexp_extract() function, you must escape the backslashes, so for each column you'll use:

    regexp_extract($"data", "(.+\")(\\d{1,2}\\/\\d{1,2}\\/\\d{1,2})(.+)", N)
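As an aside not in the original answer: in Scala you can sidestep the double escaping entirely with a triple-quoted string literal, which does not process backslash escapes. A minimal sketch (the object name is illustrative):

```scala
// Sketch (assumption, not part of the original answer): a triple-quoted
// Scala string literal leaves backslashes untouched, so the pattern can
// be written exactly as the raw regex, with no doubled backslashes.
object TripleQuoteDemo {
  val singleEscaped = "(.+\")(\\d{1,2}\\/\\d{1,2}\\/\\d{1,2})(.+)"
  val tripleQuoted  = """(.+")(\d{1,2}\/\d{1,2}\/\d{1,2})(.+)"""

  def main(args: Array[String]): Unit = {
    // Both literals denote exactly the same pattern string.
    println(singleEscaped == tripleQuoted)  // true
  }
}
```

Either literal can be passed to regexp_extract; the triple-quoted form is just easier to read and to keep in sync across the three withColumn calls.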

Based on the code you wrote, try this:

    val separate = df
      .withColumn("col1", regexp_extract($"data", "(.+\")(\\d{1,2}\\/\\d{1,2}\\/\\d{1,2})(.+)", 1))
      .withColumn("col2", regexp_extract($"data", "(.+\")(\\d{1,2}\\/\\d{1,2}\\/\\d{1,2})(.+)", 2))
      .withColumn("col3", regexp_extract($"data", "(.+\")(\\d{1,2}\\/\\d{1,2}\\/\\d{1,2})(.+)", 3))
    
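Since Spark's regexp_extract follows Java-regex semantics, the grouping logic can be sanity-checked outside Spark. A minimal plain-Scala sketch against the sample string from the question (the object and method names are illustrative, not part of the original answer):

```scala
import java.util.regex.Pattern

// Sketch: verifying the answer's pattern with plain java.util.regex,
// which is what Spark's regexp_extract uses under the hood.
object DateSplitCheck {
  private val pattern =
    Pattern.compile("(.+\")(\\d{1,2}\\/\\d{1,2}\\/\\d{1,2})(.+)")

  // Returns (col1, col2, col3) when the whole string matches.
  def split(data: String): Option[(String, String, String)] = {
    val m = pattern.matcher(data)
    if (m.matches()) Some((m.group(1), m.group(2), m.group(3))) else None
  }

  def main(args: Array[String]): Unit = {
    val sample = "{\"blah:\" blah,\" blah:\" blah\"10/7/17service"
    // The greedy (.+\") backtracks to the LAST quote before the date,
    // so everything ahead of 10/7/17 lands in group 1.
    println(split(sample))
  }
}
```

This is why the regex works on the messy input: group 1 greedily swallows every embedded quote, group 2 captures the date, and group 3 takes whatever remains.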

