Spark Regexp: Split column based on date
Problem description
I have a column, called "data", in my dataframe that looks like this:
{"blah:" blah," blah:" blah""10/7/17service
I would like to separate this into three different columns that look like:
col1: {"blah:" blah," blah:" blah"
col2: 10/7/17
col3: service
I tried this approach:
val split = df.withColumn("col1",regexp_extract($"data",(/(0[1-9]|1[012])[-\/.](0[1-9]|[12][0-9]|3[01])[-\/.](19|20)\d\d/),1).withColumn("col2",regexp_extract($"data",(/(0[1-9]|1[012])[-\/.](0[1-9]|[12][0-9]|3[01])[-\/.](19|20)\d\d/),2))
But this regex doesn't really get me through the door. I feel like I'm missing something about how the regex operator works in Spark. Any ideas?
Thanks so much!! :)
EDIT: Rules for the columns:
- col1: everything before the date value
- col2: the date value
- col3: everything after the date value
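One way to express these three rules is a single pattern with three capture groups, passed to regexp_extract once per output column. Two things are worth noting about the attempt above. Spark's regexp_extract takes the pattern as an ordinary string in Java regex syntax, so the JavaScript-style /.../ delimiters are not needed. The attempted pattern also requires a zero-padded day and a four-digit year beginning with 19 or 20, so a value like 10/7/17 would not match it. The sketch below is only an illustration under assumptions: the object name SplitOnDate, the local SparkSession, and the loosened date shape (one or two digits, a separator, one or two digits, a separator, two to four digits) are not from the original post and may need tightening for real data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.regexp_extract

// Hypothetical wrapper object, used here only to make the sketch self-contained.
object SplitOnDate {
  // Group 1: everything before the first date-like token (lazy match).
  // Group 2: the date, assumed to look like M/D/YY, MM-DD-YYYY, and so on.
  // Group 3: everything after the date.
  // Note: "." does not match line breaks unless the pattern starts with (?s).
  val pattern: String = """^(.*?)(\d{1,2}[-/.]\d{1,2}[-/.]\d{2,4})(.*)$"""

  def main(args: Array[String]): Unit = {
    // Local session for demonstration; a real job would reuse its existing session.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-on-date")
      .getOrCreate()
    import spark.implicits._

    // Sample row copied from the question.
    val df = Seq("""{"blah:" blah," blah:" blah""10/7/17service""").toDF("data")

    // The same pattern is applied three times; only the group index changes.
    val split = df
      .withColumn("col1", regexp_extract($"data", pattern, 1))
      .withColumn("col2", regexp_extract($"data", pattern, 2))
      .withColumn("col3", regexp_extract($"data", pattern, 3))

    split.show(false)
    spark.stop()
  }
}
```

Because regexp_extract returns an empty string when the pattern does not match, rows without a recognizable date would end up with three empty columns under this approach, so it may be worth filtering or flagging those rows separately.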