PySpark: Split and select part of the string column values


Question

How can I select the characters (the rest of the file path) that come after Dev\ or dev\ in a column of a Spark DataFrame?

Sample rows of the PySpark column:

\\D\Dev\johnny\Desktop\TEST
\\D\Dev\matt\Desktop\TEST\NEW
\\D\Dev\matt\Desktop\TEST\OLD\TEST
\\E\dev\peter\Desktop\RUN\SUBFOLDER\New

Expected output:

johnny\Desktop\TEST
matt\Desktop\TEST\NEW
matt\Desktop\TEST\OLD\TEST
peter\Desktop\RUN\SUBFOLDER\New

I tried the code below.

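from pyspark.sql import functions as F

# Split on the literal text Dev\ (case-sensitive) and keep the last piece.
df = df.withColumn(
    "sub_path",
    F.element_at(F.split(F.col("path"), "Dev\\\\"), -1)
)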

It gives the result I want for only some of the rows. I would appreciate any help.

Recommended answer

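The split pattern is a regular expression, and "Dev" matches only an uppercase D, so the row containing dev\ is never split. The following modification uses the character class [Dd] to match both uppercase and lowercase d.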

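from pyspark.sql import functions as F

# [Dd] matches either case, so both Dev\ and dev\ act as the delimiter.
df = df.withColumn(
    "sub_path",
    F.element_at(F.split(F.col("path"), "[Dd]ev\\\\"), -1)
)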

Let me know if this works for you.
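
To check the pattern without starting a Spark session, the two regular expressions can be tried on the sample rows with Python's re module. This is only a rough sketch: Spark evaluates the pattern with Java's regex engine rather than Python's, although for a pattern this simple the two should behave the same. The helper last_piece is a hypothetical name introduced here to mimic element_at(split(...), -1); it is not part of PySpark.

```python
import re

# Sample rows from the question, as raw strings so the backslashes stay literal.
paths = [
    r"\\D\Dev\johnny\Desktop\TEST",
    r"\\D\Dev\matt\Desktop\TEST\NEW",
    r"\\D\Dev\matt\Desktop\TEST\OLD\TEST",
    r"\\E\dev\peter\Desktop\RUN\SUBFOLDER\New",
]


def last_piece(pattern, path):
    """Hypothetical helper: split on the regex and keep the last piece,
    mimicking element_at(split(path, pattern), -1)."""
    return re.split(pattern, path)[-1]


# "Dev\\\\" in a normal Python string is the regex Dev\\ ,
# i.e. the letters Dev followed by one literal backslash.
case_sensitive = [last_piece("Dev\\\\", p) for p in paths]
either_case = [last_piece("[Dd]ev\\\\", p) for p in paths]

# Print the result of the original pattern next to the modified one.
for before, after in zip(case_sensitive, either_case):
    print(before, "->", after)
```

With the original pattern the last row comes back unchanged, because no delimiter is found and the only piece is the whole string. Note also that keeping the last piece means a path containing dev\ a second time would be cut at the later occurrence. If that matters, passing limit=2 to F.split (available, as far as I know, from Spark 3.0) should keep everything after the first match.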
