Pyspark: Split and select part of the string column values
Question
How can I select the characters or file path that come after Dev\ or dev\ in a column of a Spark DataFrame?
Sample rows of the pyspark column:
\\D\Dev\johnny\Desktop\TEST
\\D\Dev\matt\Desktop\TEST\NEW
\\D\Dev\matt\Desktop\TEST\OLD\TEST
\\E\dev\peter\Desktop\RUN\SUBFOLDER\New
Expected output:
johnny\Desktop\TEST
matt\Desktop\TEST\NEW
matt\Desktop\TEST\OLD\TEST
peter\Desktop\RUN\SUBFOLDER\New
I tried the code below.
from pyspark.sql import functions as F

# Split on the literal text Dev\ and keep the last piece.
df = df.withColumn(
    "sub_path",
    F.element_at(F.split(F.col("path"), "Dev\\\\"), -1)
)
It gives only partially correct results. I would appreciate any help.
Recommended answer
The modification below uses the character class [Dd], which matches both upper-case and lower-case d, so the split pattern matches Dev\ as well as dev\.
df = df.withColumn(
    "sub_path",
    F.element_at(F.split(F.col("path"), "[Dd]ev\\\\"), -1)
)
Let me know if this works for you.
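To see why the original pattern was only partly right, the split can be reproduced on the sample rows without a Spark session. This is a minimal sketch that uses Python's re module as a stand-in for Spark's Java regex engine, which is an assumption that holds for this simple pattern. The helper name last_part is hypothetical and is not part of PySpark.

```python
import re

# Sample rows copied from the question (raw strings keep backslashes literal).
paths = [
    r"\\D\Dev\johnny\Desktop\TEST",
    r"\\D\Dev\matt\Desktop\TEST\NEW",
    r"\\D\Dev\matt\Desktop\TEST\OLD\TEST",
    r"\\E\dev\peter\Desktop\RUN\SUBFOLDER\New",
]


def last_part(pattern, path):
    """Hypothetical helper: split on a regex and keep the last piece,
    mimicking element_at(split(col, pattern), -1)."""
    return re.split(pattern, path)[-1]


# Original pattern: only an upper-case D matches, so the last row is not split.
case_sensitive = [last_part(r"Dev\\", p) for p in paths]

# Answer's pattern: [Dd] accepts either case, so every row is split.
case_insensitive = [last_part(r"[Dd]ev\\", p) for p in paths]

for before, after in zip(case_sensitive, case_insensitive):
    print(before, "->", after)
```

With the case-sensitive pattern the last row has no match, so the split returns the whole path unchanged, which explains the partially correct output. Note that taking the last element means a path containing a second dev\ folder further down would be cut at that later occurrence.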