Split one column based on the value of another column in pyspark


Question

I have the following dataframe:

+----+-------+
|item|   path|
+----+-------+
|   a|  a/b/c|
|   b|  e/b/f|
|   d|e/b/d/h|
|   c|  g/h/c|
+----+-------+

I want to find the relative path of the value in column "item" by locating that value in column 'path' and extracting the LHS of the path, as shown below:

+----+-------+--------+
|item|   path|rel_path|
+----+-------+--------+
|   a|  a/b/c|       a|
|   b|  e/b/f|     e/b|
|   d|e/b/d/h|   e/b/d|
|   c|  g/h/c|   g/h/c|
+----+-------+--------+

I tried the functions split(str, pattern) and regexp_extract(str, pattern, idx), but I'm not sure how to pass the value of column 'item' into their pattern argument. Any idea how this could be done without writing a function?

Answer

You can use pyspark.sql.functions.expr to pass a column value as a parameter to regexp_replace. Here you concatenate a lookbehind for item with .+ to match everything after the item value, and replace that match with an empty string.

from pyspark.sql.functions import expr

df.withColumn(
    "rel_path",
    # Build the pattern '(?<=<item>).+' per row and delete everything
    # that follows the first occurrence of the item value in path
    expr("regexp_replace(path, concat('(?<=', item, ').+'), '')")
).show()
#+----+-------+--------+
#|item|   path|rel_path|
#+----+-------+--------+
#|   a|  a/b/c|       a|
#|   b|  e/b/f|     e/b|
#|   d|e/b/d/h|   e/b/d|
#|   c|  g/h/c|   g/h/c|
#+----+-------+--------+
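To see what the per-row regex is doing at the string level, here is a plain-Python sketch of the same prefix extraction (the helper name `rel_path` is hypothetical, chosen to mirror the output column):

```python
import re

def rel_path(item: str, path: str) -> str:
    # Mirrors regexp_replace(path, concat('(?<=', item, ').+'), ''):
    # delete everything that follows the first occurrence of `item`.
    # re.escape guards against regex metacharacters in the item value.
    return re.sub(f"(?<={re.escape(item)}).+", "", path, count=1)

rows = [("a", "a/b/c"), ("b", "e/b/f"), ("d", "e/b/d/h"), ("c", "g/h/c")]
for item, path in rows:
    print(item, path, rel_path(item, path))
```

Note that for the last row the lookbehind succeeds at the end of the string, but `.+` has nothing left to consume, so nothing is replaced and the full path `g/h/c` is kept, matching the Spark output above.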
