How to split a pyspark dataframe column into only two columns (example below)?

Published: 2021/11/14 22:51:43 · Tags: apache-spark pyspark split apache-spark-sql


Question

The column uses the delimiter multiple times in a single row, so a plain split is not straightforward.
Only the first occurrence of the delimiter should be considered when splitting.

This is what I am doing at the moment, but I suspect there is a better solution:

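from pyspark.sql import SparkSession
from pyspark.sql.functions import col, expr, split

spark = SparkSession.builder.getOrCreate()

testdf = spark.createDataFrame([("Dog", "meat,bread,milk"), ("Cat", "mouse,fish")], ["Animal", "Food"])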

testdf.show()

+------+---------------+
|Animal|           Food|
+------+---------------+
|   Dog|meat,bread,milk|
|   Cat|     mouse,fish|
+------+---------------+

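# Food1: everything before the first comma.
# Food2: delete Food1 from Food, then drop the leading comma with substring(..., 2).
# Note: regexp_replace treats Food1 as a regex and removes every match, not only the first.
testdf.withColumn("Food1", split(col("Food"), ",").getItem(0)) \
    .withColumn("Food2", expr("regexp_replace(Food, Food1, '')")) \
    .withColumn("Food2", expr("substring(Food2, 2)")).show()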

+------+---------------+-----+----------+
|Animal|           Food|Food1|     Food2|
+------+---------------+-----+----------+
|   Dog|meat,bread,milk| meat|bread,milk|
|   Cat|     mouse,fish|mouse|      fish|
+------+---------------+-----+----------+

Recommended answer

An approach that uses a regular expression to split only on the first occurrence of the delimiter:

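import pyspark.sql.functions as f

# The lookbehind (?<=^[^,]*) only allows a comma that has no other comma
# between it and the start of the string, i.e. the first comma.
testdf.withColumn('Food1', f.split('Food', "(?<=^[^,]*)\\,")[0]) \
      .withColumn('Food2', f.split('Food', "(?<=^[^,]*)\\,")[1]).show()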

+------+---------------+-----+----------+
|Animal|           Food|Food1|     Food2|
+------+---------------+-----+----------+
|   Dog|meat,bread,milk| meat|bread,milk|
|   Cat|     mouse,fish|mouse|      fish|
+------+---------------+-----+----------+
