Splitting a column in pyspark

Views: 940 Published: 2020/9/4 7:04:12 Tags: python apache-spark pyspark


Problem description

I am trying to split a column of a dataframe in pyspark. This is the data I have:

from pyspark.sql.functions import split
df = sc.parallelize([[1, 'Foo|10'], [2, 'Bar|11'], [3,'Car|12']]).toDF(['Key', 'Value'])
df = df.withColumn('Splitted', split(df['Value'], '|')[0])

I get:

+---+------+--------+
|Key| Value|Splitted|
+---+------+--------+
|  1|Foo|10|       F|
|  2|Bar|11|       B|
|  3|Car|12|       C|
+---+------+--------+

But I want:

+---+-----+--------+
|Key|Value|Splitted|
+---+-----+--------+
|  1|   10|     Foo|
|  2|   11|     Bar|
|  3|   12|     Car|
+---+-----+--------+

Can anyone please point out what I am doing wrong?

What if I have a situation like this, with more than two parts in each value?
df = sc.parallelize([[1, 'Foo|10|we'], [2, 'Bar|11|we'], [3,'Car|12|we']]).toDF(['Key', 'Value'])

+---+---------+
|Key|    Value|
+---+---------+
|  1|Foo|10|we|
|  2|Bar|11|we|
|  3|Car|12|we|
+---+---------+

Recommended answer

You forgot the escape character. split treats its pattern as a regular expression, where an unescaped | means alternation, so you should escape the pipe:

from pyspark.sql.functions import split
df = df.withColumn('Splitted', split(df['Value'], r'\|')[0])
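
The effect of escaping can be tried without a Spark cluster. The sketch below uses Python's own re module as a stand-in, which is an assumption for illustration only: Spark evaluates the pattern with Java's regex engine, and the two engines differ in edge cases, but both treat an escaped pipe as a literal character. The variable names are hypothetical.

```python
import re

value = "Foo|10"

# Escaped pipe: the pattern matches the literal '|' character.
escaped_parts = re.split(r"\|", value)

# A character class is another way to match a literal pipe.
class_parts = re.split("[|]", value)

# re.escape builds the escaped pattern when the delimiter is held in a variable.
delimiter_pattern = re.escape("|")
auto_parts = re.split(delimiter_pattern, value)

print(escaped_parts)  # ['Foo', '10']
print(class_parts)    # ['Foo', '10']
print(auto_parts)     # ['Foo', '10']
```

Writing the pattern as a raw string (r'\|') keeps Python from interpreting the backslash itself, so the regex engine receives it unchanged.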

If you want the output to be:

+---+-----+--------+
|Key|Value|Splitted|
+---+-----+--------+
|1  |10   |Foo     |
|2  |11   |Bar     |
|3  |12   |Car     |
+---+-----+--------+

you should do:

from pyspark.sql import functions as F
# Split into an array once, then take Value from index 1 and Splitted from index 0
df = df.withColumn('Splitted', F.split(df['Value'], r'\|')).withColumn('Value', F.col('Splitted')[1]).withColumn('Splitted', F.col('Splitted')[0])
