Split Contents of String column in PySpark Dataframe


Problem description

I have a PySpark DataFrame with a column containing strings. I want to split this column into words.

Code:

>>> sentenceData = sqlContext.read.load('file://sample1.csv', format='com.databricks.spark.csv', header='true', inferSchema='true')
>>> sentenceData.show(truncate=False)
+---+---------------------------+
|key|desc                       |
+---+---------------------------+
|1  |Virat is good batsman      |
|2  |sachin was good            |
|3  |but modi sucks big big time|
|4  |I love the formulas        |
+---+---------------------------+


Expected Output
---------------

>>> sentenceData.show(truncate=False)
+---+-------------------------------------+
|key|desc                                 |
+---+-------------------------------------+
|1  |[Virat,is,good,batsman]              |
|2  |[sachin,was,good]                    |
|3  |....                                 |
|4  |...                                  |
+---+-------------------------------------+

How can I achieve this?

Solution

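Use the split function from pyspark.sql.functions. Its second argument is a regular expression, so \s+ splits on runs of whitespace: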

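from pyspark.sql.functions import split

# A raw string keeps the backslash in the regex intact.
# withColumn returns a new DataFrame, so assign the result.
df = df.withColumn("desc", split("desc", r"\s+"))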
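To see what the pattern does to each row without starting a Spark session, the same regular expression can be tried locally with Python's re module. This is only a rough stand-in: Spark evaluates the pattern with Java's regex engine, and the names rows, pattern and tokens below are made up for the illustration. For plain space-separated text like the sample data, the two should agree.

```python
import re

# Sample rows copied from the question's desc column.
rows = [
    (1, "Virat is good batsman"),
    (2, "sachin was good"),
    (3, "but modi sucks big big time"),
    (4, "I love the formulas"),
]

# Same pattern the answer passes to split().
pattern = re.compile(r"\s+")

# Plain-Python stand-in for the per-row split; this does not call Spark.
tokens = {key: pattern.split(desc) for key, desc in rows}

for key, words in tokens.items():
    print(key, words)
```

Because the pattern is \s+ instead of a single space, consecutive spaces or tabs between words are treated as one separator and do not produce empty strings in the middle of the list.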
