In pyspark, how do you add/concat a string to a column?


Question

I would like to add a string to an existing column. For example, df['col1'] has values '1', '2', '3', etc., and I would like to concat the string '000' on the left of col1 so I can get a column (new or replacing the old one, it doesn't matter) with '0001', '0002', '0003'.

I thought I should use df.withColumn('col1', '000'+df['col1']), but of course it does not work, since pyspark DataFrames are immutable?

This should be an easy task, but I didn't find anything online. Hope someone can give me some help!

Thanks!

Answer

from pyspark.sql.functions import concat, col, lit


df.select(concat(col("firstname"), lit(" "), col("lastname"))).show(5)
+------------------------------+
|concat(firstname,  , lastname)|
+------------------------------+
|                Emanuel Panton|
|              Eloisa Cayouette|
|                   Cathi Prins|
|             Mitchel Mozdzierz|
|               Angla Hartzheim|
+------------------------------+
only showing top 5 rows

http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#module-pyspark.sql.functions

