In pyspark, how do you add/concat a string to a column?

Question

I would like to add a string to an existing column. For example, df['col1'] has values as '1', '2', '3' etc and I would like to concat string '000' on the left of col1 so I can get a column (new or replace the old one doesn't matter) as '0001', '0002', '0003'.

I thought I should use df.withColumn('col1', '000'+df['col1']) but of course it does not work since pyspark dataframe are immutable?

This should be an easy task but I didn't find anything online. Hope someone can give me some help!

Thanks!

Recommended answer

from pyspark.sql.functions import concat, col, lit


df.select(concat(col("firstname"), lit(" "), col("lastname"))).show(5)
+------------------------------+
|concat(firstname,  , lastname)|
+------------------------------+
|                Emanuel Panton|
|              Eloisa Cayouette|
|                   Cathi Prins|
|             Mitchel Mozdzierz|
|               Angla Hartzheim|
+------------------------------+
only showing top 5 rows

http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#module-pyspark.sql.functions
