pySpark从列表中添加列 [英] pySpark adding columns from a list

查看:64
本文介绍了pySpark从列表中添加列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据名人堂,并希望根据列表中的值向其中添加列.

I have a datafame and would like to add columns to it, based on values from a list.

我的值列表将在3-50个值之间变化.我是pySpark的新手,正在尝试将这些值作为新列(空)附加到df中.

The list of my values will vary from 3-50 values. I'm new to pySpark and I'm trying to append these values as new columns (empty) to my df.

我看过推荐的代码,该代码如何将[一列] [1]添加到数据框,但不能从列表中添加多个.

I've seen recommended code of how to add [one column][1] to a dataframe but not multiple from a list.

mylist = ['ConformedLeaseRecoveryTypeId', 'ConformedLeaseStatusId', 'ConformedLeaseTypeId', 'ConformedLeaseRecoveryTypeName', 'ConformedLeaseStatusName', 'ConformedLeaseTypeName']

我下面的代码仅追加一列.

My code below only appends one column.

for new_col in mylist:
  new = datasetMatchedDomains.withColumn(new_col,f.lit(0))
new.show()




  [1]: https://stackoverflow.com/questions/48164206/pyspark-adding-a-column-from-a-list-of-values-using-a-udf

推荐答案

我们也可以将 列表理解 一起使用.选择将新列添加到数据框.

We can also use list comprehension with .select to add new columns to the dataframe.

示例:

Example:

#sample dataframe
df.show()
#+---+-----+---+---+----+
#| _1|   _2| _3| _4|  _5|
#+---+-----+---+---+----+
#|   |12343|   |9  |   0|
#+---+-----+---+---+----+

mylist = ['ConformedLeaseRecoveryTypeId', 'ConformedLeaseStatusId', 'ConformedLeaseTypeId', 'ConformedLeaseRecoveryTypeName', 'ConformedLeaseStatusName', 'ConformedLeaseTypeName']

cols=[col(col_name) for col_name in df.columns] + [(lit(0)).name( col_name) for col_name in mylist]

#incase if you want to cast new fields then
cols=[col(col_name) for col_name in df.columns] + [(lit(0).cast("string")).name( col_name) for col_name in mylist]

#adding new columns and selecting existing columns    
df.select(cols).show()
#+---+-----+---+---+----+----------------------------+----------------------+--------------------+------------------------------+------------------------+----------------------+
#| _1|   _2| _3| _4|  _5|ConformedLeaseRecoveryTypeId|ConformedLeaseStatusId|ConformedLeaseTypeId|ConformedLeaseRecoveryTypeName|ConformedLeaseStatusName|ConformedLeaseTypeName|
#+---+-----+---+---+----+----------------------------+----------------------+--------------------+------------------------------+------------------------+----------------------+
#|   |12343|   |9  |   0|                           0|                     0|                   0|                             0|                       0|                     0|
#+---+-----+---+---+----+----------------------------+----------------------+--------------------+------------------------------+------------------------+----------------------+

这篇关于pySpark从列表中添加列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆