pySpark adding columns from a list
Question
I have a dataframe and would like to add columns to it, based on values from a list.
The list will contain anywhere from 3 to 50 values. I'm new to pySpark and am trying to append these values to my df as new (empty) columns.
I've seen recommended code for adding [one column][1] to a dataframe, but not for adding several columns from a list.
mylist = ['ConformedLeaseRecoveryTypeId', 'ConformedLeaseStatusId', 'ConformedLeaseTypeId', 'ConformedLeaseRecoveryTypeName', 'ConformedLeaseStatusName', 'ConformedLeaseTypeName']
My code below only appends one column.
for new_col in mylist:
    new = datasetMatchedDomains.withColumn(new_col, f.lit(0))
new.show()
[1]: https://stackoverflow.com/questions/48164206/pyspark-adding-a-column-from-a-list-of-values-using-a-udf
Recommended answer
We can also use a list comprehension with .select to add new columns to the dataframe.
Example:
from pyspark.sql.functions import col, lit
# sample dataframe
df.show()
#+---+-----+---+---+----+
#| _1| _2| _3| _4| _5|
#+---+-----+---+---+----+
#| |12343| |9 | 0|
#+---+-----+---+---+----+
mylist = ['ConformedLeaseRecoveryTypeId', 'ConformedLeaseStatusId', 'ConformedLeaseTypeId', 'ConformedLeaseRecoveryTypeName', 'ConformedLeaseStatusName', 'ConformedLeaseTypeName']
# keep every existing column and append one literal-0 column per name in mylist
cols = [col(col_name) for col_name in df.columns] + [lit(0).alias(col_name) for col_name in mylist]
# if you want the new columns cast to another type (here: string), build the list like this instead
cols = [col(col_name) for col_name in df.columns] + [lit(0).cast("string").alias(col_name) for col_name in mylist]
# select the existing columns together with the new ones
df.select(cols).show()
#+---+-----+---+---+----+----------------------------+----------------------+--------------------+------------------------------+------------------------+----------------------+
#| _1| _2| _3| _4| _5|ConformedLeaseRecoveryTypeId|ConformedLeaseStatusId|ConformedLeaseTypeId|ConformedLeaseRecoveryTypeName|ConformedLeaseStatusName|ConformedLeaseTypeName|
#+---+-----+---+---+----+----------------------------+----------------------+--------------------+------------------------------+------------------------+----------------------+
#| |12343| |9 | 0| 0| 0| 0| 0| 0| 0|
#+---+-----+---+---+----+----------------------------+----------------------+--------------------+------------------------------+------------------------+----------------------+