Populating a column in a dataframe with pySpark


Problem description

I'm new to pySpark and I'm trying to fill a column based on a condition, using a list. How can I fill a column conditionally using a list?

Python logic

if matchedPortfolios == 0:
    print("ALL")
else:
    print(Portfolios)

pySpark attempt that raises an error


# Check matching column values in order to find common portfolio names
Portfolios = set(portfolio_DomainItemLookup) & set(portfolio_dataset_standardFalse)
Portfolios  # shows the set of matched names, or an empty set

matchedPortfolios = len(Portfolios)
matchedPortfolios  # shows 0 or the number of matched names


dataset_standardFalse.withColumn('PortfolioRule', f.when( matchedPortfolios == 0, "ALL").otherwise(Portfolios)).show()

TypeError: condition should be a Column. matchedPortfolios is derived from a Python list, not from a Column. How can I fill a column conditionally using a list?

My current dataframe

+----------------+----------------+-------------+
|SourceSystemName|       Portfolio|PortfolioRule|
+----------------+----------------+-------------+
|          ABCorp|   ABC Portfolio|         null|
|          ABCorp|   ABC Portfolio|         null|
|          ABCorp|   ABC Portfolio|         null|

Expected result

if matchedPortfolios == 0 logic 
+----------------+----------------+-------------+
|SourceSystemName|       Portfolio|PortfolioRule|
+----------------+----------------+-------------+
|          ABCorp|   ABC Portfolio|         ALL |   
|          ABCorp|   ABC Portfolio|         ALL |  
|          ABCorp|   ABC Portfolio|         ALL |


else logic
+----------------+----------------+--------------+
|SourceSystemName|       Portfolio|PortfolioRule |
+----------------+----------------+--------------+
|          ABCorp|   ABC Portfolio|ABC Portfolio |   
|          ABCorp|   ABC Portfolio|ABC Portfolio |  
|          ABCorp|   ABC Portfolio|ABC Portfolio |



Recommended answer

This fills PortfolioRule with 'ALL' wherever the Portfolio value has no match in Portfolios. Where there is a match, the value is copied straight from the Portfolio column.

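# Assumes pyspark.sql.functions is imported as f, as in the question
new = dataset_standardFalse.withColumn('PortfolioRule', f.when(dataset_standardFalse['Portfolio'].isin(Portfolios), dataset_standardFalse['Portfolio']).otherwise('ALL'))
display(new)  # display() is a Databricks notebook helper; new.show() works elsewhere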
