Populating a column in a dataframe with pySpark
Problem description
I'm new to pySpark and I'm trying to fill a column based on a condition, using a list. How can I fill a column conditionally from a list?
Python logic
if matchedPortfolios == 0:
    print("ALL")
else:
    print(Portfolios)
pySpark attempt, with error
#Check matching column values in order to find common portfolio names
Portfolios = set(portfolio_DomainItemLookup) & set(portfolio_dataset_standardFalse)
Portfolios #prints list of matched names OR prints empty list
matchedPortfolios = len(Portfolios)
matchedPortfolios #prints 0 or length of list
dataset_standardFalse.withColumn('PortfolioRule', f.when( matchedPortfolios == 0, "ALL").otherwise(Portfolios)).show()
TypeError: condition should be a Column. The variable matchedPortfolios is a plain Python value (the length of the set), not a Column. How can I fill a column based on a condition using a list?
My current dataframe
+----------------+----------------+-------------+
|SourceSystemName|       Portfolio|PortfolioRule|
+----------------+----------------+-------------+
|          ABCorp|   ABC Portfolio|         null|
|          ABCorp|   ABC Portfolio|         null|
|          ABCorp|   ABC Portfolio|         null|
Expected result
if matchedPortfolios == 0 logic
+----------------+----------------+-------------+
|SourceSystemName|       Portfolio|PortfolioRule|
+----------------+----------------+-------------+
|          ABCorp|   ABC Portfolio|          ALL|
|          ABCorp|   ABC Portfolio|          ALL|
|          ABCorp|   ABC Portfolio|          ALL|
else logic
+----------------+----------------+--------------+
|SourceSystemName|       Portfolio| PortfolioRule|
+----------------+----------------+--------------+
|          ABCorp|   ABC Portfolio| ABC Portfolio|
|          ABCorp|   ABC Portfolio| ABC Portfolio|
|          ABCorp|   ABC Portfolio| ABC Portfolio|
Recommended answer
This fills PortfolioRule with 'ALL' on rows where Portfolio has no match in Portfolios. Where there is a match, the value is copied straight from the Portfolio column.
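import pyspark.sql.functions as f  # the alias "f" used throughout this post

# Copy Portfolio where it appears in Portfolios; otherwise fall back to 'ALL'
new = dataset_standardFalse.withColumn('PortfolioRule', f.when(dataset_standardFalse['Portfolio'].isin(Portfolios), dataset_standardFalse['Portfolio']).otherwise('ALL'))
display(new)  # display() is available in notebooks such as Databricks; new.show() works anywhere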
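The original attempt fails because Python evaluates matchedPortfolios == 0 once, to a plain bool, before Spark ever sees it, whereas f.when expects a Column expression that is evaluated for every row. The isin call above builds such an expression. To make the resulting per-row rule concrete, here is a minimal sketch in plain Python (not Spark code) that emulates it. The helper portfolio_rule, the sample rows, and the names "XYZ Portfolio" and "DEF Portfolio" are made up for illustration and do not come from the question.

# Plain-Python emulation of the row-wise rule; no Spark session needed.
# portfolio_rule and the sample data below are hypothetical.

def portfolio_rule(portfolio, matched):
    """Mirror f.when(col.isin(matched), col).otherwise('ALL') for one row."""
    return portfolio if portfolio in matched else "ALL"

rows = [
    {"SourceSystemName": "ABCorp", "Portfolio": "ABC Portfolio"},
    {"SourceSystemName": "ABCorp", "Portfolio": "XYZ Portfolio"},  # made-up row
]

# Same set intersection as in the question, on made-up inputs.
portfolio_DomainItemLookup = ["ABC Portfolio", "DEF Portfolio"]
portfolio_dataset_standardFalse = [r["Portfolio"] for r in rows]
Portfolios = set(portfolio_DomainItemLookup) & set(portfolio_dataset_standardFalse)

# Apply the rule to every row, as withColumn would.
result = [dict(r, PortfolioRule=portfolio_rule(r["Portfolio"], Portfolios)) for r in rows]
for r in result:
    print(r["Portfolio"], "->", r["PortfolioRule"])

# With an empty set of matches, every row falls through to "ALL".
all_rule = [portfolio_rule(r["Portfolio"], set()) for r in rows]

Note that when Portfolios is empty no row can match, so every row gets 'ALL'. That covers the matchedPortfolios == 0 branch of the question without a separate check.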