How to store the results of an iteration over rows of a Pandas DataFrame in a new column?
Question
I am new to coding in Python. Currently, I am trying to analyse a dataframe containing multiple workflows. Each workflow has different process steps for initiating and ending it. In a simplified version, my data looks like the following:
   Workflow Initiate   End_1   End_2   End_3
0         1   Name_1      na  Name_1      na
1         2   Name_2      na      na      na
2         3   Name_3      na      na  Name_5
3         4   Name_4  Name_5      na      na
4         5       na      na      na  Name_5
For every workflow, I want to check whether the name that ended the workflow is different from the name that initiated it.
Iterating through the rows in the following way gives me the desired output:
for index, row in df.iterrows():
    if (row['Initiate'] != 'na') and (
        (row['Initiate'] == row['End_1'])
        or (row['Initiate'] == row['End_2'])
        or (row['Initiate'] == row['End_3'])
    ):
        print("Name end equals initiate")
    elif (
        (row['End_1'] == 'na')
        and (row['End_2'] == 'na')
        and (row['End_3'] == 'na')
    ):
        print("No name ended")
    else:
        print("Different name ended")
Name end equals initiate
No name ended
Different name ended
Different name ended
Different name ended
However, I want to add an extra column to the dataframe, say 'Analysis', that shows the above outcome for each workflow.
For this, I wrapped the code in a function:
def function_name(a, b, c, d):
    for index, row in df.iterrows():
        if (a != 'na') and ((a == b) or (a == c) or (a == d)):
            return "Name end equals initiate"
        elif (b == 'na') and (c == 'na') and (d == 'na'):
            return "No name ended"
        else:
            return "Different name ended"

df['Analysis'] = function_name(row['Initiate'],
                               row['End_1'],
                               row['End_2'],
                               row['End_3'])
print(df)
   Workflow Initiate  ...   End_3              Analysis
0         1   Name_1  ...      na  Different name ended
1         2   Name_2  ...      na  Different name ended
2         3   Name_3  ...  Name_5  Different name ended
3         4   Name_4  ...      na  Different name ended
4         5       na  ...  Name_5  Different name ended
As you can see, the output differs from the first analysis: every row gets the same label. I would like to add an extra column to my dataframe that contains the same output as the print statements.
Recommended answer
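A note on why the attempt in the question labels every row identically: function_name is called once, with the values of whatever row was left over from the earlier loop, its return exits on the first iteration, and the single string it returns is broadcast to the whole column. As a minimal sketch of a direct fix (still row-wise), a function can classify one row and be passed to DataFrame.apply with axis=1. The helper name classify_row and the END_COLS list are hypothetical, and the sample frame is rebuilt from the data shown in the question.

```python
import pandas as pd

# Sample data rebuilt from the question; 'na' is a literal string here.
df = pd.DataFrame({
    "Workflow": [1, 2, 3, 4, 5],
    "Initiate": ["Name_1", "Name_2", "Name_3", "Name_4", "na"],
    "End_1": ["na", "na", "na", "Name_5", "na"],
    "End_2": ["Name_1", "na", "na", "na", "na"],
    "End_3": ["na", "na", "Name_5", "na", "Name_5"],
})

END_COLS = ["End_1", "End_2", "End_3"]  # hypothetical constant for this sketch


def classify_row(row):
    """Return the label for a single row (hypothetical helper)."""
    ends = [row[col] for col in END_COLS]
    if row["Initiate"] != "na" and row["Initiate"] in ends:
        return "Name end equals initiate"
    if all(end == "na" for end in ends):
        return "No name ended"
    return "Different name ended"


# axis=1 hands each row to classify_row; the returned labels form a
# Series aligned with df.index, so every row gets its own value.
df["Analysis"] = df.apply(classify_row, axis=1)
print(df)
```

This reproduces the printed labels from the question, one per row, but it still calls Python code once per row, so it will be slow on large frames.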
You should avoid row-wise loops here. Your algorithm is vectorisable:
import numpy as np

df = df.replace('na', np.nan)  # replace the string 'na' with NaN so null checks work
ends = df.filter(like='End')   # select the columns whose name contains 'End'
match = ends.ffill(axis=1).iloc[:, -1] == df['Initiate']  # last non-null End name per row vs Initiate
nulls = ends.isnull().all(axis=1)  # rows where every End column is null
# apply vectorised conditional logic; the first matching condition wins
df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],
                         'Different name ended')
print(df)
   Workflow Initiate   End_1   End_2   End_3                    Result
0         1   Name_1     NaN  Name_1     NaN  Name end equals initiate
1         2   Name_2     NaN     NaN     NaN             No name ended
2         3   Name_3     NaN     NaN  Name_5      Different name ended
3         4   Name_4  Name_5     NaN     NaN      Different name ended
4         5      NaN     NaN     NaN  Name_5      Different name ended
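One caveat: the forward-fill approach compares Initiate only with the last non-null End name in each row, whereas the loop in the question reports a match if any End column equals Initiate. The two agree on the sample data but can differ when a row has several End names. The sketch below shows a variant that matches on any End column, using DataFrame.eq with axis=0. The sixth row (Workflow 6, Name_6, Name_7) is a hypothetical addition, included only to show where the two rules diverge.

```python
import numpy as np
import pandas as pd

# Data from the question plus one hypothetical row (Workflow 6) in which
# the initiator appears in End_1 but a different name appears in End_3.
df = pd.DataFrame({
    "Workflow": [1, 2, 3, 4, 5, 6],
    "Initiate": ["Name_1", "Name_2", "Name_3", "Name_4", "na", "Name_6"],
    "End_1": ["na", "na", "na", "Name_5", "na", "Name_6"],
    "End_2": ["Name_1", "na", "na", "na", "na", "na"],
    "End_3": ["na", "na", "Name_5", "na", "Name_5", "Name_7"],
}).replace("na", np.nan)

ends = df.filter(like="End")

# Rule used above: only the last non-null End name is compared.
match_last = ends.ffill(axis=1).iloc[:, -1] == df["Initiate"]

# Rule from the question's loop: any End column equal to Initiate counts.
# eq(..., axis=0) aligns Initiate with the rows; NaN never compares equal.
match_any = ends.eq(df["Initiate"], axis=0).any(axis=1)

nulls = ends.isna().all(axis=1)  # rows with no End name at all

df["Result"] = np.select(
    [match_any, nulls],
    ["Name end equals initiate", "No name ended"],
    default="Different name ended",
)
print(df)
print(pd.DataFrame({"match_last": match_last, "match_any": match_any}))
```

Which rule is right depends on what "the name that ended the workflow" means for the data: if only the final End step counts, the forward-fill version fits; if any End step may close the workflow, the any-column version mirrors the original loop.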