How to store the results of an iteration over rows of a Pandas DataFrame in a new column?


Question

I am new to coding in Python. Currently, I am trying to analyse a dataframe containing multiple workflows. Each workflow has different process steps for initiating and ending it. In a simplified version, my data looks like the following:

   Workflow Initiate   End_1   End_2   End_3
0         1   Name_1      na  Name_1      na
1         2   Name_2      na      na      na
2         3   Name_3      na      na  Name_5
3         4   Name_4  Name_5      na      na
4         5       na      na      na  Name_5
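
To experiment with the examples, the table above can be rebuilt along these lines. This is a minimal sketch that assumes the missing values are stored as the literal string 'na' rather than as NaN, which is what the string comparisons in the loop suggest:

```python
import pandas as pd

# Sample data from the question; 'na' is assumed to be a plain string, not NaN.
df = pd.DataFrame({
    'Workflow': [1, 2, 3, 4, 5],
    'Initiate': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'na'],
    'End_1': ['na', 'na', 'na', 'Name_5', 'na'],
    'End_2': ['Name_1', 'na', 'na', 'na', 'na'],
    'End_3': ['na', 'na', 'Name_5', 'na', 'Name_5'],
})

print(df)
```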

For every workflow, I want to check whether the name that ended the workflow is different from the name that initiated it.

Iterating through the rows in the following way gives me the desired output:

for index, row in df.iterrows():
    if ((row['Initiate'] != 'na')
        and (row['Initiate'] == row['End_1']) |
            (row['Initiate'] == row['End_2']) |
            (row['Initiate'] == row['End_3'])
        ):
        print("Name end equals initiate")
    elif ((row['End_1'] == 'na') &
          (row['End_2'] == 'na') &
          (row['End_3'] == 'na')
         ):
        print("No name ended")
    else:
        print("Different name ended")

Name end equals initiate
No name ended
Different name ended
Different name ended
Different name ended

However, I want to add an extra column, say 'Analysis', to the dataframe that shows the above outcome next to every workflow.

For this I stuffed the code into a function:

def function_name(a, b, c, d):
    for index, row in df.iterrows():
        if ((a != 'na')
            and (a == b) |
                (a == c) |
                (a == d)
            ):
            return "Name end equals initiate"
        elif ((b == 'na') &
              (c == 'na') &
              (d == 'na')
             ):
            return "No name ended"
        else:
            return "Different name ended"

df['Analysis'] = function_name(row['Initiate'],
                               row['End_1'],
                               row['End_2'],
                               row['End_3'])

print(df)

   Workflow Initiate          ...            End_3              Analysis
0         1   Name_1          ...               na  Different name ended
1         2   Name_2          ...               na  Different name ended
2         3   Name_3          ...           Name_5  Different name ended
3         4   Name_4          ...               na  Different name ended
4         5       na          ...           Name_5  Different name ended

As you can see, the output is different from the first analysis. I would like to add an extra column to my dataframe that gives me the same output as the print statements.

Recommended answer

You should avoid row-wise loops here. Your algorithm is vectorisable:

import numpy as np

df = df.replace('na', np.nan)  # replace the string 'na' with NaN for efficient processing
ends = df.filter(like='End')  # keep only the columns whose name contains 'End'

# forward-fill along each row so the last column holds the last non-null End name
match = ends.ffill(axis=1).iloc[:, -1] == df['Initiate']
nulls = ends.isnull().all(axis=1)  # rows where every End column is null

# apply vectorised conditional logic
df['Result'] = np.select([match, nulls], ['Name end equals initiate', 'No name ended'],
                         'Different name ended')

print(df)

   Workflow Initiate   End_1   End_2   End_3                    Result
0         1   Name_1     NaN  Name_1     NaN  Name end equals initiate
1         2   Name_2     NaN     NaN     NaN             No name ended
2         3   Name_3     NaN     NaN  Name_5      Different name ended
3         4   Name_4  Name_5     NaN     NaN      Different name ended
4         5      NaN     NaN     NaN  Name_5      Different name ended
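
If a row-wise function is still preferred, the approach from the question can be made to work by handing each row to the function with `DataFrame.apply(axis=1)`. The original attempt called the function only once, using whatever `row` was left over from the earlier loop (the last row), so a single string was broadcast to every row. The sketch below rebuilds the sample data, assuming 'na' is a literal string, and uses a hypothetical helper named `classify`. Like the loop in the question, it checks whether any End column matches, whereas the vectorised code above compares only the last non-null End value. The two agree on this sample but could differ on other data.

```python
import pandas as pd

# Sample data from the question; 'na' is assumed to be a plain string, not NaN.
df = pd.DataFrame({
    'Workflow': [1, 2, 3, 4, 5],
    'Initiate': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'na'],
    'End_1': ['na', 'na', 'na', 'Name_5', 'na'],
    'End_2': ['Name_1', 'na', 'na', 'na', 'na'],
    'End_3': ['na', 'na', 'Name_5', 'na', 'Name_5'],
})

END_COLS = ['End_1', 'End_2', 'End_3']


def classify(row):
    """Label one workflow row (hypothetical helper mirroring the question's loop)."""
    ends = [row[col] for col in END_COLS]
    if row['Initiate'] != 'na' and row['Initiate'] in ends:
        return 'Name end equals initiate'
    if all(end == 'na' for end in ends):
        return 'No name ended'
    return 'Different name ended'


# axis=1 passes each row to classify as a Series; the returned strings
# are collected into the new column, one value per row.
df['Analysis'] = df.apply(classify, axis=1)
print(df[['Workflow', 'Analysis']])
```

This is slower than the vectorised version on large frames, since `apply` with `axis=1` calls the Python function once per row, but it keeps the original if/elif logic readable.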
