pandas 将条件与上一行进行比较 [英] Pandas compare value with previous row with filtration condition

查看：150 发布时间：2020/10/6 18:58:05 python pandas dataframe compare rows

本文介绍了 pandas 将条件与上一行进行比较的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个DataFrame，其中包含有关员工薪水的信息。大约有900000行以上。

I have a DataFrame with information about employee salary. It's about 900000+ rows.

示例：

+----+-------------+---------------+----------+
|    |   table_num | name          |   salary |
|----+-------------+---------------+----------|
|  0 |      001234 | John Johnson  |     1200 |
|  1 |      001234 | John Johnson  |     1000 |
|  2 |      001235 | John Johnson  |     1000 |
|  3 |      001235 | John Johnson  |     1200 |
|  4 |      001235 | John Johnson  |     1000 |
|  5 |      001235 | Steve Stevens |     1000 |
|  6 |      001236 | Steve Stevens |     1200 |
|  7 |      001236 | Steve Stevens |     1200 |
|  8 |      001236 | Steve Stevens |     1200 |
+----+-------------+---------------+----------+

dtypes：

table_num: string
name: string
salary: float

我需要添加一列有关薪水水平下降的信息。
我正在使用 shift（）函数比较行中的值。

I need to add a column with information about increased\decreased salary level. I'm using the shift() function to compare value in rows.

主要问题是整个数据集中所有唯一员工的过滤和迭代。

Main problem is in filtration and iteration over all unique employees over the whole dataset.

我的脚本大约需要3个半小时。

如何更快地完成操作？

我的脚本：

# giving us only unique combination of 'table_num' and 'name'
    # since there can be same 'table_num' for different 'name'
    # and same names with different 'table_num' appears sometimes

names_df = df[['table_num', 'name']].drop_duplicates()

# then extracting particular name and table_num from Series
for i in range(len(names_df)):    ### Bottleneck of whole script ###    
    t = names_df.iloc[i,[0,1]][0]
    n = names_df.iloc[i,[0,1]][1]

    # using shift() and lambda to check if there difference between two rows 
    diff_sal = (df[(df['table_num']==t)
               & ((df['name']==n))]['salary'] - df[(df['table_num']==t)
                                                 & ((df['name']==n))]['salary'].shift(1)).apply(lambda x: 1 if x>0 else (-1 if x<0 else 0))
    df.loc[diff_sal.index, 'inc'] = diff_sal.values

样本输入数据：

df = pd.DataFrame({'table_num': ['001234','001234','001235','001235','001235','001235','001236','001236','001236'], 
                     'name': ['John Johnson','John Johnson','John Johnson','John Johnson','John Johnson', 'Steve Stevens', 'Steve Stevens', 'Steve Stevens', 'Steve Stevens'], 
                     'salary':[1200.,1000.,1000.,1200.,1000.,1000.,1200.,1200.,1200.]})

样本输出：

+----+-------------+---------------+----------+-------+
|    |   table_num | name          |   salary |   inc |
|----+-------------+---------------+----------+-------|
|  0 |      001234 | John Johnson  |     1200 |     0 |
|  1 |      001234 | John Johnson  |     1000 |    -1 |
|  2 |      001235 | John Johnson  |     1000 |     0 |
|  3 |      001235 | John Johnson  |     1200 |     1 |
|  4 |      001235 | John Johnson  |     1000 |    -1 |
|  5 |      001235 | Steve Stevens |     1000 |     0 |
|  6 |      001236 | Steve Stevens |     1200 |     0 |
|  7 |      001236 | Steve Stevens |     1200 |     0 |
|  8 |      001236 | Steve Stevens |     1200 |     0 |
+----+-------------+---------------+----------+-------+

pandas 将条件与上一行进行比较 [英] Pandas compare value with previous row with filtration condition

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

pandas 将条件与上一行进行比较 [英] Pandas compare value with previous row with filtration condition

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭