pandas python flag transactions across rows


Question

I have the data below. I would like to flag transactions where the same employee has 'Car Mileage' and at least one of 'Car Rental' or 'Car Rental - Gas' (in the expense type column) on the same day. In this case, the transactions of employees a and c would be flagged. Employee b's transactions would not be flagged because they don't meet the condition: b has no 'Car Mileage'.

I want to produce the column zflag. Each distinct number in that column identifies one group of rows where the condition above was met.

import pandas as pd

# Sample data; zflag is the desired output column
d = {'emp': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
 'date': ['1', '1', '1', '1', '2', '2', '2', '3', '3', '3', '3'],
 'usd': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
 'expense type': ['Car Mileage', 'Car Rental', 'Car Rental - Gas', 'food', 'Car Rental', 'Car Rental - Gas', 'food', 'Car Mileage', 'Car Rental', 'food', 'wine'],
 'zflag': ['1', '1', '1', ' ', ' ', ' ', ' ', '2', '2', ' ', ' ']
 }

df = pd.DataFrame(data=d)



    df
Out[253]: 
   date emp      expense type  usd zflag
0     1   a       Car Mileage    1     1
1     1   a        Car Rental    2     1
2     1   a  Car Rental - Gas    3     1
3     1   a              food    4      
4     2   b        Car Rental    5      
5     2   b  Car Rental - Gas    6      
6     2   b              food    7      
7     3   c       Car Mileage    8     2
8     3   c        Car Rental    9     2
9     3   c              food   10      
10    3   c              wine   11      

I would appreciate pointers on which functions to use. I am thinking of using groupby, but I'm not sure.

I understand that date + emp will be my primary key.

Recommended answer

Here is one approach. It's not the cleanest, but what you're describing is quite specific. Some of this could probably be simplified by moving it into a function.

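# Flag each (emp, date) group: 1 if it has "Car Mileage" and at least one rental expense, else 0.
temp_df = df.groupby(["emp", "date"])["expense type"].apply(lambda x: 1 if "Car Mileage" in x.values and any(k in x.values for k in ["Car Rental", "Car Rental - Gas"]) else 0).rename("zzflag")
# Keep only the flagged groups; cumsum then numbers them 1, 2, ...
temp_df = temp_df[temp_df != 0].cumsum()
# Left-join the group numbers back onto the rows; unflagged rows get 0.
final_df = pd.merge(df, temp_df.reset_index(), how="left").fillna(0)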

Steps:

  • Group by emp/date and check the criteria: 1 if met, 0 if not

  • Remove the rows with 0 and take the cumulative sum to give each remaining group a unique number

  • Join the result back to the original frame
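The steps above can be sketched as a small function. This is only an illustration, not the answer's exact code: the helper name flag_groups, the constant RENTAL_TYPES, and the only_car_rows parameter are hypothetical names introduced here. The optional only_car_rows switch assumes that the intent of the zflag column in the question is to number only the mileage and rental rows themselves, leaving rows such as 'food' unflagged.

```python
import pandas as pd

RENTAL_TYPES = ["Car Rental", "Car Rental - Gas"]  # rental categories named in the question


def flag_groups(df, only_car_rows=False):
    """Number each (emp, date) group that has 'Car Mileage' plus a rental expense.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Step 1: one boolean per (emp, date) group.
    qualifies = df.groupby(["emp", "date"])["expense type"].apply(
        lambda s: bool((s == "Car Mileage").any() and s.isin(RENTAL_TYPES).any())
    ).astype(bool)
    # Step 2: keep qualifying groups and number them 1, 2, ...
    group_ids = qualifies[qualifies].astype(int).cumsum().rename("zzflag")
    # Step 3: join back on emp/date; rows in non-qualifying groups get 0.
    out = df.merge(group_ids.reset_index(), on=["emp", "date"], how="left")
    out["zzflag"] = out["zzflag"].fillna(0).astype(int)
    if only_car_rows:
        # Assumption: only the mileage/rental rows should carry the group number.
        car_row = out["expense type"].isin(["Car Mileage"] + RENTAL_TYPES)
        out.loc[~car_row, "zzflag"] = 0
    return out


df = pd.DataFrame({
    "emp": ["a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c"],
    "date": ["1", "1", "1", "1", "2", "2", "2", "3", "3", "3", "3"],
    "usd": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "expense type": ["Car Mileage", "Car Rental", "Car Rental - Gas", "food",
                     "Car Rental", "Car Rental - Gas", "food",
                     "Car Mileage", "Car Rental", "food", "wine"],
})

print(flag_groups(df)["zzflag"].tolist())
# [1, 1, 1, 1, 0, 0, 0, 2, 2, 2, 2]
print(flag_groups(df, only_car_rows=True)["zzflag"].tolist())
# [1, 1, 1, 0, 0, 0, 0, 2, 2, 0, 0]
```

With the default setting, every row of a qualifying emp/date group receives the group number, which is what a plain join on emp and date produces. With only_car_rows=True the result lines up with the zflag column shown in the question.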

To answer your question below: the join works because .reset_index() takes "emp" and "date" out of the index and turns them into ordinary columns, so pd.merge can match on them.
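A minimal sketch of that behavior, using a small hand-built stand-in for temp_df (the values here are made up for illustration and assume the Series is indexed by emp and date, as it would be after the groupby):

```python
import pandas as pd

# Hypothetical stand-in for temp_df after the cumsum step:
# a Series indexed by (emp, date), one entry per flagged group.
idx = pd.MultiIndex.from_tuples([("a", "1"), ("c", "3")], names=["emp", "date"])
temp_df = pd.Series([1, 2], index=idx, name="zzflag")

# reset_index() turns the index levels into ordinary columns.
as_frame = temp_df.reset_index()
print(list(as_frame.columns))  # ['emp', 'date', 'zzflag']

df = pd.DataFrame({"emp": ["a", "b", "c"], "date": ["1", "2", "3"], "usd": [1, 5, 8]})

# With no `on=` argument, merge joins on the columns both frames share (emp, date).
merged = pd.merge(df, as_frame, how="left")
print(merged)  # employee b has no match, so its zzflag is NaN until fillna(0)
```

Passing on=["emp", "date"] explicitly would make the join keys visible in the code and guard against an accidental match on some other shared column.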
