pandas python flag transactions across rows
Question
I have data as shown below. I would like to flag transactions when the same employee has one of ('Car Rental', 'Car Rental - Gas') in the column expense type and 'Car Mileage' on the same day. In this case, employee a's and c's transactions would be highlighted. Employee b's transactions won't be highlighted because they don't meet the condition: he doesn't have a 'Car Mileage'.
I want the column zflag. Different numbers in that column indicate the groups of instances where the above condition was met.
import pandas as pd

d = {'emp': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
     'date': ['1', '1', '1', '1', '2', '2', '2', '3', '3', '3', '3'],
     'usd': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
     'expense type': ['Car Mileage', 'Car Rental', 'Car Rental - Gas', 'food', 'Car Rental', 'Car Rental - Gas', 'food', 'Car Mileage', 'Car Rental', 'food', 'wine'],
     # zflag is the desired output column
     'zflag': ['1', '1', '1', ' ', ' ', ' ', ' ', '2', '2', ' ', ' ']
     }
df = pd.DataFrame(data=d)
df
Out[253]:
    date emp      expense type  usd zflag
0      1   a       Car Mileage    1     1
1      1   a        Car Rental    2     1
2      1   a  Car Rental - Gas    3     1
3      1   a              food    4
4      2   b        Car Rental    5
5      2   b  Car Rental - Gas    6
6      2   b              food    7
7      3   c       Car Mileage    8     2
8      3   c        Car Rental    9     2
9      3   c              food   10
10     3   c              wine   11
I would appreciate pointers on which functions to use. I am thinking of using groupby, but I am not sure.
I understand that date + emp will be my primary key.
Recommended answer
Here is an approach. It's not the cleanest, but what you're describing is quite specific. Some of this could probably be simplified by moving it into a function.
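# Per (emp, date) group: 1 if it has 'Car Mileage' and at least one rental type, else 0
temp_df = df.groupby(["emp", "date"])["expense type"].apply(lambda x: 1 if "Car Mileage" in x.values and any(k in x.values for k in ["Car Rental", "Car Rental - Gas"]) else 0).rename("zzflag")
# Keep the qualifying groups; cumsum over the 1s numbers them 1, 2, ...
temp_df = temp_df[temp_df != 0].cumsum()
# Left-join on the shared 'emp' and 'date' columns; unmatched rows get 0
final_df = pd.merge(df, temp_df.reset_index(), how="left").fillna(0)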
Steps:
- Group by emp/date and check the criteria: 1 if met, 0 if not
- Remove rows with 0s and take the cumsum to produce unique values
- Join the result back onto the original frame
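The steps above can also be sketched as a small function. This is an alternative arrangement, not the answerer's code: it assumes the column names from the question, and the names `flag_car_transactions`, `RENTAL_TYPES` and `MILEAGE_TYPE` are hypothetical. Unlike the merge-based version, which numbers every row of a qualifying emp/date group, this sketch numbers only the car-related rows, which is how the zflag column in the question appears to be filled in.

```python
import pandas as pd

# Hypothetical constants; the values come from the question
RENTAL_TYPES = {"Car Rental", "Car Rental - Gas"}
MILEAGE_TYPE = "Car Mileage"

def flag_car_transactions(df):
    """Return a copy of df with an integer 'zzflag' column (0 = not flagged)."""
    out = df.copy()
    is_mileage = out["expense type"].eq(MILEAGE_TYPE)
    is_rental = out["expense type"].isin(RENTAL_TYPES)
    keys = [out["emp"], out["date"]]
    # True on every row of an (emp, date) group that has both kinds of expense
    group_ok = (is_mileage.groupby(keys).transform("any")
                & is_rental.groupby(keys).transform("any"))
    # Flag only the car-related rows inside qualifying groups
    flagged = group_ok & (is_mileage | is_rental)
    # Number the qualifying groups 1, 2, ... in order of first appearance
    group_id = out.groupby(["emp", "date"], sort=False).ngroup()
    codes, _ = pd.factorize(group_id[flagged])
    out["zzflag"] = 0
    out.loc[flagged, "zzflag"] = codes + 1
    return out

# Sample data from the question (without the hand-written zflag column)
df = pd.DataFrame({
    "emp": ["a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c"],
    "date": ["1", "1", "1", "1", "2", "2", "2", "3", "3", "3", "3"],
    "usd": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "expense type": ["Car Mileage", "Car Rental", "Car Rental - Gas", "food",
                     "Car Rental", "Car Rental - Gas", "food",
                     "Car Mileage", "Car Rental", "food", "wine"],
})
result = flag_car_transactions(df)
print(result)
```

Using `transform("any")` keeps the result aligned with the original rows, so no merge is needed; whether that is preferable to the groupby/merge approach is a matter of taste.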
To answer your question below: the join works because running .reset_index() takes "emp" and "date" out of the index and moves them into columns, so pd.merge can match on them as the columns the two frames share.
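A minimal illustration of that point, using a small hand-built Series rather than the real grouped result (the names `flags`, `flat`, `left` and `merged` are made up for this sketch):

```python
import pandas as pd

# A Series indexed by (emp, date), shaped like the filtered groupby result
idx = pd.MultiIndex.from_tuples([("a", "1"), ("c", "3")], names=["emp", "date"])
flags = pd.Series([1, 2], index=idx, name="zzflag")

# reset_index() turns the index levels into ordinary columns
flat = flags.reset_index()
print(list(flat.columns))

# With no `on` argument, pd.merge joins on the column names both frames share
left = pd.DataFrame({"emp": ["a", "b", "c"],
                     "date": ["1", "2", "3"],
                     "usd": [1, 5, 8]})
merged = pd.merge(left, flat, how="left")
print(merged)
```

Employee b has no matching row in `flat`, so the left join leaves its zzflag as NaN, which is what the `.fillna(0)` in the answer then replaces.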