Count re-occurrence of a value in Python
Question
I have a data set which contains something like this:
SNo Cookie
1 A
2 A
3 A
4 B
5 C
6 D
7 A
8 B
9 D
10 E
11 D
12 A
So let's say we have 5 cookies: A, B, C, D, E. Now I want to count how many times each cookie re-occurs after a different cookie has been encountered. For example, in the data above, cookie A is encountered again at the 7th position and then again at the 12th. NOTE: we wouldn't count A at the 2nd position, since it directly follows another A, but by the 7th and 12th positions we had seen other cookies before seeing A again, so we count those instances. So essentially I want something like this:
Sno Cookie Count
1 A 2
2 B 1
3 C 0
4 D 2
5 E 0
Can anyone give me the logic or Python code for this?
Recommended answer
One way to do this would be to first get rid of consecutive Cookies, then find where the Cookie has been seen before using duplicated, and finally groupby Cookie and get the sum:
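# Keep only the first row of each run of consecutive identical cookies
no_doubles = df[df.Cookie != df.Cookie.shift()].copy()
# Flag every occurrence after the first as a re-occurrence
no_doubles['dups'] = no_doubles.Cookie.duplicated()
# Count the re-occurrences per cookie
no_doubles.groupby('Cookie').dups.sum()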
This gives you:
Cookie
A 2.0
B 1.0
C 0.0
D 2.0
E 0.0
Name: dups, dtype: float64
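Since the question also asks for the logic, here is a minimal sketch of the same idea without pandas, using only the standard library. The function name count_reoccurrences and the plain list of cookies are assumptions made for illustration; they are not part of the original answer. It collapses each run of consecutive identical cookies into one entry, then counts every run after a cookie's first as a re-occurrence.

```python
from collections import Counter
from itertools import groupby


def count_reoccurrences(cookies):
    """Count how often each cookie comes back after a different cookie was seen.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Collapse each run of consecutive identical cookies into a single entry.
    collapsed = [cookie for cookie, _run in groupby(cookies)]
    # Count the runs per cookie; the first run is not a re-occurrence.
    runs = Counter(collapsed)
    return {cookie: n - 1 for cookie, n in runs.items()}


# The Cookie column from the question, in SNo order.
cookies = ["A", "A", "A", "B", "C", "D", "A", "B", "D", "E", "D", "A"]
counts = count_reoccurrences(cookies)
print(counts)  # {'A': 2, 'B': 1, 'C': 0, 'D': 2, 'E': 0}
```

The counts come out as plain integers here. Which version fits better likely depends on whether the data already lives in a DataFrame.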