Count re-occurrence of a value in Python aggregated with respect to another value

Views: 128. Published: 2020/5/3 9:00:59. Tags: python, pandas, loops, count, logic


Question

This question is a follow-up to my previous post.

Now I have data like this:

Sno   User  Cookie
 1     1       A
 2     1       A
 3     1       A
 4     1       B
 5     1       C
 6     1       D
 7     1       A
 8     1       B
 9     1       D
 10    1       E
 11    1       D
 12    1       A
 13    2       F
 14    2       G
 15    2       F
 16    2       G
 17    2       H
 18    2       H

Say user 1 has five cookies: A, B, C, D and E. I want to count how often a cookie re-occurs after a different cookie has been seen in between. In the example above, cookie A is seen again at position 7 and again at position 12. Note that A at positions 2 and 3 is not counted, because it directly follows another A. At positions 7 and 12, however, other cookies were seen before A came back, so those instances are counted. Running the code from my previous post gives:

For user 1:

Sno Cookie  Count
 1    A     2
 2    B     1
 3    C     0
 4    D     2
 5    E     0

For user 2:

Sno Cookie  Count
 6    F     1
 7    G     1
 8    H     0
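
The per-cookie counts above can be reproduced without an explicit loop. The following is only a minimal sketch, not the code from the previous post (which is not shown here). The way the DataFrame is built and the names `changed`, `runs` and `counts` are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question (only the User and Cookie columns).
df = pd.DataFrame({
    "User": [1] * 12 + [2] * 6,
    "Cookie": list("AAABCDABDEDA") + list("FGFGHH"),
})

# True where a row starts a new run: the cookie or the user differs
# from the previous row.
changed = (df["Cookie"] != df["Cookie"].shift()) | (df["User"] != df["User"].shift())
runs = df[changed]

# A cookie that starts n runs for a user has re-occurred n - 1 times.
counts = (
    runs.groupby(["User", "Cookie"])
    .size()
    .sub(1)
    .rename("Count")
    .reset_index()
)
print(counts)
```

Counting runs rather than rows is what keeps the consecutive A values at positions 1 to 3 from being counted as re-occurrences.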

Now comes the tricky part. From the counts we know that three cookies (A, B and D) re-occurred for user 1, and two cookies (F and G) re-occurred for user 2. I want to aggregate these results like this:

Sno User Reoccurred_Instances
 1   1    3
 2   2    2

Is there an easier way to get this result without using a loop?

Recommended answer

Start with the same first steps as in my answer to your previous question, which drop consecutive repeats of a Cookie value and then flag the duplicates:

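# Keep only rows whose cookie differs from the previous row.
# .copy() avoids a SettingWithCopyWarning when the 'dups' column is added.
no_doubles = df[df.Cookie != df.Cookie.shift()].copy()
no_doubles['dups'] = no_doubles.Cookie.duplicated()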

Then filter to the rows that are duplicates (no_doubles[no_doubles['dups']]), group them by User, and use nunique to count the distinct cookies for each user:

no_doubles[no_doubles['dups']].groupby('User')['Cookie'].nunique().reset_index()

This returns:

   User  Cookie
0     1       3
1     2       2

You can rename the columns as desired, for example with .rename(columns={'Cookie': 'Reoccurred_Instances'}).
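
For convenience, the steps above can be put together as one self-contained script. This is a sketch under stated assumptions: the DataFrame is rebuilt by hand from the question's table, and the `name=` argument to `reset_index` is one possible way to do the renaming, not something the original answer prescribes.

```python
import pandas as pd

# Sample data from the question (only the User and Cookie columns).
df = pd.DataFrame({
    "User": [1] * 12 + [2] * 6,
    "Cookie": list("AAABCDABDEDA") + list("FGFGHH"),
})

# Drop rows that repeat the previous row's cookie.
no_doubles = df[df["Cookie"] != df["Cookie"].shift()].copy()

# Flag every cookie that has already appeared earlier in the filtered data.
no_doubles["dups"] = no_doubles["Cookie"].duplicated()

# Count the distinct re-occurring cookies per user and name the column.
result = (
    no_doubles[no_doubles["dups"]]
    .groupby("User")["Cookie"]
    .nunique()
    .reset_index(name="Reoccurred_Instances")
)
print(result)
```

Note that this version compares cookies across the whole frame, so it relies on the two users not sharing any cookie values, which holds for this sample.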

To handle other cases, extend this logic. For example, consider the following dataframe, in which user 3 has no repeats:

Sno   User  Cookie
 1     1       A
 2     1       A
 3     1       A
 4     1       B
 5     1       C
 6     1       D
 7     1       A
 8     1       B
 9     1       D
 10    1       E
 11    1       D
 12    1       A
 13    2       F
 14    2       G
 15    2       F
 16    2       G
 17    2       H
 18    2       H
 19    3       H
 20    3       I
 21    3       J

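You can do this, comparing User as well as Cookie so that a cookie shared by two users (H here) is not treated as a repeat:

# Keep a row when the cookie or the user differs from the previous row
no_doubles = df[(df.Cookie != df.Cookie.shift()) | (df.User != df.User.shift())].copy()
# A row is a repeat only if the same (Cookie, User) pair appeared earlier
no_doubles['dups'] = no_doubles.duplicated(['Cookie', 'User'])
# Count the distinct repeated cookies per user; users with none get 0
no_doubles.groupby('User').apply(lambda x: x[x.dups]['Cookie'].nunique()).to_frame('Reoccurred_Instances')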

This gives:

      Reoccurred_Instances
User                      
1                        3
2                        2
3                        0
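
The same table can also be produced without `groupby(...).apply`. The sketch below is a variation and not the answer's own code: it counts distinct repeated cookies with `nunique` and then uses `reindex` to add back users that have no repeats. The DataFrame construction and the name `is_new_run` are assumptions made for illustration.

```python
import pandas as pd

# Sample data with a third user whose first cookie (H) matches
# the last cookie of user 2.
df = pd.DataFrame({
    "User": [1] * 12 + [2] * 6 + [3] * 3,
    "Cookie": list("AAABCDABDEDA") + list("FGFGHH") + list("HIJ"),
})

# A row starts a new run when the cookie or the user changes.
is_new_run = (df["Cookie"] != df["Cookie"].shift()) | (df["User"] != df["User"].shift())
no_doubles = df[is_new_run].copy()

# A repeat is the same (Cookie, User) pair seen earlier.
no_doubles["dups"] = no_doubles.duplicated(["Cookie", "User"])

# Users without repeats vanish from the filtered groupby, so reindex
# over all users and fill the missing ones with 0.
result = (
    no_doubles[no_doubles["dups"]]
    .groupby("User")["Cookie"]
    .nunique()
    .reindex(df["User"].unique(), fill_value=0)
    .rename_axis("User")
    .to_frame("Reoccurred_Instances")
)
print(result)
```

Whether this reads better than the `apply` version is a matter of taste. It mainly avoids calling a Python lambda once per group.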
