Counting duplicate values in Pandas DataFrame

Views: 372 | Published: 2020/5/24 0:12:17 | Tags: python pandas count duplicates


Problem description

There must be an easy way to do this, but I was unable to find an elegant solution on SO or work it out by myself.

I'm trying to count the number of duplicate values based on a set of columns in a DataFrame.

Example:

print(df)

    Month   LSOA code   Longitude   Latitude    Crime type
0   2015-01 E01000916   -0.106453   51.518207   Bicycle theft
1   2015-01 E01000914   -0.111497   51.518226   Burglary
2   2015-01 E01000914   -0.111497   51.518226   Burglary
3   2015-01 E01000914   -0.111497   51.518226   Other theft
4   2015-01 E01000914   -0.113767   51.517372   Theft from the person

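My workaround:

counts = dict()
for i, row in df.iterrows():
    # Build the key from the columns that define a duplicate
    key = (
        row['Longitude'],
        row['Latitude'],
        row['Crime type']
    )

    # dict.has_key() was removed in Python 3; get() with a default handles both cases
    counts[key] = counts.get(key, 0) + 1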

And I get the counts:

{(-0.11376700000000001, 51.517371999999995, 'Theft from the person'): 1,
 (-0.111497, 51.518226, 'Burglary'): 2,
 (-0.111497, 51.518226, 'Other theft'): 1,
 (-0.10645299999999999, 51.518207000000004, 'Bicycle theft'): 1}

Aside from the fact that this code could be improved as well (feel free to comment on how), what would be the way to do it through Pandas?

For those interested, I'm working on a dataset from https://data.police.uk/

Recommended answer

You can use groupby with the size function, then call reset_index with name='count' so that the resulting size column (which would otherwise be labeled 0) is named count.

print(df)
     Month  LSOA code  Longitude   Latitude             Crime type
0  2015-01  E01000916  -0.106453  51.518207          Bicycle theft
1  2015-01  E01000914  -0.111497  51.518226               Burglary
2  2015-01  E01000914  -0.111497  51.518226               Burglary
3  2015-01  E01000914  -0.111497  51.518226            Other theft
4  2015-01  E01000914  -0.113767  51.517372  Theft from the person

df = df.groupby(['Longitude', 'Latitude', 'Crime type']).size().reset_index(name='count')
print(df)
   Longitude   Latitude             Crime type  count
0  -0.113767  51.517372  Theft from the person      1
1  -0.111497  51.518226               Burglary      2
2  -0.111497  51.518226            Other theft      1
3  -0.106453  51.518207          Bicycle theft      1

print(df['count'])
0    1
1    2
2    1
3    1
Name: count, dtype: int64
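
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the five sample rows from the question by hand, repeats the groupby/size approach, and then shows a few alternatives that are not part of the original answer: DataFrame.value_counts with subset (this assumes pandas 1.1 or newer), DataFrame.duplicated to pull out the repeated rows, and collections.Counter as one possible tidier form of the question's loop. The variable names (key_cols, grouped, vc, dupe_rows, counts) are just illustrative.

```python
from collections import Counter

import pandas as pd

# Sample rows copied from the question's printed DataFrame.
df = pd.DataFrame({
    "Month": ["2015-01"] * 5,
    "LSOA code": ["E01000916", "E01000914", "E01000914",
                  "E01000914", "E01000914"],
    "Longitude": [-0.106453, -0.111497, -0.111497, -0.111497, -0.113767],
    "Latitude": [51.518207, 51.518226, 51.518226, 51.518226, 51.517372],
    "Crime type": ["Bicycle theft", "Burglary", "Burglary",
                   "Other theft", "Theft from the person"],
})

# Columns that together define what counts as a duplicate.
key_cols = ["Longitude", "Latitude", "Crime type"]

# Same approach as the answer: one row per distinct key, plus its frequency.
grouped = df.groupby(key_cols).size().reset_index(name="count")

# Alternative (assumes pandas >= 1.1): value_counts over a subset of columns.
# The result is a Series indexed by the key columns, most frequent first.
vc = df.value_counts(subset=key_cols)

# Rows whose key occurs more than once (keep=False marks every occurrence).
dupe_rows = df[df.duplicated(subset=key_cols, keep=False)]

# Pure-Python variant of the question's loop: Counter replaces the manual
# dictionary bookkeeping, and zip avoids the slower iterrows().
counts = Counter(zip(df["Longitude"], df["Latitude"], df["Crime type"]))

print(grouped)
print(vc)
print(dupe_rows)
print(counts.most_common(1))
```

Which form is most convenient depends on what comes next: the groupby result is a regular DataFrame that is easy to merge or filter, value_counts is shorter when only the frequencies matter, and duplicated is handy when the goal is to inspect or drop the repeated rows rather than count them.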
