Pandas - Conditional drop duplicates

Views: 66 | Published: 2020/8/1 19:47:10 | Tags: python-3.x, pandas, duplicates

Question

I have a DataFrame in Pandas 0.19.2 on Python 3.6.x, shown below. I want to call drop_duplicates() on rows that share the same Id, based on some conditional logic.

import pandas as pd
import numpy as np
np.random.seed(1)
df = pd.DataFrame({'Id':[1,2,3,4,3,2,6,7,1,8],
              'Name':['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K'],
              'Size':np.random.rand(10),
              'Age':[19, 25, 22, 31, 43, 23, 44, 20, 51, 31]})

What would be the most efficient (if possible, vectorised) way to achieve this, given the logic described below?

1) Before dropping duplicates, sum the Size of all rows that share the same Id.

2) Drop the duplicate rows for each Id, keeping the one with the largest Age.

The desired output would be:

   Age  Id Name      Size
1   25   2    B  0.812662
3   31   4    D  0.302333
4   43   3    E  0.146870
6   44   6    G  0.186260
7   20   7    H  0.345561
8   51   1    I  0.813790
9   31   8    K  0.538817

Recommended answer

Use GroupBy.transform to compute the per-Id sum as a column aligned with the original DataFrame, then sort_values and drop_duplicates to remove the duplicates:

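# Replace each row's Size with the total Size of its Id group
df['Size'] = df.groupby('Id')['Size'].transform('sum')
# Sort by Age so the largest Age per Id comes last, keep that row, then restore the original row order
df = df.sort_values('Age').drop_duplicates('Id', keep='last').sort_index()
print(df)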
   Id Name      Size  Age
1   2    B  0.812663   25
3   4    D  0.302333   31
4   3    E  0.146870   43
6   6    G  0.186260   44
7   7    H  0.345561   20
8   1    I  0.813789   51
9   8    K  0.538817   31
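
For comparison, here is a minimal sketch of a variant that skips the sort and instead picks the row with the largest Age per Id via GroupBy.idxmax. The helper name sum_size_keep_oldest is hypothetical and not part of the original answer, and the sketch assumes the index labels are unique (true for the default RangeIndex used here). It works on a copy, so the input frame is left unchanged.

```python
import numpy as np
import pandas as pd


def sum_size_keep_oldest(df):
    """Hypothetical helper: sum Size per Id, keep the row with the largest Age per Id."""
    out = df.copy()  # work on a copy so the caller's frame is not modified
    # Each row gets the total Size of its Id group
    out['Size'] = out.groupby('Id')['Size'].transform('sum')
    # Index label of the row with the maximum Age within each Id group
    keep = out.groupby('Id')['Age'].idxmax()
    # Select those rows and restore the original row order
    return out.loc[keep].sort_index()


# Same sample data as in the question
np.random.seed(1)
df = pd.DataFrame({'Id': [1, 2, 3, 4, 3, 2, 6, 7, 1, 8],
                   'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K'],
                   'Size': np.random.rand(10),
                   'Age': [19, 25, 22, 31, 43, 23, 44, 20, 51, 31]})

result = sum_size_keep_oldest(df)
print(result)
```

On this sample data the variant keeps the same rows as the sort-based answer. The two can differ when several rows of one Id share the maximum Age: idxmax keeps the first such row, while the sort-based version keeps whichever tied row ends up last after sorting.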
