Keeping the last N duplicates in pandas

Published: 2020/5/24 2:32:19 Tags: python pandas dataframe drop-duplicates


Problem description

Given a dataframe:

>>> import pandas as pd
>>> lol = [['a', 1, 1], ['b', 1, 2], ['c', 1, 4], ['c', 2, 9], ['b', 2, 10], ['x', 2, 5], ['d', 2, 3], ['e', 3, 5], ['d', 2, 10], ['a', 3, 5]]
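>>> df = pd.DataFrame(lol)
>>> # Assign the result: rename() returns a new frame and leaves df unchanged otherwise
>>> df = df.rename(columns={0:'value', 1:'key', 2:'something'})
>>> df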
  value  key  something
0     a    1          1
1     b    1          2
2     c    1          4
3     c    2          9
4     b    2         10
5     x    2          5
6     d    2          3
7     e    3          5
8     d    2         10
9     a    3          5

The goal is to keep the last N rows for each unique value of the key column.

If N=1, I could simply use the .drop_duplicates() function like this:

>>> df.drop_duplicates(subset='key', keep='last')
  value  key  something
2     c    1          4
8     d    2         10
9     a    3          5

How do I keep the last 3 rows for each unique value of key?

I could try this for N=3:

>>> from itertools import chain
>>> unique_keys = {k:[] for k in df['key']}
>>> for idx, row in df.iterrows():
...     k = row['key']
...     unique_keys[k].append(list(row))
... 
>>>
>>> df = pd.DataFrame(list(chain(*[v[-3:] for k,v in unique_keys.items()])))
>>> df.rename(columns={0:'value', 1:'key', 2:'something'})
  value  key  something
0     a    1          1
1     b    1          2
2     c    1          4
3     x    2          5
4     d    2          3
5     d    2         10
6     e    3          5
7     a    3          5

But there must be a better way...
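
One possible approach, offered as a sketch and not as the page's accepted answer: pandas can group the rows by key and take the tail of each group, which avoids the manual loop. The helper name keep_last_n below is hypothetical and introduced only for illustration; groupby() and tail() are standard pandas methods. The rows come back in their original order with their original index labels, so the result is not regrouped by key as in the loop-based attempt above.

```python
import pandas as pd

lol = [['a', 1, 1], ['b', 1, 2], ['c', 1, 4], ['c', 2, 9], ['b', 2, 10],
       ['x', 2, 5], ['d', 2, 3], ['e', 3, 5], ['d', 2, 10], ['a', 3, 5]]
df = pd.DataFrame(lol, columns=['value', 'key', 'something'])


def keep_last_n(frame, key, n):
    """Hypothetical helper: keep the last n rows for each distinct value of `key`."""
    # groupby(...).tail(n) returns the last n rows of every group,
    # preserving the original row order and index labels.
    return frame.groupby(key).tail(n)


result = keep_last_n(df, 'key', 3)
print(result)
```

With n=1 this should select the same rows as df.drop_duplicates(subset='key', keep='last'). If a fresh 0..n-1 index is wanted, calling .reset_index(drop=True) on the result is one way to get it.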
