How to keep only the consecutive values in a Pandas dataframe using Python


Problem description

I have a dataframe that looks like this:

I want to keep only the consecutive years in each group, as in the following figure, where the year 2005 in group A and the years 2009 and 2011 in group B have been deleted.

I created a year-difference column with df['year_diff']=df.groupby(['group'])['Year'].diff(), and then kept only the rows where the year difference equals 1.

However, this method also deletes the first row of each run of consecutive years, because the year difference of that row is never 1: it is NaN for the first row of a group and greater than 1 after a gap. For example, the year 2000 will be deleted from group 2000-2005. Is there a way to avoid this problem?
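The problem described above can be reproduced with a small sketch. The five-row frame and the names `year_diff` and `naive` are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample: one group with a run 2000-2002 and two isolated years.
df = pd.DataFrame({"group": ["A"] * 5,
                   "Year": [2000, 2001, 2002, 2005, 2007]})

# Difference to the previous year within each group; NaN on the first row.
year_diff = df.groupby("group")["Year"].diff()

# Keeping only diff == 1 drops 2000 as well, although it starts a run.
naive = df[year_diff == 1]
print(naive)
```

The isolated years 2005 and 2007 are removed as intended, but 2000 is lost too, which is the behaviour the question asks to avoid.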

Recommended answer

shift

Get the year diffs as the OP first did. Then keep a row if its diff equals 1 or if the diff of the next row equals 1 (shift(-1) pulls the next row's value up by one position).

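# True where a row is exactly one year after the previous row of its group
yd = df.Year.groupby(df.group).diff().eq(1)
# Also keep the first row of each run: the row after it has a diff of 1
df[yd | yd.shift(-1, fill_value=False)]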

   group  Year
0      A  2000
1      A  2001
2      A  2002
3      A  2003
5      A  2007
6      A  2008
7      A  2009
8      A  2010
9      A  2011
10     B  2005
11     B  2006
12     B  2007
15     B  2013
16     B  2014
17     B  2015
18     B  2016
19     B  2017
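As an alternative to the shift approach, the runs can be labelled explicitly and then filtered by length. This is only a sketch, not the answerer's method: the names `starts_run`, `run_id`, `run_len` and `result` are introduced here for illustration, and the data is rebuilt from the Setup section so the block runs on its own.

```python
import pandas as pd

# Same data as in the Setup section.
a = [("A", x) for x in range(2000, 2012) if x not in [2004, 2006]]
b = [("B", x) for x in range(2005, 2018) if x not in [2008, 2010, 2012]]
df = pd.DataFrame(a + b, columns=["group", "Year"])

# A row starts a new run when its within-group diff is not 1 (NaN included).
starts_run = df.groupby("group")["Year"].diff().ne(1)

# The cumulative sum of the start flags gives every run its own integer label.
run_id = starts_run.cumsum().rename("run_id")

# Length of the run each row belongs to, aligned with df's index.
run_len = df.groupby(run_id)["Year"].transform("size")

# Keep rows that belong to a run of at least two consecutive years.
result = df[run_len > 1]
print(result)
```

With this form the threshold can be changed (for example `run_len >= 3`) if only longer runs should be kept.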


Setup

Thx jez

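import pandas as pd

# Group A covers 2000-2011 and group B covers 2005-2017, each with gap years removed
a = [('A',x) for x in range(2000, 2012) if x not in [2004,2006]]
b = [('B',x) for x in range(2005, 2018) if x not in [2008,2010,2012]]
df = pd.DataFrame(a + b, columns=['group','Year'])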
