Calculating the number of years in a pandas dataframe


Question

I've written a messy function that calculates the number of years in a dataframe from its length (assuming the dataframe has a value for each day of the year).

It works fine, but it's a lot of code that could be made much smarter (I'm just not sure how).

Here is the function. It only goes up to 10 years, and I want it to work for a dataset of any size. I could extend it by copying and pasting more branches, but there must be a smarter way to write this code.

def numyears(x):
    if len(x.index) <= 366:
        return 1
    elif len(x.index) <= 732:
        return 2
    elif len(x.index) <= 1098:
        return 3
    elif len(x.index) <= 1464:
        return 4
    elif len(x.index) <= 1830:
        return 5
    elif len(x.index) <= 2196:
        return 6
    elif len(x.index) <= 2562:
        return 7
    elif len(x.index) <= 2928:
        return 8
    elif len(x.index) <= 3294:
        return 9
    elif len(x.index) <= 3660:
        return 10
    else: 
        return 'ERROR'

Recommended answer

It seems more reasonable to access the year attribute and take the len of its unique values:

In [2]:
import datetime as dt
import numpy as np
import pandas as pd

s = pd.date_range(dt.datetime(1900,1,1), end=dt.datetime(2000,1,1), freq='6M')
s

Out[2]:
DatetimeIndex(['1900-01-31', '1900-07-31', '1901-01-31', '1901-07-31',
               '1902-01-31', '1902-07-31', '1903-01-31', '1903-07-31',
               '1904-01-31', '1904-07-31',
               ...
               '1995-01-31', '1995-07-31', '1996-01-31', '1996-07-31',
               '1997-01-31', '1997-07-31', '1998-01-31', '1998-07-31',
               '1999-01-31', '1999-07-31'],
              dtype='datetime64[ns]', length=200, freq='6M')

In [8]:
len(np.unique(s.year))

Out[8]:
100

This way it handles infrequent periods, missing days, entries that overlap year boundaries, etc.
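As a rough illustration of that robustness, here is a minimal sketch that wraps the idea in a helper and feeds it daily data with a whole month missing from each year. The name count_years and the sample dates are made up for this example; they are not part of the original answer.

```python
import pandas as pd


def count_years(dates):
    """Return the number of distinct calendar years present in `dates`.

    `count_years` is a hypothetical helper name; `dates` can be anything
    pd.DatetimeIndex accepts (a DatetimeIndex, a datetime Series, a list).
    """
    return pd.DatetimeIndex(dates).year.nunique()


# Three years of daily data with every February removed (a gap in each year).
days = pd.date_range("2001-01-01", "2003-12-31")
gappy = days[days.month != 2]

n_years = count_years(gappy)  # the gaps do not change the year count
```

Because only the distinct year labels are counted, the length of the data never enters the calculation, so the fixed ladder of 366-day thresholds is no longer needed.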

You can also convert the index to a Series and call nunique:

In [11]:
s.to_series().dt.year.nunique()

Out[11]:
100

Since you already have the datetimes as a column, this is all you need:

df['date_col'].dt.year.nunique()

If necessary, you can convert the column to datetime first:

df['date_col'] = pd.to_datetime(df['date_col'])
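Put together, the two steps might look like the sketch below. The column name date_col comes from the snippets above, while the sample strings are invented purely for illustration.

```python
import pandas as pd

# Hypothetical sample data: dates stored as plain strings.
df = pd.DataFrame(
    {"date_col": ["2019-12-30", "2019-12-31", "2020-01-01", "2021-06-15"]}
)

# Convert the strings to datetime64 so the .dt accessor becomes available.
df["date_col"] = pd.to_datetime(df["date_col"])

# Count the distinct calendar years that appear in the column.
n_years = df["date_col"].dt.year.nunique()
```

Here the four rows touch 2019, 2020 and 2021, so the count is 3 even though the data covers far fewer than three years' worth of days.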

Update

So it seems your requirement is to count complete years. If you set the index to the year and day components, you can count rows at the year level and then keep only the years with at least 365 rows, which gives the number of complete years:

In [34]:
df = pd.DataFrame({'date':pd.date_range(dt.datetime(1900,6,1), end=dt.datetime(1910,6,1))})
count = df.set_index([df['date'].dt.year, df['date'].dt.day]).count(level=0)
count

Out[34]:
      date
date      
1900   214
1901   365
1902   365
1903   365
1904   366
1905   365
1906   365
1907   365
1908   366
1909   365
1910   152

In [39]:
len(count[count >= 365].dropna())

Out[39]:
9
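As far as I know, recent pandas releases (2.0 and later) no longer accept the level argument to count, so the session above may not run unchanged there. A sketch of the same idea using groupby is shown below. The helper name count_complete_years and its min_days parameter are assumptions for this example, and it assumes at most one row per calendar day.

```python
import pandas as pd


def count_complete_years(dates, min_days=365):
    """Count calendar years that have at least `min_days` rows.

    Hypothetical helper; assumes `dates` holds at most one entry per day,
    otherwise duplicate rows would inflate the per-year totals.
    """
    dates = pd.Series(pd.DatetimeIndex(dates))
    # Number of rows falling in each calendar year.
    days_per_year = dates.groupby(dates.dt.year).size()
    # Keep only the years that reach the threshold and count them.
    return int((days_per_year >= min_days).sum())


# Same date range as in the session above: 1 June 1900 to 1 June 1910.
df = pd.DataFrame({"date": pd.date_range("1900-06-01", "1910-06-01")})
n_complete = count_complete_years(df["date"])
```

With this range the partial years 1900 and 1910 are dropped, leaving the nine complete years 1901 to 1909, which matches the result above.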
