Calculating the number of years in a pandas dataframe
Question
I've written a messy function which calculates the number of years in a dataframe based on its length (assuming the dataframe has a value for each day of the year).
It works fine, but it's a lot of code that could be made much smarter (but I'm not sure how...)
Here is the function. It only goes up to 10 years, and I want it to work for a dataset of any size. I could extend it by copying and pasting more branches with ever larger totals, but there must be a smarter way to write this code.
def numyears(x):
    if len(x.index) <= 366:
        return 1
    elif len(x.index) <= 732:
        return 2
    elif len(x.index) <= 1098:
        return 3
    elif len(x.index) <= 1464:
        return 4
    elif len(x.index) <= 1830:
        return 5
    elif len(x.index) <= 2196:
        return 6
    elif len(x.index) <= 2562:
        return 7
    elif len(x.index) <= 2928:
        return 8
    elif len(x.index) <= 3294:
        return 9
    elif len(x.index) <= 3660:
        return 10
    else:
        return 'ERROR'
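Since every branch adds another 366 rows, the whole chain reduces to a ceiling division. A minimal sketch, keeping the question's assumption of at most 366 rows per year (one value per day) and its behaviour of returning 1 for an empty frame, but dropping the 10-year cap and the 'ERROR' return:

```python
import math

import pandas as pd


def numyears(x):
    """Length-based year count: ceil(rows / 366), never less than 1.

    Reproduces the question's thresholds (366, 732, ..., 3660) for any length.
    """
    n_rows = len(x.index)
    return max(1, math.ceil(n_rows / 366))


# Ten years of daily rows, 2000-01-01 to 2009-12-31 (3,653 days incl. 3 leap days).
df = pd.DataFrame(index=pd.date_range("2000-01-01", "2009-12-31", freq="D"))
print(numyears(df))  # 10
```

Because this still infers years from the row count, it inherits the question's assumption of exactly one row per day.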
Recommended answer
It seems more reasonable to just access the year attribute and then take the len of the unique values (the session below assumes datetime has been imported as dt, numpy as np and pandas as pd):
In [2]:
s = pd.date_range(dt.datetime(1900,1,1), end=dt.datetime(2000,1,1), freq='6M')
s
Out[2]:
DatetimeIndex(['1900-01-31', '1900-07-31', '1901-01-31', '1901-07-31',
'1902-01-31', '1902-07-31', '1903-01-31', '1903-07-31',
'1904-01-31', '1904-07-31',
...
'1995-01-31', '1995-07-31', '1996-01-31', '1996-07-31',
'1997-01-31', '1997-07-31', '1998-01-31', '1998-07-31',
'1999-01-31', '1999-07-31'],
dtype='datetime64[ns]', length=200, freq='6M')
In [8]:
len(np.unique(s.year))
Out[8]:
100
This way it handles infrequent sampling, missing days, entries that straddle year boundaries, etc.
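To make the difference from the question's length-based rule concrete, here is a small sketch on a hypothetical irregular index (the dates are invented for illustration): four rows that span three calendar years, including a pair on either side of a New Year boundary.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical, irregularly spaced dates: large gaps and a Dec 31 / Jan 1 pair.
idx = pd.DatetimeIndex(["2001-03-15", "2001-12-31", "2002-01-01", "2005-07-04"])

# The question's rule: one "year" per 366 rows (at least 1).
by_length = max(1, math.ceil(len(idx) / 366))

# The answer's rule: count the distinct calendar years present.
by_unique_years = len(np.unique(idx.year))

print(by_length, by_unique_years)  # 1 3
```

Note that years with no data at all (2003 and 2004 here) are not counted: this approach counts the years that appear in the data, not the span between the first and last date.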
You can also convert the index to a Series and call nunique:
In [11]:
s.to_series().dt.year.nunique()
Out[11]:
100
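If, as len(x.index) in the question suggests, the dates are the DataFrame's index rather than a column, the same idea applies to df.index. A minimal sketch with made-up daily data; in current pandas releases DatetimeIndex.year already returns an Index, so nunique can also be called on it directly without going through to_series().

```python
import pandas as pd

# Made-up daily data indexed by date: 2019-01-01 through 2020-12-31 (731 days).
df = pd.DataFrame(
    {"value": range(731)},
    index=pd.date_range("2019-01-01", periods=731, freq="D"),
)

via_series = df.index.to_series().dt.year.nunique()  # the answer's route
direct = df.index.year.nunique()                     # Index.nunique on the years

print(via_series, direct)  # 2 2
```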
Since you already have the datetimes in a column, this will work:
df['date_col'].dt.year.nunique()
If necessary, you can convert the column to datetime first using:
df['date_col'] = pd.to_datetime(df['date_col'])
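Putting the two steps together for a column that holds date strings (the column name date_col comes from the answer; the values are invented for illustration):

```python
import pandas as pd

# Hypothetical data: dates stored as strings, as they often are after read_csv.
df = pd.DataFrame({"date_col": ["1999-12-31", "2000-01-01", "2000-06-15", "2003-02-28"]})

# Parse the strings into datetime64 values so the .dt accessor is available.
df["date_col"] = pd.to_datetime(df["date_col"])

n_years = df["date_col"].dt.year.nunique()
print(n_years)  # 3
```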
Update
So it seems your requirement is to count complete years. If you set the index to the year and day components, you can count at the year level and then keep only the years with at least 365 entries, which gives the number of complete years:
In [34]:
df = pd.DataFrame({'date':pd.date_range(dt.datetime(1900,6,1), end=dt.datetime(1910,6,1))})
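# count(level=0) no longer exists in pandas 2.x; groupby(level=0).count() is the equivalent
count = df.set_index([df['date'].dt.year, df['date'].dt.day]).groupby(level=0).count()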
count
Out[34]:
date
date
1900 214
1901 365
1902 365
1903 365
1904 366
1905 365
1906 365
1907 365
1908 366
1909 365
1910 152
In [39]:
len(count[count >= 365].dropna())
Out[39]:
9
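A possible variation, sketched under its own assumptions rather than taken from the answer: instead of a fixed >= 365 threshold (which would accept a leap year that is missing one day), compare each year's number of distinct days with that year's actual length, and count distinct days so that duplicate timestamps do not inflate the total. The helper name count_complete_years is hypothetical.

```python
import calendar
import datetime as dt

import pandas as pd


def count_complete_years(dates: pd.Series) -> int:
    """Count calendar years in which every day appears at least once.

    Assumes `dates` is a datetime64 Series; several timestamps on the
    same day are counted as one day.
    """
    days = dates.dt.normalize()                       # drop any time-of-day part
    per_year = days.groupby(days.dt.year).nunique()   # distinct days per year
    # Days required for a full year: 366 in leap years, 365 otherwise.
    needed = pd.Series(
        [366 if calendar.isleap(y) else 365 for y in per_year.index],
        index=per_year.index,
    )
    return int((per_year >= needed).sum())


# Same data as the answer: 1900-06-01 through 1910-06-01, one row per day.
df = pd.DataFrame({'date': pd.date_range(dt.datetime(1900, 6, 1), end=dt.datetime(1910, 6, 1))})
print(count_complete_years(df['date']))  # 9
```

On the answer's data this agrees with the >= 365 filter; the two differ only when a leap year has exactly 365 distinct days.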