Max / min of date columns in pandas when the columns include NaN values


Problem description

I'm trying to create a new column in a pandas DataFrame holding the maximum (or minimum) date from two other date columns. But when there is a NaN anywhere in either of those columns, the whole min/max column becomes NaN. What gives? With numeric columns this works fine, but with dates the new column is all NaNs. Here's some sample code to illustrate the problem:

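import numpy as np
import pandas as pd
from datetime import date

df = pd.DataFrame(data=[[np.nan, date(2000,11,1)],
                        [date(2000,12,1), date(2000,9,1)],
                        [date(2000,4,1), np.nan],
                        [date(2000,12,2), np.nan]], columns=['col1','col2'])

# row-wise max of the two date columns
df['col3'] = df[['col1','col2']].max(axis=1)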

I know it can be done with loc and a combination of <, >, isnull and so on. But how can I make it work with the regular max/min functions?
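Before changing anything, it can help to look at what pandas actually stored. The short check below is a sketch (not part of the original question) that rebuilds the same frame and inspects the column dtypes and the Python type of each cell in col1; it assumes only pandas, NumPy and the standard datetime module.

```python
from datetime import date

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[np.nan, date(2000, 11, 1)],
                        [date(2000, 12, 1), date(2000, 9, 1)],
                        [date(2000, 4, 1), np.nan],
                        [date(2000, 12, 2), np.nan]], columns=['col1', 'col2'])

# Both columns are generic Python-object columns, not datetime64 columns.
print(df.dtypes)

# Each cell keeps its own Python type: float for the NaN, date for the rest.
cell_types = df['col1'].map(lambda v: type(v).__name__).tolist()
print(cell_types)
```

The mix of float and date objects inside one object column is what the reductions trip over.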

Solution

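You're storing Python date objects in your columns. If you convert them to datetime with pd.to_datetime, it works as expected (col3 still shows NaN in Out[10] because it holds the result of the question's earlier attempt):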

In[10]:
df['col1'] = pd.to_datetime(df['col1'])
df['col2'] = pd.to_datetime(df['col2'])
df

Out[10]: 
        col1       col2  col3
0        NaT 2000-11-01   NaN
1 2000-12-01 2000-09-01   NaN
2 2000-04-01        NaT   NaN
3 2000-12-02        NaT   NaN

In[11]:
df['col3'] = df[['col1','col2']].max(axis=1)
df

Out[11]: 
        col1       col2       col3
0        NaT 2000-11-01 2000-11-01
1 2000-12-01 2000-09-01 2000-12-01
2 2000-04-01        NaT 2000-04-01
3 2000-12-02        NaT 2000-12-02
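For reference, here is the same approach as one self-contained script, extended to the row-wise minimum as well since the question asks about both. It is a sketch: instead of two separate assignments it converts both columns in one step with DataFrame.apply(pd.to_datetime), and the column names col_max and col_min are illustrative, not from the original.

```python
from datetime import date

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[np.nan, date(2000, 11, 1)],
                        [date(2000, 12, 1), date(2000, 9, 1)],
                        [date(2000, 4, 1), np.nan],
                        [date(2000, 12, 2), np.nan]], columns=['col1', 'col2'])

# Convert both columns to datetime64 in one step; the float NaN values become NaT.
dates = df[['col1', 'col2']].apply(pd.to_datetime)

# Row-wise reductions skip NaT by default (skipna=True).
df['col_max'] = dates.max(axis=1)  # hypothetical column name
df['col_min'] = dates.min(axis=1)  # hypothetical column name
print(df)
```

Keeping the converted data in a separate frame leaves col1 and col2 untouched, which may matter if other code still expects date objects there.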

If you simply call max on one of the original, unconverted columns:

df['col3'] = df['col1'].max()

this raises TypeError: '>=' not supported between instances of 'float' and 'datetime.date'
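The same fix applies to a single column: once it has been converted, Series.max and Series.min skip NaT and return a pandas Timestamp. A minimal sketch, assuming the question's data for col1; Timestamp.date() is only there to show how to get a plain datetime.date back if downstream code expects one.

```python
from datetime import date

import numpy as np
import pandas as pd

col1 = pd.Series([np.nan, date(2000, 12, 1), date(2000, 4, 1), date(2000, 12, 2)])

# After conversion the NaN becomes NaT, so no float/date comparison happens.
converted = pd.to_datetime(col1)
latest = converted.max()
earliest = converted.min()

print(latest, earliest)
print(latest.date())  # back to a plain datetime.date
```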

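The NaN values are floats, so each column ends up with object dtype, holding float NaN alongside datetime.date objects. A float cannot be compared with a date, so the reduction fails and you get NaN back (newer pandas versions may raise the TypeError instead). If you had no missing values it would work as expected; if you do have missing values, convert the columns to datetime so that the missing values become NaT, which max and min skip correctly.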
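To see the conversion described above, the sketch below compares the dtype and the missing-value marker before and after pd.to_datetime. It only inspects the data; the exact datetime64 resolution (for example datetime64[ns]) may differ between pandas versions, so the check looks only at the dtype kind.

```python
from datetime import date

import numpy as np
import pandas as pd

raw = pd.Series([np.nan, date(2000, 12, 1), date(2000, 4, 1)])
converted = pd.to_datetime(raw)

print(raw.dtype)        # object: float NaN mixed with datetime.date objects
print(converted.dtype)  # a datetime64 dtype

# The missing entry is now NaT, which pandas treats as missing, like NaN.
print(converted.iloc[0] is pd.NaT)
print(converted.isna().tolist())
```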