python pandas .apply() function index error

Problem description

I have the following DataFrame:

                              P     N  ID  Year  Month
TS                                                    
2016-06-26 19:30:00  263.600006   5.4   5  2016      6
2016-06-26 20:00:00  404.700012   5.6   5  2016      6
2016-06-26 21:10:00  438.600006   6.0   5  2016      6
2016-06-26 21:20:00  218.600006   5.6   5  2016      6
2016-07-02 16:10:00  285.300049  15.1   5  2016      7

I'm trying to add a new column based on the values of the Year and Month columns, something like the following:

import calendar

def exp_records(row):
    # Number of days in the month given by the row's Year and Month
    return calendar.monthrange(row['Year'], row['Month'])[1]

df['exp_counts'] = df.apply(exp_records, axis=1)

but I get the following error:

TypeError: ('integer argument expected, got float', 'occurred at index 2016-06-26 19:30:00')

However, if I reset_index() to an integer index, the above .apply() works fine. Is this the expected behavior?

I'm using pandas 0.19.1 with Python 3.4.

Code to recreate the DataFrame:

import io

import pandas as pd

s = '''
TS,P,N,ID,Year,Month
2016-06-26 19:30:00,263.600006,5.4,5,2016,6
2016-06-26 20:00:00,404.700012,5.6,5,2016,6
2016-06-26 21:10:00,438.600006,6.0,5,2016,6
2016-06-26 21:20:00,218.600006,5.6,5,2016,6
2016-07-02 16:10:00,285.300049,15.1,5,2016,7
'''

# pd.compat.StringIO no longer exists in recent pandas; io.StringIO works in all versions
df = pd.read_csv(io.StringIO(s), index_col=0, parse_dates=True)
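A minimal, self-contained reproduction of the error, assuming a current pandas release and using only two of the rows above (my choice, to keep it short). Newer pandas versions no longer append "occurred at index ..." to the message, and the exact TypeError wording depends on the Python version, so the sketch only checks the exception type. The name `error` is illustrative, not from the original post.

```python
import calendar
import io

import pandas as pd

s = '''TS,P,N,ID,Year,Month
2016-06-26 19:30:00,263.600006,5.4,5,2016,6
2016-07-02 16:10:00,285.300049,15.1,5,2016,7
'''
df = pd.read_csv(io.StringIO(s), index_col=0, parse_dates=True)


def exp_records(row):
    return calendar.monthrange(row['Year'], row['Month'])[1]


# Applying row-wise over the whole mixed float/int frame hands float rows
# to calendar.monthrange, which rejects floats ("error" is an illustrative name)
try:
    df.apply(exp_records, axis=1)
    error = None
except TypeError as exc:
    error = exc

print(repr(error))
```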

Recommended answer

Solution

Apply the function to df[['Year', 'Month']] only:

df['exp_counts'] = df[['Year', 'Month']].apply(exp_records, axis=1)

Result:

                              P     N  ID  Year  Month  exp_counts
TS                                                                
2016-06-26 19:30:00  263.600006   5.4   5  2016      6          30
2016-06-26 20:00:00  404.700012   5.6   5  2016      6          30
2016-06-26 21:10:00  438.600006   6.0   5  2016      6          30
2016-06-26 21:20:00  218.600006   5.6   5  2016      6          30
2016-07-02 16:10:00  285.300049  15.1   5  2016      7          31
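The fix above can be checked end to end with a small sketch, assuming a two-row version of the question's frame built in code. It also shows a vectorized alternative that is not part of the answer: build the first day of each month with `pd.to_datetime` and read `.dt.days_in_month`, which avoids calling a Python function per row. The names `first_of_month` and `exp_counts_vec` are illustrative.

```python
import calendar

import pandas as pd

df = pd.DataFrame(
    {'P': [263.600006, 285.300049], 'N': [5.4, 15.1], 'ID': [5, 5],
     'Year': [2016, 2016], 'Month': [6, 7]},
    index=pd.DatetimeIndex(['2016-06-26 19:30:00', '2016-07-02 16:10:00'], name='TS'),
)


def exp_records(row):
    # Number of days in the row's month
    return calendar.monthrange(row['Year'], row['Month'])[1]


# The answer's fix: both selected columns are int64, so each row Series stays int64
df['exp_counts'] = df[['Year', 'Month']].apply(exp_records, axis=1)

# Vectorized alternative (not from the answer; illustrative names):
# the first day of each month, then its number of days
first_of_month = pd.to_datetime(
    pd.DataFrame({'year': df['Year'], 'month': df['Month'], 'day': 1})
)
df['exp_counts_vec'] = first_of_month.dt.days_in_month

print(df[['Year', 'Month', 'exp_counts', 'exp_counts_vec']])
```

Both columns should agree; the vectorized form mainly matters on large frames, where a per-row Python call dominates the runtime.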

Reason

Although your Year and Month columns are integers:

df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 5 entries, 2016-06-26 19:30:00 to 2016-07-02 16:10:00
Data columns (total 5 columns):
P        5 non-null float64
N        5 non-null float64
ID       5 non-null int64
Year     5 non-null int64
Month    5 non-null int64
dtypes: float64(2), int64(3)
memory usage: 240.0 bytes

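you access them by row, which makes them floats. A row is returned as a single Series, and a Series holds only one dtype, so the int64 values are upcast to float64 to match the float64 columns P and N. Transposing the frame shows the same upcast: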

df.T.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, P to Month
Data columns (total 5 columns):
2016-06-26 19:30:00    5 non-null float64
2016-06-26 20:00:00    5 non-null float64
2016-06-26 21:10:00    5 non-null float64
2016-06-26 21:20:00    5 non-null float64
2016-07-02 16:10:00    5 non-null float64
dtypes: float64(5)
memory usage: 240.0+ bytes
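The upcast can be seen on an even smaller frame. This sketch assumes a one-row frame with a single float column next to two int columns, mirroring the question's mix; `row` is an illustrative name.

```python
import pandas as pd

# One float64 column next to two int64 columns, as in the question
df = pd.DataFrame({'P': [263.600006], 'Year': [2016], 'Month': [6]})

print(df.dtypes.tolist())    # column dtypes are float64, int64, int64

row = df.iloc[0]             # a row must fit into one Series, i.e. one dtype
print(row.dtype)             # float64: the int columns were upcast
print(row['Year'])           # 2016.0

print(df.T.dtypes.tolist())  # the transposed frame's only column is float64 too
```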

Since df.apply(exp_records, axis=1) works row by row, it passes each row to your function as such an upcast Series, just like the transposed frame above.
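One way to confirm what `apply` hands to the function is to return the row's dtype and the type of `row['Year']` instead of computing anything. This is a diagnostic sketch assuming a trimmed version of the question's frame; `row_dtypes` and `year_types` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {'P': [263.600006, 285.300049], 'Year': [2016, 2016], 'Month': [6, 7]},
    index=pd.DatetimeIndex(['2016-06-26 19:30:00', '2016-07-02 16:10:00'], name='TS'),
)

# Record, for every row, the dtype of the row Series and the type of row['Year']
# (illustrative names, not from the original post)
row_dtypes = df.apply(lambda row: str(row.dtype), axis=1)
year_types = df.apply(lambda row: type(row['Year']).__name__, axis=1)

print(row_dtypes.tolist())  # every row is float64
print(year_types.tolist())  # row['Year'] is a NumPy float64, not an int
```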

This is what exp_records receives as row for the first index entry:

P         263.600006
N           5.400000
ID          5.000000
Year     2016.000000
Month       6.000000
Name: 2016-06-26T19:30:00.000000000, dtype: float64
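Since the values arrive as floats such as 2016.000000, another option (not from the answer) is to keep applying over the whole frame and convert back to int inside the function. This is a sketch under the assumption that Year and Month always hold whole numbers, which float64 represents exactly at this size; the frame is a trimmed, illustrative copy of the question's data.

```python
import calendar

import pandas as pd

df = pd.DataFrame(
    {'P': [263.600006, 285.300049], 'N': [5.4, 15.1], 'ID': [5, 5],
     'Year': [2016, 2016], 'Month': [6, 7]},
    index=pd.DatetimeIndex(['2016-06-26 19:30:00', '2016-07-02 16:10:00'], name='TS'),
)


def exp_records(row):
    # The row arrives upcast to float64 (Year == 2016.0), so convert back
    # to int before calling calendar.monthrange, which rejects floats
    return calendar.monthrange(int(row['Year']), int(row['Month']))[1]


df['exp_counts'] = df.apply(exp_records, axis=1)
print(df['exp_counts'].tolist())
```

Selecting only the integer columns, as the answer does, avoids the conversion altogether; the explicit cast is mainly useful when the function also needs the float columns.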

Creating a DataFrame with only the Year and Month columns does not cause a conversion to float, because both columns are integers:

df[['Year', 'Month']].T.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, Year to Month
Data columns (total 5 columns):
2016-06-26 19:30:00    2 non-null int64
2016-06-26 20:00:00    2 non-null int64
2016-06-26 21:10:00    2 non-null int64
2016-06-26 21:20:00    2 non-null int64
2016-07-02 16:10:00    2 non-null int64
dtypes: int64(5)
memory usage: 96.0+ bytes
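The same reasoning accounts for the reset_index() observation in the question. A sketch, assuming a trimmed copy of the frame: after reset_index(), TS becomes a datetime64 column, and since datetimes, floats and ints share no numeric dtype, each row Series falls back to object dtype, which keeps the integers as integers. `reset` and `first_row` are illustrative names.

```python
import calendar

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'P': [263.600006, 285.300049], 'N': [5.4, 15.1], 'ID': [5, 5],
     'Year': [2016, 2016], 'Month': [6, 7]},
    index=pd.DatetimeIndex(['2016-06-26 19:30:00', '2016-07-02 16:10:00'], name='TS'),
)


def exp_records(row):
    return calendar.monthrange(row['Year'], row['Month'])[1]


# TS moves into a datetime64 column; rows now mix datetime, float and int values,
# so pandas stores each row as an object Series instead of upcasting to float
reset = df.reset_index()
first_row = reset.iloc[0]
print(first_row.dtype)          # object
print(type(first_row['Year']))  # an integer type, not a float

reset['exp_counts'] = reset.apply(exp_records, axis=1)
print(reset['exp_counts'].tolist())
```

So the behavior is expected: what matters is not the integer index itself but that the row no longer has a single numeric dtype to upcast to.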
