python pandas .apply() function index error
Question
I have the following DataFrame:
                              P     N  ID  Year  Month
TS
2016-06-26 19:30:00  263.600006   5.4   5  2016      6
2016-06-26 20:00:00  404.700012   5.6   5  2016      6
2016-06-26 21:10:00  438.600006   6.0   5  2016      6
2016-06-26 21:20:00  218.600006   5.6   5  2016      6
2016-07-02 16:10:00  285.300049  15.1   5  2016      7
I'm trying to add a new column based on the values of the Year and Month columns, something like the following:
import calendar

def exp_records(row):
    return calendar.monthrange(row['Year'], row['Month'])[1]

df['exp_counts'] = df.apply(exp_records, axis=1)
But I get the following error:
TypeError: ('integer argument expected, got float', 'occurred at index 2016-06-26 19:30:00')
However, if I call reset_index() so that the index becomes integers, the above .apply() works fine. Is this the expected behavior?
I'm using pandas 0.19.1 with Python 3.4.
Code to recreate the DataFrame:
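import pandas as pd
from io import StringIO  # pd.compat.StringIO is not available in newer pandas versions

s = '''
TS,P,N,ID,Year,Month
2016-06-26 19:30:00,263.600006,5.4,5,2016,6
2016-06-26 20:00:00,404.700012,5.6,5,2016,6
2016-06-26 21:10:00,438.600006,6.0,5,2016,6
2016-06-26 21:20:00,218.600006,5.6,5,2016,6
2016-07-02 16:10:00,285.300049,15.1,5,2016,7
'''
# index_col=0 with parse_dates=True turns the TS column into a DatetimeIndex
df = pd.read_csv(StringIO(s), index_col=0, parse_dates=True)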
Recommended answer
Solution
Call apply on df[['Year', 'Month']] only:
df['exp_counts'] = df[['Year', 'Month']].apply(exp_records, axis=1)
Result:
                              P     N  ID  Year  Month  exp_counts
TS
2016-06-26 19:30:00  263.600006   5.4   5  2016      6          30
2016-06-26 20:00:00  404.700012   5.6   5  2016      6          30
2016-06-26 21:10:00  438.600006   6.0   5  2016      6          30
2016-06-26 21:20:00  218.600006   5.6   5  2016      6          30
2016-07-02 16:10:00  285.300049  15.1   5  2016      7          31
Reason
Your Year and Month columns are integers:
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 5 entries, 2016-06-26 19:30:00 to 2016-07-02 16:10:00
Data columns (total 5 columns):
P 5 non-null float64
N 5 non-null float64
ID 5 non-null int64
Year 5 non-null int64
Month 5 non-null int64
dtypes: float64(2), int64(3)
memory usage: 240.0 bytes
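But apply with axis=1 accesses them by row. Each row is a Series with a single dtype, and because the row also contains the float columns P and N, the integers are upcast to float. Transposing the frame shows the effect: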
df.T.info()
<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, P to Month
Data columns (total 5 columns):
2016-06-26 19:30:00 5 non-null float64
2016-06-26 20:00:00 5 non-null float64
2016-06-26 21:10:00 5 non-null float64
2016-06-26 21:20:00 5 non-null float64
2016-07-02 16:10:00 5 non-null float64
dtypes: float64(5)
memory usage: 240.0+ bytes
Since df.apply(exp_records, axis=1) goes row by row, every row is converted in this way.
This is what exp_records receives as row:
P 263.600006
N 5.400000
ID 5.000000
Year 2016.000000
Month 6.000000
Name: 2016-06-26T19:30:00.000000000, dtype: float64
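The upcast can be checked directly, without going through apply. The following is a minimal sketch that assumes a two-row stand-in for the DataFrame from the question. It also suggests why reset_index() makes the original code work: once TS is an ordinary datetime column, the row can no longer be represented as floats and falls back to object dtype, which leaves the integers untouched.

```python
import pandas as pd

# Two-row stand-in for the DataFrame in the question (an assumption for brevity).
df = pd.DataFrame(
    {"P": [263.600006, 285.300049], "N": [5.4, 15.1], "ID": [5, 5],
     "Year": [2016, 2016], "Month": [6, 7]},
    index=pd.to_datetime(["2016-06-26 19:30:00", "2016-07-02 16:10:00"]),
)
df.index.name = "TS"

full_row = df.iloc[0]                    # floats and ints mixed in one row
reset_row = df.reset_index().iloc[0]     # a datetime column joins the row
int_row = df[["Year", "Month"]].iloc[0]  # integer columns only

print(full_row.dtype)   # float64
print(reset_row.dtype)  # object
print(int_row.dtype)    # int64
```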
Creating a DataFrame with only the Year and Month columns does not cause a conversion to float, because both columns are integers:
df[['Year', 'Month']].T.info()
<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, Year to Month
Data columns (total 5 columns):
2016-06-26 19:30:00 2 non-null int64
2016-06-26 20:00:00 2 non-null int64
2016-06-26 21:10:00 2 non-null int64
2016-06-26 21:20:00 2 non-null int64
2016-07-02 16:10:00 2 non-null int64
dtypes: int64(5)
memory usage: 96.0+ bytes
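Two other ways around the upcast may be worth considering; neither is part of the original answer. The sketch below assumes the same two-row stand-in DataFrame, and exp_records_cast is a hypothetical name introduced here. The first variant casts to int inside the function, so it works even when the row arrives as float64. The second avoids the row-wise apply by building a first-of-month date from the Year and Month columns and reading its days_in_month attribute.

```python
import calendar
import pandas as pd

# Two-row stand-in for the DataFrame in the question (an assumption for brevity).
df = pd.DataFrame(
    {"P": [263.600006, 285.300049], "N": [5.4, 15.1], "ID": [5, 5],
     "Year": [2016, 2016], "Month": [6, 7]},
    index=pd.to_datetime(["2016-06-26 19:30:00", "2016-07-02 16:10:00"]),
)

def exp_records_cast(row):
    # Hypothetical variant of exp_records: cast explicitly, because the
    # row Series may have been upcast to float64.
    return calendar.monthrange(int(row["Year"]), int(row["Month"]))[1]

by_apply = df.apply(exp_records_cast, axis=1)

# Vectorized variant: assemble a first-of-month timestamp per row from
# year/month/day columns, then read the number of days in that month.
first_of_month = pd.to_datetime(
    pd.DataFrame({"year": df["Year"], "month": df["Month"], "day": 1})
)
by_vector = first_of_month.dt.days_in_month

print(by_apply.tolist())   # [30, 31]
print(by_vector.tolist())  # [30, 31]
```

The vectorized form tends to scale better on large frames because it does not call a Python function per row.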