Getting Average of Pandas with GroupBy - Getting DataError: No numeric types to aggregate
I know that there are numerous questions about this, like Getting daily averages with pandas and How get monthly mean in pandas using groupby but I'm getting a weird error.
Simple data set, with one index column (type timestamp) and one value column. Would like to get the monthly mean of the data.
In [76]: df.head()
Out[76]:
A
2008-01-02 1
2008-01-03 2
2008-01-04 3
2008-01-07 4
2008-01-08 5
However, when I group by month, I get only the groups of the index and not of the values:
In [74]: df.head().groupby(lambda x: x.month).groups
Out[74]:
{1: [Timestamp('2008-01-02 00:00:00'),
Timestamp('2008-01-03 00:00:00'),
Timestamp('2008-01-04 00:00:00'),
Timestamp('2008-01-07 00:00:00'),
Timestamp('2008-01-08 00:00:00')]}
Attempts to take the mean result in an error. I have tried both df.head().resample("M", how='mean')
and df.head().groupby(lambda x: x.month).mean()
and both raise the same error, DataError: No numeric types to aggregate:
In [75]: df.resample("M", how='mean')
---------------------------------------------------------------------------
DataError Traceback (most recent call last)
<ipython-input-75-79dc1a060ba4> in <module>()
----> 1 df.resample("M", how='mean')
/usr/local/lib/python2.7/site-packages/pandas/core/generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
2878 fill_method=fill_method, convention=convention,
2879 limit=limit, base=base)
-> 2880 return sampler.resample(self).__finalize__(self)
2881
2882 def first(self, offset):
/usr/local/lib/python2.7/site-packages/pandas/tseries/resample.pyc in resample(self, obj)
82
83 if isinstance(ax, DatetimeIndex):
---> 84 rs = self._resample_timestamps()
85 elif isinstance(ax, PeriodIndex):
86 offset = to_offset(self.freq)
/usr/local/lib/python2.7/site-packages/pandas/tseries/resample.pyc in _resample_timestamps(self)
286 # Irregular data, have to use groupby
287 grouped = obj.groupby(grouper, axis=self.axis)
--> 288 result = grouped.aggregate(self._agg_method)
289
290 if self.fill_method is not None:
/usr/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
2436 def aggregate(self, arg, *args, **kwargs):
2437 if isinstance(arg, compat.string_types):
-> 2438 return getattr(self, arg)(*args, **kwargs)
2439
2440 result = OrderedDict()
/usr/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in mean(self)
664 """
665 try:
--> 666 return self._cython_agg_general('mean')
667 except GroupByError:
668 raise
/usr/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only)
2356
2357 def _cython_agg_general(self, how, numeric_only=True):
-> 2358 new_items, new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
2359 return self._wrap_agged_blocks(new_items, new_blocks)
2360
/usr/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_blocks(self, how, numeric_only)
2406
2407 if len(new_blocks) == 0:
-> 2408 raise DataError('No numeric types to aggregate')
2409
2410 return data.items, new_blocks
DataError: No numeric types to aggregate
Yeah, you should try coercing A to numeric with something like df['A'] = df['A'].astype(int). It might also be worth checking whether anything in the initial data read-in caused the column to be object instead of numeric.
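As a rough sketch of that suggestion: the snippet below rebuilds the five sample rows with the values stored as strings, which is only an assumed cause of the error (the question does not show how df was loaded). It checks the dtype, coerces the column with pd.to_numeric as an alternative to astype(int), and then takes the monthly mean with the same groupby call used in the question.

```python
import pandas as pd

# Rebuild the sample from the question. Storing the values as strings is an
# assumption made here to reproduce a non-numeric column.
idx = pd.to_datetime(
    ["2008-01-02", "2008-01-03", "2008-01-04", "2008-01-07", "2008-01-08"]
)
df = pd.DataFrame({"A": ["1", "2", "3", "4", "5"]}, index=idx)

# Inspect the dtype first: a non-numeric column is what makes mean() fail.
print(df.dtypes)

# Coerce to numeric; errors="coerce" turns unparseable entries into NaN
# instead of raising, so bad rows can be inspected afterwards.
df["A"] = pd.to_numeric(df["A"], errors="coerce")

# Same grouping as in the question: the lambda receives each index Timestamp.
monthly_mean = df.groupby(lambda ts: ts.month).mean()
print(monthly_mean)
```

If the column contains values that cannot be parsed, astype(int) raises, while pd.to_numeric with errors="coerce" leaves NaN in their place; which behaviour is preferable depends on the data.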