Pandas `agg` to list, "AttributeError / ValueError: Function does not reduce"
Question
Often, when we perform groupby operations using pandas, we may wish to apply several functions across multiple series.
`groupby.agg` seems the natural way to perform these groupings and calculations.
However, there seems to be a discrepancy between how `groupby.agg` and `groupby.apply` are implemented, because I cannot group to a list using `agg`. Tuple and set work fine, which suggests to me that you can only aggregate to immutable types via `agg`. Via `groupby.apply`, I can aggregate one series to a list directly with no issues.
Below is a complete example. Functions (1), (2), (3) complete successfully. (4) comes back with `ValueError: Function does not reduce`.
import pandas as pd

df = pd.DataFrame([['Bob', '1/1/18', 'AType', 'blah', 'test', 'test2'],
                   ['Bob', '1/1/18', 'AType', 'blah2', 'test', 'test3'],
                   ['Bob', '1/1/18', 'BType', 'blah', 'test', 'test2']],
                  columns=['NAME', 'DATE', 'TYPE', 'VALUE A', 'VALUE B', 'VALUE C'])

def grouper(df, func):
    # Aggregate 'VALUE A' with func; keep the last 'VALUE B' and 'VALUE C' per group.
    f = {'VALUE A': lambda x: func(x), 'VALUE B': 'last', 'VALUE C': 'last'}
    # Select the columns with a list (double brackets), not a bare tuple of keys.
    return df.groupby(['NAME', 'DATE', 'TYPE'])[['VALUE A', 'VALUE B', 'VALUE C']]\
        .agg(f).reset_index()

# (1) SUCCESS
grouper(df, set)

# (2) SUCCESS
grouper(df, tuple)

# (3) SUCCESS
df.groupby(['NAME', 'DATE', 'TYPE', 'VALUE B', 'VALUE C'])['VALUE A']\
    .apply(list).reset_index()

# (4) FAIL
grouper(df, list)
# AttributeError
# ValueError: Function does not reduce
Recommended answer
After much investigation, I have discovered this is a bug, which will be fixed in a future release of pandas.
The offending code in groupby.py in 0.22.x (note the `isinstance(res, list)` check):
def _aggregate_series_pure_python(self, obj, func):

    group_index, _, ngroups = self.group_info

    counts = np.zeros(ngroups, dtype=int)
    result = None

    splitter = get_splitter(obj, group_index, ngroups, axis=self.axis)

    for label, group in splitter:
        res = func(group)
        if result is None:
            if (isinstance(res, (Series, Index, np.ndarray)) or
                    isinstance(res, list)):
                raise ValueError('Function does not reduce')
            result = np.empty(ngroups, dtype='O')

        counts[label] = group.shape[0]
        result[label] = res

    result = lib.maybe_convert_objects(result, try_float=0)
    return result, counts
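To see why only `list` is rejected while `tuple` and `set` pass, the type check in this excerpt can be reproduced outside pandas. This is a minimal sketch, not pandas code: `rejected_by_022_check` is a hypothetical helper that copies only the `isinstance` condition from the excerpt, and the two-element series is a stand-in for one group's values.

```python
import numpy as np
import pandas as pd

def rejected_by_022_check(res):
    # Hypothetical helper: mirrors only the isinstance test from the 0.22.x excerpt.
    return isinstance(res, (pd.Series, pd.Index, np.ndarray)) or isinstance(res, list)

# Stand-in for the 'VALUE A' values of a single group.
group = pd.Series(['blah', 'blah2'])

for func in (set, tuple, list):
    # Only the list result trips the check.
    print(func.__name__, rejected_by_022_check(func(group)))
```

Since `set` is mutable and still passes, the check appears to depend on the concrete return type rather than on immutability.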
groupby.py on the master branch, which omits `isinstance(res, list)`:
def _aggregate_series_pure_python(self, obj, func):

    group_index, _, ngroups = self.group_info

    counts = np.zeros(ngroups, dtype=int)
    result = None

    splitter = get_splitter(obj, group_index, ngroups, axis=self.axis)

    for label, group in splitter:
        res = func(group)
        if result is None:
            if (isinstance(res, (Series, Index, np.ndarray))):
                raise ValueError('Function does not reduce')
            result = np.empty(ngroups, dtype='O')

        counts[label] = group.shape[0]
        result[label] = res

    result = lib.maybe_convert_objects(result, try_float=0)
    return result, counts
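On versions that still contain the `isinstance(res, list)` check, one possible workaround is to aggregate to tuples and convert them to lists afterwards. This is only a sketch based on the question's observation that `tuple` aggregates without error; it is not taken from the pandas source, and the frame below is a cut-down version of the question's data.

```python
import pandas as pd

# Cut-down version of the frame from the question.
df = pd.DataFrame([['Bob', 'AType', 'blah', 'test'],
                   ['Bob', 'AType', 'blah2', 'test'],
                   ['Bob', 'BType', 'blah', 'test']],
                  columns=['NAME', 'TYPE', 'VALUE A', 'VALUE B'])

# Aggregate to tuples first, since tuples are not rejected by the check.
out = (df.groupby(['NAME', 'TYPE'])[['VALUE A', 'VALUE B']]
         .agg({'VALUE A': tuple, 'VALUE B': 'last'})
         .reset_index())

# Convert each tuple to a list once the aggregation is done.
out['VALUE A'] = [list(t) for t in out['VALUE A']]
print(out)
```

The conversion happens after `agg` has returned, so the reduction check never sees a list.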