Mean of Pandas TimeSeries using .groupby()

Problem description

Hi,

I have some continuous x/y coordinates from a behavioural experiment, that I would like to average within groups using Pandas.

I'm using a subset of the data here.

data
Out[11]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2036 entries, 0 to 1623
Data columns (total 9 columns):
id               2036  non-null values
subject          2036  non-null values
code             2036  non-null values
acc              2036  non-null values
nx               2036  non-null values
ny               2036  non-null values
rx               2036  non-null values
ry               2036  non-null values
reaction_time    2036  non-null values
dtypes: bool(1), int64(3), object(5)

nx and ny each hold a Series of TimeSeries objects, all of which share the same index.

data.nx.iloc[0]
Out[16]: 
0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
...
86     1.019901
87     1.010000
88     1.010000
89     1.005921
90     1.000000
91     1.000000
92     1.000000
93     1.000000
94     1.000000
95     1.000000
96     1.000000
97     1.000000
98     1.000000
99     1.000000
100    1.000000
Length: 101, dtype: float64

These TimeSeries columns can be averaged normally using data.nx.mean(), and behave as expected, but I hit trouble when I try to group the data.

grouped = data.groupby(['code', 'acc'])
means = grouped.mean()
print means
                       id          subject  reaction_time
code   acc                                               
group1 False  1570.866667  47474992.333333    1506.000000
       True   1337.076152  46022403.623246    1322.116232
group2 False  1338.180180  48730402.045045    1289.112613
       True   1382.631757  42713592.628378    1294.952703
group3 False  1488.587156  43202477.623853    1349.568807
       True   1310.415233  47054310.498771    1341.837838
group4 False  1339.682540  52530349.936508    1540.714286
       True   1343.261176  44606616.407059    1362.174118

Strangely, I can force them to average the TimeSeries data, and may have to fall back on hacking this way, like so:

for name, group in grouped:
     print group.nx.mean()

0     0.000000
1     0.000000
2     0.000000
3     0.000000
4     0.000000
5     0.000667
6     0.000683
7     0.001952
8     0.002000
9     0.002000

{etc, 101 values for 6 groups}
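The loop above only prints each group's mean. A minimal sketch of collecting those per-group means into one table might look like the following. The toy `data` frame and its values are made up for illustration, and the built-in `sum` is used in place of `.mean()` so the example does not depend on how a given pandas version averages an object column.

```python
import pandas as pd

# Hypothetical stand-in for the question's `data`: four trials, each holding
# a 3-sample trajectory in the `nx` column.
data = pd.DataFrame({
    "code": ["group1", "group1", "group2", "group2"],
    "acc": [True, True, True, False],
    "nx": [
        pd.Series([0.0, 0.5, 1.0]),
        pd.Series([0.0, 0.3, 1.0]),
        pd.Series([0.0, 0.2, 0.8]),
        pd.Series([0.0, 0.4, 0.6]),
    ],
})

grouped = data.groupby(["code", "acc"])

# Add the nested Series element-wise, then divide by the number of trials.
group_means = {name: sum(group["nx"]) / len(group) for name, group in grouped}

# One row per (code, acc) group, one column per time step.
result = pd.DataFrame(group_means).T
print(result)
```

This keeps the nested layout from the question, so it is a workaround and does not address the underlying data structure.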

Finally, if I try to force the GroupBy object to average them, I get the following:

grouped.nx.mean()
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-25-0b536a966e02> in <module>()
----> 1 grouped.nx.mean()

/usr/local/lib/python2.7/dist-packages/pandas-0.12.0-py2.7-linux-i686.egg/pandas/core/groupby.pyc in mean(self)
    357         """
    358         try:
--> 359             return self._cython_agg_general('mean')
    360         except GroupByError:
    361             raise

/usr/local/lib/python2.7/dist-packages/pandas-0.12.0-py2.7-linux-i686.egg/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only)
    462 
    463         if len(output) == 0:
--> 464             raise DataError('No numeric types to aggregate')
    465 
    466         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate

Has anyone any ideas?

Solution

A Series where each entry is itself a Series is not idiomatic. I think "No numeric types to aggregate" is telling you that pandas is trying to take the average of a list of Series (not the average of the numeric data they contain) which is not defined.
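A small sketch of why the aggregation finds nothing numeric: a column whose entries are Series objects is stored with dtype `object`, so pandas does not count it among the numeric columns. The frame below is a made-up miniature of the question's data, not the real dataset.

```python
import pandas as pd

# Hypothetical miniature of the question's frame.
data = pd.DataFrame({
    "code": ["group1", "group1", "group2"],
    "reaction_time": [1500, 1300, 1290],
    "nx": [
        pd.Series([0.0, 0.5, 1.0]),
        pd.Series([0.0, 0.3, 1.0]),
        pd.Series([0.0, 0.2, 0.8]),
    ],
})

# `nx` is an object column: each cell is a whole Series, not a number.
print(data.dtypes)
print(type(data["nx"].iloc[0]))

# Only genuinely numeric columns are candidates for a numeric aggregation.
numeric_cols = data.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

This matches the `object(5)` entry in the question's dtype summary, which would cover `nx`, `ny`, `rx` and `ry`.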

You should organize your data so nx and ny contain actual numbers. It might be simplest to keep nx, ny, (and, I think, rx and ry) in a separate DataFrame, where each column corresponds to one id.
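One possible way to carry out that reorganization is sketched below, assuming every trajectory shares the same index as stated in the question. The toy values and the names `nx`, `meta` and `nx_means` are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for the question's `data`.
data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "code": ["group1", "group1", "group2", "group2"],
    "acc": [True, True, True, False],
    "nx": [
        pd.Series([0.0, 0.5, 1.0]),
        pd.Series([0.0, 0.3, 1.0]),
        pd.Series([0.0, 0.2, 0.8]),
        pd.Series([0.0, 0.4, 0.6]),
    ],
})

# Separate frame with one column per trial id and one row per time step,
# so every cell is a plain float.
nx = pd.DataFrame(dict(zip(data["id"], data["nx"])))

# Trial-level labels, indexed by id so they align with the columns of `nx`.
meta = data.set_index("id")[["code", "acc"]]

# Transpose so trials are rows, then average each time step within each group.
nx_means = nx.T.groupby([meta["code"], meta["acc"]]).mean()
print(nx_means)
```

With the trajectories stored as numbers, the ordinary `groupby(...).mean()` call applies directly, and the same pattern could be repeated for `ny`, `rx` and `ry`.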
