How to implement my own describe() function to use in resample()

Problem description

I'm working with time-series data that represents vectors (magnitude and direction). I want to resample my data and use the describe function as the how parameter.

However, describe uses the ordinary arithmetic mean, and directions need a special (circular) average instead. Because of this, I implemented my own describe function, based on the implementation of pandas.Series.describe():

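import numpy as np
from pandas import Series

def directionAverage(x):
    # Circular mean of angles in radians: average the sines and cosines,
    # take the angle of the mean vector, and map it into [0, 2*pi)
    result = np.arctan2(np.mean(np.sin(x)), np.mean(np.cos(x)))
    if result < 0:
        result += 2*np.pi
    return result

def directionDescribe(x):
    # Same statistics as Series.describe() (without 'count'); only the mean is
    # circular, std/min/quantiles/max are still the ordinary linear statistics
    data = [directionAverage(x), x.std(), x.min(), x.quantile(0.25), x.median(), x.quantile(0.75), x.max()]
    names = ['mean', 'std', 'min', '25%', '50%', '75%', 'max']
    return Series(data, index=names)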
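To illustrate why the ordinary mean is unsuitable for directions, here is a small sketch (not part of the original question) that compares the arithmetic mean with the circular mean computed by directionAverage. The bearings 350° and 30° are arbitrary example values chosen to straddle north.

```python
import numpy as np

def directionAverage(x):
    # Circular mean: angle of the mean unit vector, mapped into [0, 2*pi)
    result = np.arctan2(np.mean(np.sin(x)), np.mean(np.cos(x)))
    if result < 0:
        result += 2 * np.pi
    return result

# Two example bearings on either side of north, converted to radians
angles = np.deg2rad([350.0, 30.0])

naive = np.rad2deg(np.mean(angles))              # arithmetic mean of the angles
circular = np.rad2deg(directionAverage(angles))  # circular mean

print(f"arithmetic mean: {naive:.1f} degrees")     # 190.0, pointing roughly south
print(f"circular mean:   {circular:.1f} degrees")  # 10.0, just east of north
```

The arithmetic mean treats 350° as far from 30°, even though the two bearings are only 40° apart across north; averaging the sine and cosine components avoids that wrap-around problem.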

The problem is that when I do:

df['direction'].resample('10Min', how=directionDescribe)

I get this exception (last few lines are shown):

  File "C:\Python26\lib\site-packages\pandas\core\generic.py", line 234, in resample
    return sampler.resample(self)
  File "C:\Python26\lib\site-packages\pandas\tseries\resample.py", line 83, in resample
    rs = self._resample_timestamps(obj)
  File "C:\Python26\lib\site-packages\pandas\tseries\resample.py", line 217, in _resample_timestamps
    result = grouped.aggregate(self._agg_method)
  File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1626, in aggregate
    result = self._aggregate_generic(arg, *args, **kwargs)
  File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1681, in _aggregate_generic
    return self._aggregate_item_by_item(func, *args, **kwargs)
  File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1706, in _aggregate_item_by_item
    result[item] = colg.aggregate(func, *args, **kwargs)
  File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1357, in aggregate
    result = self._aggregate_named(func_or_funcs, *args, **kwargs)
  File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1441, in _aggregate_named
    raise Exception('Must produce aggregated value')

The question is: how do I implement my own describe function so that it works with resample?

Recommended answer

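Instead of resampling, you can use groupby with a time-based grouper, so that each group covers one time interval, and then apply a function of your choice to each group, for example your directionAverage or directionDescribe function. The original call fails because the how argument of resample() must reduce each bin to a single value, whereas directionDescribe returns a whole Series; that is what the "Must produce aggregated value" exception is reporting. groupby(...).apply() has no such restriction.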
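For a function that does return a single value per bin, such as directionAverage on its own, resampling is enough. The sketch below assumes current pandas, where resample('10min').apply(func) aggregates each bin with func; the six sample directions are made up for illustration.

```python
import numpy as np
import pandas as pd

def directionAverage(x):
    # Circular mean of angles in radians, mapped into [0, 2*pi)
    result = np.arctan2(np.mean(np.sin(x)), np.mean(np.cos(x)))
    if result < 0:
        result += 2 * np.pi
    return result

# Hypothetical sample: six directions (degrees converted to radians), 5 minutes apart
idx = pd.date_range('2024-01-01', periods=6, freq='5min')
s = pd.Series(np.deg2rad([350, 30, 80, 100, 170, 190]), index=idx)

# directionAverage reduces each 10-minute bin to one number, so resample accepts it
avg = s.resample('10min').apply(directionAverage)
print(np.rad2deg(avg).round(1))  # roughly 10, 90 and 180 degrees for the three bins
```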

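Note that grouping by time intervals needs a time-based grouper. In the pandas version this answer was written for, that was the TimeGrouper class, imported from pandas.tseries.resample; it has since been deprecated and removed, and pd.Grouper(freq='10min') does the same job in current pandas:

import numpy as np
import pandas as pd

# Group your data into 10-minute bins
# (older pandas: from pandas.tseries.resample import TimeGrouper, then TimeGrouper('10min'))
new_data = df['direction'].groupby(pd.Grouper(freq='10min'))
# Apply your function to the grouped data; since directionDescribe returns a
# Series per bin, the result is a Series indexed by (bin start, statistic name)
result = new_data.apply(directionDescribe)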
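A self-contained version of the groupby approach, as a sketch: directionAverage and directionDescribe are copied from the question, the sample data is invented, and pd.Grouper(freq='10min') stands in for TimeGrouper. The extra unstack() call is an optional addition that turns the long (bin, statistic) result into one row per bin, similar to what describe() would give per column.

```python
import numpy as np
import pandas as pd

def directionAverage(x):
    # Circular mean of angles in radians, mapped into [0, 2*pi)
    result = np.arctan2(np.mean(np.sin(x)), np.mean(np.cos(x)))
    if result < 0:
        result += 2 * np.pi
    return result

def directionDescribe(x):
    # describe()-style statistics, but with a circular mean
    data = [directionAverage(x), x.std(), x.min(), x.quantile(0.25),
            x.median(), x.quantile(0.75), x.max()]
    names = ['mean', 'std', 'min', '25%', '50%', '75%', 'max']
    return pd.Series(data, index=names)

# Hypothetical sample data: six directions (in radians) at 5-minute spacing
idx = pd.date_range('2024-01-01', periods=6, freq='5min')
df = pd.DataFrame({'direction': np.deg2rad([350, 30, 80, 100, 170, 190])},
                  index=idx)

# apply() accepts a Series per group; the result is indexed by (bin start, statistic)
stacked = df['direction'].groupby(pd.Grouper(freq='10min')).apply(directionDescribe)

# unstack() moves the statistic names into columns: one row per 10-minute bin
summary = stacked.unstack()
print(np.rad2deg(summary['mean']).round(1))  # roughly 10, 90 and 180 for the three bins
```

Without unstack() you keep the long form, which can be handy if you later want to select one statistic across all bins with stacked.xs('mean', level=1).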
