Implementing pandas function to numpy functions

Question

Is there a way I could convert the xy_mean function so that it is computed with the pandas library, just like the y_mean function? I found that the pandas version Y_mean = pd.Series(PC_list).rolling(number).mean().dropna().to_numpy() is much faster than the numpy version ym = (np.convolve(PC_list, np.ones(shape=(number)), mode='valid')/number)[:-1].

The equation for xy_mean is ((index of value)*value + (index of value)*value + ...)/number, where the indices depend on the value of the variable number. So the first set of calculations for the example below would be (457.334015*1 + 424.440002*2 + 394.795990*3)/number, the next set would be (424.440002*2 + 394.795990*3 + 408.903992*4)/number, and so on. If number = 4, then the first set of calculations would be (457.334015*1 + 424.440002*2 + 394.795990*3 + 408.903992*4)/number. The windowed mean calculations continue until the end of the PC_list array.

Variables:

number = 3
PC_list= np.array([457.334015,424.440002,394.795990,408.903992,398.821014,402.152008,435.790985,423.204987,411.574005,
404.424988,399.519989,377.181000,375.467010,386.944000,383.614990,375.071991,359.511993,328.865997,
320.510010,330.079010,336.187012,352.940002,365.026001,361.562012,362.299011,378.549011,390.414001,
400.869995,394.773010,382.556000])

Vanilla python version:

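# one value per window; the range stops one window early to match the [:-1] in the numpy version
y_mean = [sum(PC_list[i:i+number])/number for i in range(len(PC_list) - number)]
xy_mean = [sum(x * (j + 1) for j, x in enumerate(PC_list[i:i+number]))/number
           for i in range(len(PC_list) - number)]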
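Written as a formula, with n = number and s the 0-based start of a window, the vanilla version computes the following. Because enumerate restarts for each slice, the weight j runs from 1 to n in every window; that is also what the numpy version and the outputs in the answer compute.

$$
\bar{y}_s = \frac{1}{n}\sum_{j=1}^{n} x_{s+j-1},
\qquad
\overline{xy}_s = \frac{1}{n}\sum_{j=1}^{n} j\,x_{s+j-1}
$$

For example, with n = 3 the first window gives (1·457.334015 + 2·424.440002 + 3·394.795990)/3 = 2490.601989/3 ≈ 830.200663.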

Numpy version:

y_mean = (np.convolve(PC_list, np.ones(shape=(number)), mode='valid')/number)[:-1]
xy_mean = (np.convolve(PC_list, np.arange(number, 0, -1), mode='valid'))[:-1]
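Why the numpy version uses np.arange(number, 0, -1): np.convolve reverses its second argument before sliding it, so the descending kernel ends up applying the weights 1, 2, ..., number to each window in reading order. A quick check, using only the first four values of PC_list, compares it against np.correlate, which slides the kernel without reversing it:

```python
import numpy as np

number = 3
# First four values of PC_list from the question, enough for two windows
values = np.array([457.334015, 424.440002, 394.795990, 408.903992])

# np.convolve reverses the kernel before sliding it, so [3, 2, 1]
# is applied to each window as weights [1, 2, 3] in reading order.
via_convolve = np.convolve(values, np.arange(number, 0, -1), mode='valid')

# np.correlate slides the kernel without reversing it, so the
# weights can be written in reading order directly.
via_correlate = np.correlate(values, np.arange(1, number + 1), mode='valid')

print(via_convolve)                              # weighted sums (not yet divided by number)
print(np.allclose(via_convolve, via_correlate))  # True
```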

Pandas version:

Y_mean = pd.Series(PC_list).rolling(number).mean().dropna().to_numpy()
xy_mean = ? 

Answer

You would need to define a custom function for that, and pass it to rolling.apply:

>>> multiplier = np.arange(1, number + 1)  # weights 1, 2, ..., number

>>> def xymean(series):
        return series.mul(multiplier).sum()

>>> pd.Series(PC_list).rolling(number).apply(xymean).dropna().to_numpy()[:-1]

array([2490.601989, 2440.743958, 2409.067016, 2413.002044, 2510.497985,
       2543.348939, 2516.922974, 2459.627961, 2418.983948, 2335.007966,
       2280.283019, 2288.94702 , 2300.19998 , 2279.389953, 2212.294951,
       2080.693968, 1978.774017, 1960.123047, 1989.229066, 2061.27304 ,
       2137.145019, 2167.67804 , 2175.047058, 2221.807067, 2290.639036,
       2361.986998, 2376.473021])

>>> (np.convolve(PC_list, np.arange(number, 0, -1), mode='valid'))[:-1]
 
array([2490.601989, 2440.743958, 2409.067016, 2413.002044, 2510.497985,
       2543.348939, 2516.922974, 2459.627961, 2418.983948, 2335.007966,
       2280.283019, 2288.94702 , 2300.19998 , 2279.389953, 2212.294951,
       2080.693968, 1978.774017, 1960.123047, 1989.229066, 2061.27304 ,
       2137.145019, 2167.67804 , 2175.047058, 2221.807067, 2290.639036,
       2361.986998, 2376.473021])

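However, this will be slower than the built-in rolling.mean() or np.convolve, because apply calls the Python function once for every window. Furthermore, it seems your numpy version computes an xy_sum rather than an xy_mean; to make it calculate the mean you would need to divide by number: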

>>> (np.convolve(PC_list, np.arange(number, 0, -1), mode='valid')/number)[:-1]

array([830.200663  , 813.58131933, 803.02233867, 804.33401467,
       836.83266167, 847.78297967, 838.97432467, 819.875987  ,
       806.32798267, 778.33598867, 760.09433967, 762.98234   ,
       766.73332667, 759.796651  , 737.43165033, 693.564656  ,
       659.591339  , 653.374349  , 663.07635533, 687.09101333,
       712.381673  , 722.55934667, 725.015686  , 740.60235567,
       763.54634533, 787.32899933, 792.15767367])

>>> def xymean(series):
        return series.mul(multiplier).mean()

>>> pd.Series(PC_list).rolling(number).apply(xymean).dropna().to_numpy()[:-1]

array([830.200663  , 813.58131933, 803.02233867, 804.33401467,
       836.83266167, 847.78297967, 838.97432467, 819.875987  ,
       806.32798267, 778.33598867, 760.09433967, 762.98234   ,
       766.73332667, 759.796651  , 737.43165033, 693.564656  ,
       659.591339  , 653.374349  , 663.07635533, 687.09101333,
       712.381673  , 722.55934667, 725.015686  , 740.60235567,
       763.54634533, 787.32899933, 792.15767367])
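The slowdown comes from rolling.apply running Python code once per window. Below is a sketch of an apply-free pandas alternative, assuming (as in the vanilla and numpy versions) that the weights restart at 1 in every window. For a window starting at position s, sum((m - s + 1) * x[m]) = sum(m * x[m]) - (s - 1) * sum(x[m]), so two ordinary rolling sums, the same kind of built-in call as y_mean, are enough. The helper name weighted_rolling_mean is hypothetical.

```python
import numpy as np
import pandas as pd


def weighted_rolling_mean(values, number):
    """Hypothetical helper: rolling mean of each window weighted 1, 2, ..., number.

    Uses sum((m - s + 1) * x[m]) = sum(m * x[m]) - (s - 1) * sum(x[m])
    for a window starting at position s, so only built-in rolling sums are needed.
    """
    x = pd.Series(values, dtype=float)
    pos = pd.Series(np.arange(len(x)), dtype=float)  # absolute position m of each value

    sum_mx = (pos * x).rolling(number).sum()  # sum of m * x[m] over each window
    sum_x = x.rolling(number).sum()           # plain sum of x[m] over each window
    start = pos - (number - 1)                # window start s, stored on the window's last row

    xy_sum = sum_mx - (start - 1) * sum_x
    # the first number - 1 rows are incomplete windows (NaN) and are dropped
    return (xy_sum / number).dropna().to_numpy()


# Small usage example: windows [1, 2, 3], [2, 3, 4], [3, 4, 5]
print(weighted_rolling_mean([1, 2, 3, 4, 5], 3))  # values are 14/3, 20/3, 26/3
```

Applying [:-1] to weighted_rolling_mean(PC_list, number), as in the answer, should reproduce the xy_mean array above up to floating-point rounding. For very long series the products m * x[m] grow large and the subtraction can lose precision, in which case np.convolve is the safer choice. If you keep rolling.apply instead, passing raw=True hands the function a plain ndarray rather than a Series, which usually reduces the per-window overhead; the function then needs (window * multiplier).mean(), because ndarrays have no .mul method.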