Weird exponential increase in running time when using dataframe.mean() (Pandas performance non-numeric column)

Views: 173  Published: 2020/5/24 3:26:40  python pandas


Problem description

I am playing around with a dataset of weather data (to reproduce: the data can be found here; unzip it and run the code below), and I wanted to normalize it. To do this, I tried the second answer to this question:

Normalize columns of pandas data frame

which comes down to normalized_df=(df-df.mean(axis=0))/df.std(axis=0)
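To make the formula concrete, here is a minimal sketch of that column-wise normalization on a tiny all-numeric frame. The frame and its column names are made up for illustration and are not taken from the weather dataset.

```python
import pandas as pd

# Toy stand-in for the weather table; column names and values are hypothetical.
toy_df = pd.DataFrame({
    "temperature": [1.0, 2.0, 3.0, 4.0],
    "pressure": [990.0, 1000.0, 1010.0, 1020.0],
})

# Column-wise z-score: subtract each column's mean, divide by its sample std.
normalized_df = (toy_df - toy_df.mean(axis=0)) / toy_df.std(axis=0)

print(normalized_df)
```

After this step every column should have a mean of roughly 0 and a sample standard deviation of roughly 1.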

However, this code takes a very long time to execute. I started investigating, and the time taken by the df.mean() call seems to increase exponentially with the number of rows.

I used the following code to measure run times:

import pandas as pd
import time

jena_climate_df = pd.read_csv("jena_climate_2009_2016.csv")
start = time.time()
print(jena_climate_df[:200000].mean(axis=0))  # Modify the number of rows here to observe the increase in time
stop = time.time()
print(f"{stop-start} Seconds for mean calc")

I ran some tests, gradually increasing the number of rows used for the mean calculation. See the results below:

0.004987955093383789 Seconds for mean calc ~ 10 observations
0.009006738662719727 Seconds for mean calc ~ 1000 observations
0.0837397575378418 Seconds for mean calc ~ 10000 observations
1.789750337600708 Seconds for mean calc ~ 50000 observations
7.518809795379639 Seconds for mean calc ~ 60000 observations
19.989460706710815 Seconds for mean calc ~ 70000 observations
71.97900629043579 Seconds for mean calc ~ 100000 observations
375.04513001441956 Seconds for mean calc ~ 200000 observations
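For anyone repeating the measurement, the manual edits to the row count can be replaced by a loop. This is only a sketch: `time_mean` is a hypothetical helper, and the frame used here is synthetic and all-numeric so that the snippet runs without the CSV file; to reproduce the numbers above one would pass `jena_climate_df` instead.

```python
import time

import numpy as np
import pandas as pd


def time_mean(df, row_counts):
    """Return {row_count: seconds} for df[:row_count].mean(axis=0)."""
    timings = {}
    for n in row_counts:
        start = time.perf_counter()
        df[:n].mean(axis=0)
        timings[n] = time.perf_counter() - start
    return timings


# Synthetic stand-in data (assumption): 20,000 rows, 5 float columns.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame(rng.normal(size=(20_000, 5)), columns=list("abcde"))

for n, seconds in time_mean(demo_df, [10, 1_000, 10_000, 20_000]).items():
    print(f"{seconds:.6f} Seconds for mean calc ~ {n} observations")
```

`time.perf_counter()` is used instead of `time.time()` because it is intended for measuring short intervals.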

It seems to me that the time is increasing exponentially. I don't know why this is happening. AFAIK, adding up all the values and dividing by the number of observations shouldn't be too computationally intensive, but maybe I am wrong here. Some explanation would be greatly appreciated!
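The title points at a non-numeric column as the thing to check. As a hedged sketch only, the normalization can be restricted to numeric columns and its timing compared with the full frame. The column names and values below are made up, and whether a string column is what slows down the Jena file is an assumption, not something established in this post.

```python
import pandas as pd

# Made-up frame with one string column next to two numeric ones.
mixed_df = pd.DataFrame({
    "timestamp": ["01.01.2009 00:10:00", "01.01.2009 00:20:00", "01.01.2009 00:30:00"],
    "temperature": [-8.02, -8.41, -8.51],
    "pressure": [996.52, 996.57, 996.53],
})

# Inspect the dtypes first: a string column shows up as a non-numeric dtype.
print(mixed_df.dtypes)

# Keep only numeric columns, then apply the same normalization formula.
numeric_df = mixed_df.select_dtypes(include="number")
normalized_df = (numeric_df - numeric_df.mean(axis=0)) / numeric_df.std(axis=0)

print(normalized_df)
```

If the timing of `numeric_df.mean(axis=0)` stays flat while the full frame's does not, that would support the non-numeric column as the culprit.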
