Numpy genfromtxt slower than pandas read_csv

Question

I'm loading a CSV file (if you want the specific file, it's the training CSV from http://www.kaggle.com/c/loan-default-prediction). Loading the CSV in numpy takes dramatically more time than in pandas:

timeit("genfromtxt('train_v2.csv', delimiter=',')", "from numpy import genfromtxt",  number=1)
102.46608114242554

timeit("pandas.io.parsers.read_csv('train_v2.csv')", "import pandas",  number=1)
13.833590984344482

I'll also mention that numpy's memory usage fluctuates far more wildly, climbs higher, and stays significantly higher once loading finishes (2.49 GB for numpy vs. ~600 MB for pandas). All data types in pandas are 8 bytes wide, so differing dtypes don't explain the gap. I came nowhere near maxing out my memory, so the time difference cannot be ascribed to paging.
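
For completeness, here is a minimal sketch of how such a memory comparison can be reproduced. It assumes the third-party memory_profiler package and the same train_v2.csv file in the working directory; neither the package nor the helper names appear in the original post:

from memory_profiler import memory_usage  # assumed: pip install memory_profiler
import numpy as np
import pandas as pd

def load_numpy():
    # full genfromtxt parse, result discarded
    np.genfromtxt('train_v2.csv', delimiter=',')

def load_pandas():
    # full read_csv parse, result discarded
    pd.read_csv('train_v2.csv')

for loader in (load_numpy, load_pandas):
    # memory_usage samples the process RSS (in MiB) while loader runs
    samples = memory_usage((loader,), interval=0.1)
    print(loader.__name__, 'peak: %.0f MiB' % max(samples))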

Any reason for this difference? Is genfromtxt just way less efficient? (And does it leak a bunch of memory?)

numpy version 1.8.0

pandas version 0.13.0-111-ge29c8e8

Answer

genfromtxt from the numpy module runs two main loops: the first converts all the lines in the file to strings, and the second converts each string to its data type. That double pass makes it slower, but in exchange you get more flexibility with genfromtxt than with commands like loadtxt or read_csv.
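
If that extra flexibility isn't needed, a common workaround (a sketch, not part of the original answer) is to let pandas do the fast parsing and then take the underlying numpy array from the DataFrame:

import pandas as pd

# parse with pandas' fast C tokenizer ...
df = pd.read_csv('train_v2.csv')
# ... then grab the underlying ndarray; with all-numeric, 8-byte
# columns (as in the question) this is a single float64 array
data = df.values
print(data.shape, data.dtype)

np.loadtxt is the lighter-weight numpy alternative, but it gives up genfromtxt's missing-value handling, which is the flexibility the answer refers to.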
