Importing csv & xlsx files into a pandas DataFrame: speed issue
Question
Reading data (just 20000 numbers) from an xlsx file takes forever:
import pandas as pd
xlsxfile = pd.ExcelFile("myfile.xlsx")
data = xlsxfile.parse('Sheet1', index_col = None, header = None)
It takes about 9 seconds.
If I save the same file in CSV format, reading it takes ~25 ms:
import pandas as pd
csvfile = "myfile.csv"
data = pd.read_csv(csvfile, index_col = None, header = None)
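One way to reproduce this comparison on your own machine is sketched below. It assumes a recent pandas and NumPy. The 20,000 random floats, the file names and the helper names `best_time` and `compare_formats` are hypothetical stand-ins for the question's data. The .xlsx timing runs only if openpyxl is installed, and absolute numbers will differ from the 9 s and 25 ms quoted above.

```python
import importlib.util
import os
import tempfile
import time

import numpy as np
import pandas as pd


def best_time(read, repeat: int = 3) -> float:
    """Return the fastest of `repeat` wall-clock timings (in seconds) of read()."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        read()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_formats(n_values: int = 20_000) -> dict:
    """Save the same numbers as CSV (and .xlsx if openpyxl is available), then time reading each back."""
    # Hypothetical data: one column of random floats, roughly the size described in the question.
    frame = pd.DataFrame(np.random.default_rng(0).random((n_values, 1)))
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "myfile.csv")
        frame.to_csv(csv_path, index=False, header=False)
        results["csv"] = best_time(
            lambda: pd.read_csv(csv_path, index_col=None, header=None))
        # Only time the Excel path when openpyxl can be imported.
        if importlib.util.find_spec("openpyxl") is not None:
            xlsx_path = os.path.join(tmp, "myfile.xlsx")
            frame.to_excel(xlsx_path, sheet_name="Sheet1", index=False,
                           header=False, engine="openpyxl")
            results["xlsx"] = best_time(
                lambda: pd.read_excel(xlsx_path, sheet_name="Sheet1", index_col=None,
                                      header=None, engine="openpyxl"))
    return results


if __name__ == "__main__":
    for fmt, seconds in compare_formats().items():
        print(f"{fmt}: {seconds * 1000:.1f} ms")
```

Reporting the minimum of several runs follows the advice in the standard-library `timeit` documentation: the lowest value is the least disturbed by other processes and caching effects.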
Is this an issue with openpyxl, or am I missing something? Are there any alternatives?
Answer
xlrd has support for .xlsx files, and this answer suggests that at least the beta version of xlrd with .xlsx support was quicker than openpyxl.
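The suggestion above amounts to choosing a faster reader backend. In current pandas that choice is made through the `engine` argument of `read_excel`. However, xlrd is no longer an option for .xlsx, because xlrd 2.0 and later read only legacy .xls files. The sketch below assumes pandas 2.2 or later, which accepts `engine="calamine"` when the separate python-calamine package is installed. Otherwise it falls back to openpyxl. `pick_excel_engine` and `read_sheet` are hypothetical helper names, not pandas API.

```python
import importlib.util

import pandas as pd


def pick_excel_engine() -> str:
    """Choose a pandas read_excel engine for .xlsx files (hypothetical helper)."""
    # Assumption: engine="calamine" needs pandas >= 2.2 plus the python-calamine package.
    major, minor = (int(part) for part in pd.__version__.split(".")[:2])
    if (major, minor) >= (2, 2) and importlib.util.find_spec("python_calamine") is not None:
        return "calamine"
    # Otherwise fall back to openpyxl (read_excel raises ImportError if it is missing too).
    return "openpyxl"


def read_sheet(path: str, sheet: str = "Sheet1") -> pd.DataFrame:
    """Read one sheet with the same options as the question's ExcelFile.parse call."""
    return pd.read_excel(
        path,
        sheet_name=sheet,
        index_col=None,
        header=None,
        engine=pick_excel_engine(),
    )
```

With these helpers, `data = read_sheet("myfile.xlsx")` replaces the two-step `ExcelFile`/`parse` call from the question. Checking the installed packages at call time keeps the same code working whether or not python-calamine is available.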
The current stable version of pandas (0.11.0) uses openpyxl for .xlsx files, but this has been changed for the next release. If you want to give it a go, you can download the dev version from GitHub.