ExcelFile vs. read_excel in pandas


Question

I'm diving into pandas and experimenting. When it comes to reading data from an Excel file, I wonder what the difference is between using ExcelFile and read_excel. Both seem to work (albeit with slightly different syntax, as could be expected), and the documentation supports both. In both cases, the documentation describes the method the same way: "Read an Excel table into DataFrame" and "Read an Excel table into a pandas DataFrame". (See the documentation for read_excel and for ExcelFile.parse.)

I'm seeing answers here on SO that use either one without addressing the difference. Also, a Google search didn't turn up anything that discusses this issue.

WRT my testing, these seem equivalent:

import pandas as pd

path = "test/dummydata.xlsx"
xl = pd.ExcelFile(path)
df = xl.parse("dummydata")  # sheet name

and

path = "test/dummydata.xlsx"
df = pd.read_excel(path, sheet_name=0)  # first sheet; older pandas spelled this keyword sheetname

Other than the fact that the latter saves me a line, is there a difference between the two, and is there a reason to prefer one over the other?

Thanks!

Solution

ExcelFile.parse is faster when you read from the same workbook more than once.

Suppose you are reading DataFrames in a loop. With ExcelFile.parse you pass the ExcelFile object (xl in your case), so the workbook is loaded only once and you reuse it to get each DataFrame. With read_excel you pass the path instead of the ExcelFile object, so the workbook is loaded again on every call. That gets slow if your workbook has many sheets and tens of thousands of rows.
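
To make the difference concrete, here is a minimal sketch of the two patterns. The function names read_sheets_once and read_sheets_repeatedly are made up for illustration and are not part of pandas; the sketch also assumes a recent pandas version where the keyword is sheet_name and an Excel engine such as openpyxl is installed.

```python
import pandas as pd


def read_sheets_once(path):
    """Open the workbook a single time and parse every sheet from it."""
    with pd.ExcelFile(path) as xl:
        # sheet_names lists the sheets; each parse() reuses the open workbook.
        return {name: xl.parse(name) for name in xl.sheet_names}


def read_sheets_repeatedly(path, sheet_names):
    """Pass the path to read_excel for each sheet, reopening the file every time."""
    return {name: pd.read_excel(path, sheet_name=name) for name in sheet_names}
```

Both helpers return a dict that maps sheet names to DataFrames; only the number of times the file is opened differs. If you just want every sheet in one call, pd.read_excel(path, sheet_name=None) also returns such a dict, and read_excel accepts an already opened ExcelFile object in place of the path, so the choice is mostly about whether you keep the workbook open between reads.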
