ExcelFile vs. read_excel in pandas
Problem description
I'm diving into pandas and experimenting with reading data from an Excel file, and I wonder what the difference is between using ExcelFile and read_excel. Both seem to work (with slightly different syntax, as could be expected), and the documentation supports both. In both cases, the documentation describes the method the same way: "Read an Excel table into DataFrame" and "Read an Excel table into a pandas DataFrame" (documentation for read_excel, and for ExcelFile).
I've seen answers here on SO that use one or the other without addressing the difference. A Google search didn't turn up anything that discusses this issue either.
As far as my testing goes, these seem equivalent:
```python
import pandas as pd

path = "test/dummydata.xlsx"
xl = pd.ExcelFile(path)
df = xl.parse("dummydata")  # sheet name
```
and
```python
import pandas as pd

path = "test/dummydata.xlsx"
df = pd.read_excel(path, sheet_name=0)  # first sheet, selected by position
```
Other than the fact that the latter saves me a line, is there a difference between the two, and is there a reason to use one over the other?
Thanks!
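The two snippets in the question select the sheet differently: `xl.parse("dummydata")` addresses it by name, while `sheet_name=0` picks the first sheet by position, so they only match when "dummydata" is the first sheet. A minimal sketch for checking the equivalence directly, assuming openpyxl is installed for reading .xlsx files; the helper name `same_sheet` is hypothetical:

```python
import pandas as pd


def same_sheet(path, sheet_name):
    """Return True if ExcelFile.parse and read_excel yield identical DataFrames."""
    # Open the workbook once via ExcelFile; the context manager closes the file handle.
    with pd.ExcelFile(path) as xl:
        via_excelfile = xl.parse(sheet_name)
    # read_excel with a path opens the workbook again on its own.
    via_read_excel = pd.read_excel(path, sheet_name=sheet_name)
    # DataFrame.equals compares values, dtypes, and labels.
    return via_excelfile.equals(via_read_excel)
```

For the question's file, `same_sheet("test/dummydata.xlsx", "dummydata")` would be expected to return `True`.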
ExcelFile.parse is faster when you need more than one DataFrame from the same workbook.
Suppose you are reading DataFrames in a loop. With ExcelFile.parse you just pass around the ExcelFile object (xl in your case), so the workbook is loaded only once and you use that object to get each of your DataFrames.
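A minimal sketch of that loop, assuming you want every sheet of the workbook and that openpyxl is installed; the function name `read_all_sheets` is hypothetical. The workbook is opened once, and each sheet is parsed from the same `ExcelFile` object:

```python
import pandas as pd


def read_all_sheets(path):
    """Open the workbook once and parse every sheet from the same ExcelFile."""
    frames = {}
    # The context manager closes the underlying file when the loop is done.
    with pd.ExcelFile(path) as xl:
        # sheet_names lists the sheets in workbook order.
        for name in xl.sheet_names:
            frames[name] = xl.parse(name)
    return frames
```

With the question's file, `read_all_sheets("test/dummydata.xlsx")["dummydata"]` would give the same frame as `xl.parse("dummydata")`.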
With read_excel you pass the path instead of the ExcelFile object, so the workbook is loaded again on every call. That gets slow if your workbook has lots of sheets and tens of thousands of rows.
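For contrast, a sketch of the per-path pattern described above next to two ways of avoiding the repeated load that pandas supports: `read_excel` also accepts an already-open `ExcelFile` as its first argument, and passing a list to `sheet_name` reads several sheets in one call and returns a dict keyed by sheet name. The function names are hypothetical, and no timings are claimed; the actual difference depends on workbook size and the installed engine (e.g. openpyxl):

```python
import pandas as pd


def read_sheets_by_path(path, sheet_names):
    # Each call receives a path, so the workbook is opened and loaded every time.
    return {name: pd.read_excel(path, sheet_name=name) for name in sheet_names}


def read_sheets_shared(path, sheet_names):
    # Passing an open ExcelFile to read_excel reuses the already-loaded workbook.
    with pd.ExcelFile(path) as xl:
        return {name: pd.read_excel(xl, sheet_name=name) for name in sheet_names}


def read_sheets_single_call(path, sheet_names):
    # A list of sheet names returns a dict of DataFrames from one read_excel call.
    return pd.read_excel(path, sheet_name=sheet_names)
```

All three return the same DataFrames; only the second and third avoid reloading the workbook for each sheet.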