How to obtain sheet names from XLS files without loading the whole file?


Problem description

I'm currently using pandas to read an Excel file and present its sheet names to the user, so they can select which sheet to use. The problem is that the files are really big (70 columns x 65k rows) and take up to 14 s to load on a notebook (the same data in a CSV file takes 3 s).

My pandas code looks like this:

xls = pandas.ExcelFile(path)
sheets = xls.sheet_names

I tried xlrd before, but obtained similar results. This was my code with xlrd:

xls = xlrd.open_workbook(path)
sheets = xls.sheet_names

So, can anybody suggest a faster way to retrieve the sheet names from an Excel file, without reading the whole file?

Solution

You can use the xlrd library and open the workbook with the on_demand=True flag, so that the sheets are not loaded automatically.

Then you can retrieve the sheet names in a similar way to pandas:

import xlrd
xls = xlrd.open_workbook(r'<path_to_your_excel_file>', on_demand=True)
print(xls.sheet_names())  # <- remember: xlrd sheet_names is a function, not a property
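
To make the idea concrete, here is a minimal sketch of how the on_demand approach might be wrapped so that the user first sees the sheet names and only the chosen sheet is parsed afterwards. The helper names list_sheet_names and load_sheet_rows are hypothetical and not part of xlrd; the sketch assumes xlrd is installed and that the file is in the legacy .xls format. Actual speed-ups depend on the file and were not measured here.

```python
def list_sheet_names(path):
    """Return the sheet names of an .xls workbook without parsing any sheet."""
    import xlrd  # third-party; imported here so it is only needed when called

    # on_demand=True defers parsing of each sheet until it is requested.
    book = xlrd.open_workbook(path, on_demand=True)
    try:
        return book.sheet_names()
    finally:
        # With on_demand=True the workbook keeps its resources open until released.
        book.release_resources()


def load_sheet_rows(path, sheet_name):
    """Parse only the requested sheet and return its rows as lists of values."""
    import xlrd

    book = xlrd.open_workbook(path, on_demand=True)
    try:
        sheet = book.sheet_by_name(sheet_name)  # this is where the sheet is parsed
        return [sheet.row_values(i) for i in range(sheet.nrows)]
    finally:
        book.release_resources()
```

A possible usage, with a placeholder file name: call list_sheet_names('large_file.xls') to populate the selection list, then call load_sheet_rows('large_file.xls', chosen_name) once the user has picked a sheet. Opening the workbook twice keeps each helper self-contained; holding one workbook object open across both steps would avoid the second open at the cost of managing its lifetime yourself.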
