How to increase process speed using read_excel in pandas?

Published: 2020/5/24 0:19:29 Tags: python, excel, pandas, performance, dataframe

Question

I need to use pd.read_excel to process every sheet in an Excel file.
But in most cases, I don't know the sheet names.
So I use this to count how many sheets the Excel file has:

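import pandas as pd

i_sheet_count = 0
i = 0
while True:
    try:
        # Each call reopens the workbook and parses a whole sheet
        pd.read_excel('/tmp/1.xlsx', sheet_name=i)
        i_sheet_count += 1
        i += 1
    except (IndexError, ValueError):
        # No sheet at position i: every sheet has been counted
        break
print(i_sheet_count)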

During the process, I found that it is quite slow.
So, can read_excel read only a limited number of rows to improve the speed?
I tried nrows, but it did not help... still slow.

Recommended answer

Read all worksheets without guessing

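Pass sheet_name=None to pd.read_excel (older pandas versions spelled this argument sheetname; that spelling has since been removed). This reads all worksheets into a dictionary of DataFrames keyed by sheet name. For example: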

import pandas as pd

dfs = pd.read_excel('file.xlsx', sheet_name=None)

# access 'Sheet1' worksheet
res = dfs['Sheet1']
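
Because sheet_name=None returns a dict keyed by sheet name, the trial-and-error loop in the question collapses to len(dfs). If you only need the count or the names, pd.ExcelFile(...).sheet_names lists them without building any DataFrames. A minimal sketch, assuming pandas with an .xlsx engine such as openpyxl installed; the file name demo_workbook.xlsx and its contents are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample workbook with two sheets, written to a temp directory
path = os.path.join(tempfile.mkdtemp(), "demo_workbook.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2, 3]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [4, 5]}).to_excel(writer, sheet_name="Sheet2", index=False)

# Option 1: read every sheet at once; the result is {sheet name: DataFrame}
dfs = pd.read_excel(path, sheet_name=None)
sheet_count = len(dfs)

# Option 2: only list the sheet names, without turning any sheet into a DataFrame
with pd.ExcelFile(path) as xls:
    sheet_names = xls.sheet_names

print(sheet_count, sheet_names)
```

Option 2 is the closer replacement for the counting loop, since it never asks pandas to convert cell data.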

Limit number of rows or columns

You can use the usecols and skipfooter arguments (called parse_cols and skip_footer in older pandas versions) to limit the number of columns and/or rows. This can reduce read time, and it also works with sheet_name=None.

For example, the following reads only the first 3 columns (A to C). Because skipfooter drops rows from the bottom of the sheet, a worksheet with 100 data rows yields only the first 20.

dfs = pd.read_excel('file.xlsx', sheet_name=None, usecols='A:C', skipfooter=80)
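
To see what those arguments do, the sketch below writes a hypothetical sheet with a header row and 100 data rows in columns A to E, then reads it back with usecols and skipfooter. It also uses nrows, which reads the first N data rows directly and gives the same result here. Whether any of these noticeably speeds up a given file depends on the engine and pandas version, so it is worth timing on your own data. Assumes pandas with openpyxl installed; the file name is made up.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sheet: a header row plus 100 data rows in columns A..E
path = os.path.join(tempfile.mkdtemp(), "hundred_rows.xlsx")
sample = pd.DataFrame({name: range(100) for name in "vwxyz"})
sample.to_excel(path, sheet_name="Sheet1", index=False)

# Keep Excel columns A..C and drop the last 80 data rows, so 20 rows remain
trimmed = pd.read_excel(path, sheet_name="Sheet1", usecols="A:C", skipfooter=80)

# Same rows counted from the top: read only the first 20 data rows
first_rows = pd.read_excel(path, sheet_name="Sheet1", usecols="A:C", nrows=20)

print(trimmed.shape, first_rows.shape)
```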

If you wish to apply worksheet-specific logic, you can do so by extracting the sheet names first:

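import pandas as pd

# Open the workbook once; ExcelFile exposes the sheet names
with pd.ExcelFile('file.xlsx') as xls:
    sheet_names = xls.sheet_names
    dfs = {}
    for sheet in sheet_names:
        # Passing the open ExcelFile avoids reopening the file for every sheet
        dfs[sheet] = pd.read_excel(xls, sheet_name=sheet)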
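
As one illustration of worksheet-specific logic, the sketch below keeps a hypothetical per-sheet options mapping, SHEET_OPTIONS, and passes each sheet's entry as keyword arguments to pd.read_excel, falling back to the defaults for sheets that are not listed. The sheet names, columns and options are invented for the example; assumes pandas with openpyxl installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook: a wider "Sales" sheet and a small "Notes" sheet
path = os.path.join(tempfile.mkdtemp(), "per_sheet.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"id": [1, 2], "qty": [5, 7], "extra": ["x", "y"]}).to_excel(
        writer, sheet_name="Sales", index=False
    )
    pd.DataFrame({"note": ["ok"]}).to_excel(writer, sheet_name="Notes", index=False)

# Hypothetical per-sheet read options; sheets not listed use the defaults
SHEET_OPTIONS = {
    "Sales": {"usecols": "A:B"},  # only the first two columns are needed here
}

dfs = {}
with pd.ExcelFile(path) as xls:
    for sheet in xls.sheet_names:
        options = SHEET_OPTIONS.get(sheet, {})
        dfs[sheet] = pd.read_excel(xls, sheet_name=sheet, **options)

print({name: df.shape for name, df in dfs.items()})
```

Keeping the per-sheet rules in a plain dict means new sheets are still read (with defaults) instead of being silently skipped.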

Improving performance

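Reading Excel files into pandas is inherently slower than reading other formats such as CSV, Pickle or HDF5: an .xlsx file is a zipped collection of XML documents that has to be decompressed and parsed cell by cell. If performance matters, I strongly suggest you consider one of these other formats.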
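
One way to apply this without changing where the data comes from is to pay the Excel parsing cost once and cache the result in one of the faster formats. The sketch below is an illustration, not part of the original answer: a hypothetical helper load_sheets_cached reads all sheets with sheet_name=None on the first call, stores the dict with the standard pickle module, and on later calls loads the pickle instead, as long as it is at least as new as the workbook. Assumes pandas with openpyxl installed; file names are made up.

```python
import os
import pickle
import tempfile

import pandas as pd


def load_sheets_cached(xlsx_path, cache_path):
    """Hypothetical helper: return {sheet name: DataFrame}, reusing a pickle cache."""
    # Reuse the cache only if it exists and is at least as new as the workbook
    if os.path.exists(cache_path) and os.path.getmtime(cache_path) >= os.path.getmtime(xlsx_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    # Slow path: parse every sheet from Excel once, then write the cache
    sheets = pd.read_excel(xlsx_path, sheet_name=None)
    with open(cache_path, "wb") as fh:
        pickle.dump(sheets, fh)
    return sheets


# Usage with a hypothetical sample workbook
workdir = tempfile.mkdtemp()
xlsx_path = os.path.join(workdir, "data.xlsx")
cache_path = os.path.join(workdir, "data.pkl")
pd.DataFrame({"a": [1, 2, 3]}).to_excel(xlsx_path, index=False)

first = load_sheets_cached(xlsx_path, cache_path)   # parses the Excel file
second = load_sheets_cached(xlsx_path, cache_path)  # loaded from the pickle
print(list(second), second["Sheet1"].shape)
```

The modification-time check is a simple staleness rule; if the workbook is edited, the next call re-parses it and refreshes the cache.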

One option, for example, is to use a VBA script to convert your Excel worksheets to CSV files and then read them with pd.read_csv.
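
If VBA is not convenient, the same one-off conversion can be done in Python; this swaps in pandas for the VBA script the answer mentions. A hypothetical helper export_sheets_to_csv writes each worksheet to its own CSV file named after the sheet, so later runs can use pd.read_csv. It assumes pandas with openpyxl installed, and that sheet names are valid file names (no sanitising is done).

```python
import os
import tempfile

import pandas as pd


def export_sheets_to_csv(xlsx_path, out_dir):
    """Hypothetical helper: write every worksheet to out_dir/<sheet name>.csv."""
    os.makedirs(out_dir, exist_ok=True)
    written = {}
    # One slow Excel read covers all sheets
    for sheet, df in pd.read_excel(xlsx_path, sheet_name=None).items():
        csv_path = os.path.join(out_dir, f"{sheet}.csv")
        df.to_csv(csv_path, index=False)
        written[sheet] = csv_path
    return written


# Usage with a hypothetical two-sheet workbook
workdir = tempfile.mkdtemp()
xlsx_path = os.path.join(workdir, "book.xlsx")
with pd.ExcelWriter(xlsx_path) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="first", index=False)
    pd.DataFrame({"b": [3.5]}).to_excel(writer, sheet_name="second", index=False)

csv_paths = export_sheets_to_csv(xlsx_path, os.path.join(workdir, "csv"))
# Later runs read the CSV copies instead of the workbook
second = pd.read_csv(csv_paths["second"])
print(sorted(csv_paths), second["b"].tolist())
```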
