How to import a filtered Excel table into Python?


Question

I have an Excel table with the columns Id, Name, Age and Sex.

Then I created a filter on the Sex column so that only the female rows are shown.

Next, I want to import the filtered data into Python, so I used the following pandas command:

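import pandas as pd

# newer pandas versions use sheet_name; the old sheetname keyword no longer works
df = pd.read_excel(io="test.xlsx", sheet_name="Sheet1")
print(df)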

The problem is that I get the entire sheet:

   Id   Name  Age Sex
0   1    Tom   12   M
1   2  Silke   33   F
2   3   Olga   22   F
3   4    Tom   33   M

However, that is not what I want. I only want the filtered rows:

   Id   Name  Age Sex
1   2  Silke   33   F
2   3   Olga   22   F

How can I do this with Python?

Note: I want to import the already-filtered data instead of filtering it in Python because of how the original Excel data is filtered. The filters in the workbook are very complicated. They depend not only on many columns but also on thousands of different values, so reproducing them in Python would be a big problem.

Recommended answer

As mentioned earlier in the comments, a quick and dirty way to do this is to copy the filtered data into a new Excel file (without any filter) and read the data from that file.

A more elegant and complete solution is to use openpyxl to read only the filtered data.

First, install openpyxl:

$ pip install openpyxl

Then use a script like this one to read only the visible data:

from openpyxl import load_workbook

wb = load_workbook('foo.xlsx')  # use the actual path of your workbook
ws = wb['Bar']  # use your sheet name instead of Bar

# iterate over all the rows in the sheet
for row in ws:
    # use the row only if it has not been filtered out (i.e., it's not hidden)
    if not ws.row_dimensions[row[0].row].hidden:
        print(row)  # ...or do what you need

Note that row is a tuple of Cell objects. Use each cell's value attribute to get its value:

for row in ws:
    if not ws.row_dimensions[row[0].row].hidden:
        for cell in row:
            print(cell.value)

You can use that logic to get your values straight from the original filtered file.
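If you want the visible values as plain Python data rather than printed output, you could wrap the same logic in a small function like the sketch below. The name read_visible_rows is hypothetical. Passing data_only=True to load_workbook is an added assumption: it makes openpyxl return the values Excel last saved for formula cells instead of the formula strings. The sketch relies on Excel storing filtered-out rows as hidden rows in the saved file, so rows hidden by hand are skipped as well.

```python
from openpyxl import load_workbook


def read_visible_rows(path, sheet_name):
    """Return a list of value tuples, one per row that is not hidden.

    Hypothetical helper: rows removed by an Excel filter are saved as
    hidden rows, so skipping hidden rows keeps only the filtered result
    (manually hidden rows are skipped too).
    """
    # data_only=True returns cached formula results, not formula strings
    wb = load_workbook(path, data_only=True)
    ws = wb[sheet_name]

    visible = []
    for row in ws.iter_rows():
        row_number = row[0].row
        # rows without explicit dimensions default to hidden=False
        if not ws.row_dimensions[row_number].hidden:
            visible.append(tuple(cell.value for cell in row))
    return visible
```

For a sheet like the one in the question, with the Tom rows filtered out, this should return the header row followed by the Silke and Olga rows.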

PS: openpyxl also provides pandas integration out of the box.
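To get a pandas DataFrame directly, one possible sketch combines the hidden-row check with a DataFrame constructor. The function name read_filtered_dataframe is hypothetical. It assumes the first row of the sheet holds the column headers and is not itself hidden. Data rows are numbered from 0 after the header, the way pd.read_excel numbers them, so the index should match the desired output in the question (1 and 2 for Silke and Olga).

```python
import pandas as pd
from openpyxl import load_workbook


def read_filtered_dataframe(path, sheet_name):
    """Return the rows an Excel filter left visible as a DataFrame.

    Hypothetical helper; assumes the first row holds the column headers.
    """
    # data_only=True yields cached formula results rather than formulas
    wb = load_workbook(path, data_only=True)
    ws = wb[sheet_name]

    rows = ws.iter_rows()
    first = next(rows)
    header_row = first[0].row
    header = [cell.value for cell in first]

    records, index = [], []
    for row in rows:
        excel_row = row[0].row
        # skip rows hidden by the filter (or hidden manually)
        if ws.row_dimensions[excel_row].hidden:
            continue
        records.append([cell.value for cell in row])
        # number data rows from 0 after the header, like pd.read_excel
        index.append(excel_row - header_row - 1)
    return pd.DataFrame(records, columns=header, index=index)
```

Keeping the original positions in the index makes it easy to see which rows of the unfiltered sheet survived the filter.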
