How to read certain columns from Excel using Pandas - Python

Problem description

I am reading from an Excel sheet and I want to read certain columns: column 0 because it is the row-index, and columns 22:37. Now here is what I do:

import pandas as pd
import numpy as np
file_loc = "path.xlsx"
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], parse_cols = 37)
df= pd.concat([df[df.columns[0]], df[df.columns[22:]]], axis=1)

But I hope there is a better way to do that! I know that parse_cols=[0, 22,..,37] would work, but writing out every index doesn't make sense for large datasets.

I also did this:

s = pd.Series(0)
s[1]=22
for i in range(2,14):
    s[i]=s[i-1]+1
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], parse_cols = s)

But it reads the first 15 columns, which is the length of s.

Solution

You can use column indices (letters) like this:

import pandas as pd
import numpy as np
file_loc = "path.xlsx"
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], usecols="A,C:AA")
print(df)
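The letters "A,C:AA" above are only an example. For the columns in the question (column 0 plus columns 22 through 37, assuming 37 is meant to be included), the matching letters can be worked out by hand or with a small helper. The sketch below is one way to do it; column_letter and letter_spec are hypothetical names, not part of pandas.

```python
def column_letter(index: int) -> str:
    """Convert a zero-based column position to an Excel letter (0 -> 'A', 26 -> 'AA')."""
    if index < 0:
        raise ValueError("column index must be non-negative")
    letters = ""
    n = index + 1  # Excel letters act like a 1-based base-26 numbering with no zero digit
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def letter_spec(first: int, last: int) -> str:
    """Build an inclusive Excel range such as 'W:AL' from zero-based positions."""
    return f"{column_letter(first)}:{column_letter(last)}"


# Column 0 plus columns 22 through 37 (inclusive), as described in the question.
usecols = ",".join([column_letter(0), letter_spec(22, 37)])
print(usecols)  # A,W:AL
```

The resulting string can then be passed as usecols=usecols to pd.read_excel in place of the hard-coded letters.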

Corresponding documentation:

usecols : int, str, list-like, or callable default None

  • If None, then parse all columns.

  • If str, then indicates comma separated list of Excel column letters and column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of both sides.

  • If list of int, then indicates list of column numbers to be parsed.

  • If list of string, then indicates list of column names to be parsed.

    New in version 0.24.0.

  • If callable, then evaluate each column name against it and parse the column if the callable returns True.

Returns a subset of the columns according to behavior above.

New in version 0.24.0.
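The list-of-int and callable forms described above might look like the following. This is a sketch under assumptions: "path.xlsx" is the placeholder path from the question, the range is taken to include column 37, and the header names "id" and "m_" used by keep_column are invented purely for illustration.

```python
import pandas as pd

file_loc = "path.xlsx"  # placeholder path from the question

# List-of-int form: zero-based positions, column 0 plus columns 22..37 inclusive.
# range() builds the list, so nothing has to be typed out by hand.
wanted_positions = [0, *range(22, 38)]


def keep_column(name) -> bool:
    """Hypothetical filter: keep an 'id' column and any header starting with 'm_'."""
    text = str(name)  # headers are not guaranteed to be strings
    return text == "id" or text.startswith("m_")


def load_by_position(path: str = file_loc) -> pd.DataFrame:
    """Read only the columns at the listed positions."""
    return pd.read_excel(path, index_col=None, na_values=["NA"], usecols=wanted_positions)


def load_by_name(path: str = file_loc) -> pd.DataFrame:
    """Read only the columns whose header passes keep_column."""
    return pd.read_excel(path, index_col=None, na_values=["NA"], usecols=keep_column)
```

Building the list with range() addresses the concern in the question about spelling out every index for a large sheet.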
