Python Pandas dataframe reading exact specified range in an excel sheet

Problem description

I have a lot of different tables (and other unstructured data) in an Excel sheet. I need to create a dataframe from the range 'A3:D20' on 'Sheet2' of the Excel file 'data'.

All the examples I have come across drill down only to sheet level; none show how to pick out an exact range.

import openpyxl
import pandas as pd

wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.get_sheet_by_name('Sheet2')
range = ['A3':'D20']   #<-- how to specify this?
spots = pd.DataFrame(sheet.range) #what should be the exact syntax for this?

print (spots)

Once I have this, I plan to look up some data in column A and find its corresponding value in column B.

EDIT: I realised that openpyxl takes too long, so I have changed to pandas.read_excel('data.xlsx','Sheet2') instead, which is much faster, at least at this stage.

Edit2: For the time being, I have put my data in just one sheet and removed all other info. I added column names, applied index_col on my leftmost column, and am now using wb.loc[], which solves it for me.
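
The lookup described in Edit2 might look roughly like the following. This is only a minimal sketch with made-up data: the column names A and B and the sample values are assumptions for illustration, not the asker's actual sheet.

```python
import pandas as pd

# Hypothetical stand-in for columns A and B of the range read from the sheet.
df = pd.DataFrame({"A": ["apple", "banana", "cherry"], "B": [10, 20, 30]})

# Index by column A so that values in column B can be looked up by key.
lookup = df.set_index("A")["B"]

value = lookup.loc["banana"]      # raises KeyError if the key is absent
missing = lookup.get("durian")    # returns None instead of raising
```

Using .loc is the strict variant; .get is convenient when the key may not be present in column A.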

Solution

One way to do this is to use the openpyxl module.

Here's an example:

from openpyxl import load_workbook

wb = load_workbook(filename='data.xlsx', 
                   read_only=True)

ws = wb['Sheet2']

# Read the cell values into a list of lists
data_rows = []
for row in ws['A3':'D20']:
    data_cols = []
    for cell in row:
        data_cols.append(cell.value)
    data_rows.append(data_cols)

# Transform into dataframe
import pandas as pd
df = pd.DataFrame(data_rows)
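
Since the question's EDIT mentions switching to pandas.read_excel for speed, the same range can probably be read without looping over cells. The sketch below is not part of the original answer. It assumes the range holds no header row, and read_excel_range is a hypothetical helper name. It first writes a small stand-in for data.xlsx so that it runs on its own.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook
from openpyxl.utils import get_column_letter, range_boundaries


def read_excel_range(path, sheet_name, cell_range):
    """Hypothetical helper: read one rectangular range into a DataFrame."""
    # range_boundaries("A3:D20") gives 1-based (min_col, min_row, max_col, max_row).
    min_col, min_row, max_col, max_row = range_boundaries(cell_range)
    return pd.read_excel(
        path,
        sheet_name=sheet_name,
        header=None,                      # assumption: no header row in the range
        usecols=f"{get_column_letter(min_col)}:{get_column_letter(max_col)}",
        skiprows=min_row - 1,             # skip the rows above the range
        nrows=max_row - min_row + 1,      # stop at the last row of the range
    )


# Build a sample workbook with data inside and outside A3:D20.
path = os.path.join(tempfile.mkdtemp(), "data.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet2"
ws["A1"] = "unrelated note"
ws["A2"] = "another note"
for r in range(3, 21):                    # rows 3..20
    for c in range(1, 5):                 # columns A..D
        ws.cell(row=r, column=c, value=f"r{r}c{c}")
ws["E3"] = "outside the range"
ws["A21"] = "below the range"
wb.save(path)

df = read_excel_range(path, "Sheet2", "A3:D20")
```

If the first row of the range contains column names, passing header=0 instead of header=None (and reducing nrows by one) would be the natural adjustment.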
