XLRD/Python: Reading Excel file into dict with for-loops

Question

I'm looking to read in an Excel workbook with 15 fields and about 2000 rows, and convert each row to a dictionary in Python. I then want to append each dictionary to a list. I'd like each field in the top row of the workbook to be a key within each dictionary, and have the corresponding cell value be the value within the dictionary. I've already looked at examples here and here, but I'd like to do something a bit different. The second example will work, but I feel like it would be more efficient looping over the top row to populate the dictionary keys and then iterate through each row to get the values. My Excel file contains data from discussion forums and looks something like this (obviously with more columns):

id    thread_id    forum_id    post_time    votes    post_text
4     100          3           1377000566   1        'here is some text'
5     100          4           1289003444   0        'even more text here'

So, I'd like the fields id, thread_id and so on, to be the dictionary keys. I'd like my dictionaries to look like:

{id: 4, 
thread_id: 100,
forum_id: 3,
post_time: 1377000566,
votes: 1,
post_text: 'here is some text'}

Initially, I had some code like this iterating through the file, but my scope is wrong for some of the for-loops and I'm generating way too many dictionaries. Here's my initial code:

import xlrd
from xlrd import open_workbook, cellname

book = open('forum.xlsx', 'r')
sheet = book.sheet_by_index(3)

dict_list = []

for row_index in range(sheet.nrows):
    for col_index in range(sheet.ncols):
        d = {}

        # My intuition for the below for-loop is to take each cell in the top row of the 
        # Excel sheet and add it as a key to the dictionary, and then pass the value of 
        # current index in the above loops as the value to the dictionary. This isn't
        # working.

        for i in sheet.row(0):
           d[str(i)] = sheet.cell(row_index, col_index).value
           dlist.append(d)

Any help would be greatly appreciated. Thanks in advance for reading.

Recommended answer

The idea is to first read the header row into a list. Then iterate over the sheet rows (starting from the row after the header), build a new dictionary from the header keys and the corresponding cell values, and append it to the list of dictionaries:

from xlrd import open_workbook

book = open_workbook('forum.xlsx')
sheet = book.sheet_by_index(3)

# read header values into the list    
keys = [sheet.cell(0, col_index).value for col_index in xrange(sheet.ncols)]

dict_list = []
for row_index in xrange(1, sheet.nrows):
    d = {keys[col_index]: sheet.cell(row_index, col_index).value 
         for col_index in xrange(sheet.ncols)}
    dict_list.append(d)

print dict_list

For a sheet containing:

A   B   C   D
1   2   3   4
5   6   7   8

it prints:

[{'A': 1.0, 'C': 3.0, 'B': 2.0, 'D': 4.0}, 
 {'A': 5.0, 'C': 7.0, 'B': 6.0, 'D': 8.0}]
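
The code above is written for Python 2 (`xrange`, the `print` statement). As a rough sketch of the same idea for Python 3, the header row can be paired with each data row using `zip`. This assumes the sheet object exposes `nrows` and `row_values(rowx)` the way an xlrd sheet does; `sheet_to_dicts` and `FakeSheet` are hypothetical names introduced only so the example runs without an Excel file.

```python
def sheet_to_dicts(sheet):
    """Return one dict per data row, keyed by the values in the header row."""
    if sheet.nrows == 0:
        return []
    keys = sheet.row_values(0)  # header row becomes the dictionary keys
    return [dict(zip(keys, sheet.row_values(row_index)))
            for row_index in range(1, sheet.nrows)]


class FakeSheet:
    """Hypothetical stand-in for an xlrd sheet (assumption, not part of xlrd)."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def row_values(self, rowx):
        return list(self._rows[rowx])


demo = FakeSheet([["A", "B", "C", "D"],
                  [1.0, 2.0, 3.0, 4.0],
                  [5.0, 6.0, 7.0, 8.0]])
dict_list = sheet_to_dicts(demo)
print(dict_list)
```

With a real workbook, the `sheet` returned by `book.sheet_by_index(...)` could be passed in place of `demo`. Each dictionary is created once per row, which avoids the extra dictionaries produced by the nested loops in the question.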

UPD (expanding the dictionary comprehension):

d = {}
for col_index in xrange(sheet.ncols):
    d[keys[col_index]] = sheet.cell(row_index, col_index).value 
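
Note that the printed values are floats (`1.0`), while the dictionaries sketched in the question use integers (`id: 4`); xlrd reports numeric cells as floats. If integer values are wanted, one possible post-processing step is shown below. `normalize_number` and `normalize_row` are hypothetical helpers, not part of xlrd, and the sample row is made up from the question's example data.

```python
def normalize_number(value):
    """Turn floats with no fractional part into ints; leave other values alone."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


def normalize_row(d):
    """Apply normalize_number to every value of one row dictionary."""
    return {key: normalize_number(value) for key, value in d.items()}


row = {"id": 4.0, "thread_id": 100.0, "votes": 1.0,
       "post_text": "here is some text"}
clean = normalize_row(row)
print(clean)
```

Whether this conversion is appropriate depends on the data: a column that legitimately holds values such as `2.5` keeps them as floats, but a column where `3.0` should stay a float would need to be excluded.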
