Iterate through columns in Read-only workbook in openpyxl

Views: 745 Published: 2020/5/21 1:43:39 Tags: python excel openpyxl


Problem description

I have a somewhat large .xlsx file: 19 columns, 5185 rows. I want to open the file, read all the values in one column, do some processing on those values, and then create a new column in the same workbook and write out the modified values. Thus, I need to be able to both read and write in the same file.

My original code did this:

from openpyxl import load_workbook, utils

# generalpath, exppath and doStuff() are defined elsewhere in my script

def readExcel(doc):
    wb = load_workbook(generalpath + exppath + doc)
    ws = wb["Sheet1"]

    # iterate through the header row to find the correct column
    for col in ws.iter_cols(min_row=1, max_row=1):
        for mycell in col:
            if mycell.value == "PerceivedSound.RESP":
                # column_letter exists in openpyxl 2.6+; older versions
                # returned the letter from mycell.column instead
                origCol = mycell.column_letter

    # get the column letter for the first empty column to output the new values
    newCol = utils.get_column_letter(ws.max_column + 1)

    # iterate through the rows to get the value from the original column,
    # do something to that value, and output it in the new column
    for myrow in range(2, ws.max_row + 1):
        myrow = str(myrow)
        # do some stuff to make the new value
        cleanedResp = doStuff(ws[origCol + myrow].value)
        ws[newCol + myrow] = cleanedResp

    wb.save(doc)

However, Python threw a memory error after row 3853 because the workbook was too big. The openpyxl docs say to use read-only mode (https://openpyxl.readthedocs.io/en/latest/optimized.html) to handle big workbooks. I'm now trying to use that; however, there seems to be no way to iterate through the columns when I add the read_only=True parameter:

def readExcel(doc):
    wb = load_workbook(generalpath + exppath + doc, read_only=True)
    ws = wb["Sheet1"]

    for col in ws.iter_cols(min_row=1, max_row=1):
        #etc.

Python throws this error: AttributeError: 'ReadOnlyWorksheet' object has no attribute 'iter_cols'

If I change the final line in the above snippet to:

for col in ws.columns:

Python throws the same kind of error: AttributeError: 'ReadOnlyWorksheet' object has no attribute 'columns'

Iterating over rows works fine (and is covered in the documentation I linked above):

for col in ws.rows:

(no error)

Another question asks about this AttributeError, but its solution is to drop read-only mode, which doesn't work for me because openpyxl can't read my entire workbook outside read-only mode.

So: how do I iterate through columns in a large workbook?

I haven't hit this yet, but I will once I can iterate through the columns: how do I both read and write the same workbook if that workbook is large?

Thanks!

Recommended answer

According to the documentation, read-only mode only supports row-based reads (column reads are not implemented). But that's not hard to work around:

from openpyxl import Workbook, load_workbook

# generalpath, exppath, doc and doStuff are the names from the question
wb = load_workbook(generalpath + exppath + doc, read_only=True)
ws = wb["Sheet1"]

wb2 = Workbook(write_only=True)
ws2 = wb2.create_sheet()

# find what column I need
colcounter = 0
for row in ws.rows:
    for cell in row:
        if cell.value == "PerceivedSound.RESP":
            break
        colcounter += 1

    # cells are apparently linked to the parent workbook meta
    # this will retain only values; you'll need custom
    # row constructor if you want to retain more
    row2 = [cell.value for cell in row]
    ws2.append(row2)  # preserve the first row in the new file

    break  # only the header row is needed to find the column

# start at row 2 so the header row is not processed a second time
for row in ws.iter_rows(min_row=2):
    row2 = [cell.value for cell in row]
    row2.append(doStuff(row2[colcounter]))
    ws2.append(row2)  # write a new row to the new wb

wb2.save('newfile.xlsx')
wb.close()
wb2.close()

# copy `newfile.xlsx` to `generalpath + exppath + doc`
# Either using os.system, subprocess.Popen, or shutil.copy2()
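The column lookup in the code above can also be pulled out into a small generator that works on any stream of rows. This is only a sketch: `column_from_rows` and `sample_rows` are hypothetical names, and the sample data stands in for a real worksheet. With openpyxl 2.6 or later you would most likely pass `ws.iter_rows(values_only=True)` as the `rows` argument.

```python
from typing import Any, Iterable, Iterator, Sequence


def column_from_rows(rows: Iterable[Sequence[Any]], header: str) -> Iterator[Any]:
    """Yield the values found under `header`, consuming one row at a time."""
    row_iter = iter(rows)
    first_row = next(row_iter, None)  # the header row, or None if there are no rows
    if first_row is None:
        return
    col_index = list(first_row).index(header)  # raises ValueError if header is absent
    for row in row_iter:
        yield row[col_index]


# Stand-in data (assumption); a real sheet would supply these rows lazily
sample_rows = [
    ("Subject", "PerceivedSound.RESP"),
    (1, "a"),
    (2, "b"),
]
values = list(column_from_rows(sample_rows, "PerceivedSound.RESP"))
print(values)
```

Because the generator never holds more than one row, it fits the row-only access pattern that read-only mode allows.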

You will not be able to write to the same workbook, but as shown above you can open a new workbook (in write-only mode), write to it, and then overwrite the old file with an OS-level copy.
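The final overwrite step could look roughly like this with `shutil.copy2()`, one of the options named in the code comments. The helper name `overwrite_original` and the throwaway files are assumptions for illustration; in practice the two paths would be `newfile.xlsx` and the original workbook path.

```python
import shutil
import tempfile
from pathlib import Path


def overwrite_original(new_file: Path, original: Path) -> None:
    """Copy new_file over original; copy2 also tries to preserve file metadata."""
    shutil.copy2(new_file, original)


# Demonstration with throwaway files instead of real workbooks (assumption)
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.xlsx"
    new_file = Path(tmp) / "newfile.xlsx"
    original.write_bytes(b"old contents")
    new_file.write_bytes(b"new contents")

    overwrite_original(new_file, original)
    result = original.read_bytes()

print(result)
```

Do the copy only after both workbooks are closed, so the read-only handle on the original file has been released.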
