从.docx文件解析表 [英] Parsing of table from .docx file

查看：130 发布时间：2020/5/25 1:01:18 python xml parsing docx python-docx

本文介绍了从.docx文件解析表的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我想使用Python和 python-docx 从.docx文件中解析一个表

转换为一些有用的数据结构.

I want to parse a table from a .docx file using Python and python-docx into some useful data structure.

在我的情况下，.docx文件仅包含一个表.我已经上传了它，因此您可以看看.这是屏幕截图:

The .docx file contains only a single table in my case. I've uploaded it so you can have a look. Here's a screenshot:

推荐答案

您可以使用下面的代码片段将文档解析为列表，其中每一行都是将表头值映射到列值的字典.

You can use the snippet below to parse your document into a list where each row is a dictionary mapping the table header value to the column value.

from docx.api import Document

# Load the first table from your document. In your example file,
# there is only one table, so I just grab the first one.
document = Document('Books.docx')
table = document.tables[0]

# Data will be a list of rows represented as dictionaries
# containing each row's data.
data = []

keys = None
for i, row in enumerate(table.rows):
    text = (cell.text for cell in row.cells)

    # Establish the mapping based on the first row
    # headers; these will become the keys of our dictionary
    if i == 0:
        keys = tuple(text)
        continue

    # Construct a dictionary for this row, mapping
    # keys to values for this row
    row_data = dict(zip(keys, text))
    data.append(row_data)

这将为您提供:

data = [
  {u'Pub.': u'Penguin Books',
   u'Auther': u'Edward de BONO',
   u'Sr. No.': u'1',
   u'Name of Book': u'Six Thinking Hats'
  },
  ...
]

如果只想为每行提供一个元组，则应该创建row_data到text的元组值，而不是创建字典，因此在循环中，而不是构造dict，请执行以下操作:

If you'd just want a tuple for each row, you should instead of creating a dictionary just set row_data to the tuple value of text, so in the loop instead of constructing the dict, do:

# Construct a tuple for this row
row_data = tuple(text)
data.append(row_data)

现在，data会改为保留以下内容:

Now, data would hold something like this instead:

data = [
  (u'1',
   u'Six Thinking Hats',
   u'Edward de BONO',
   u'Penguin Books'
  ),
 ...
]

然后，您显然可以跳过构造keys的操作(但仍跳过第一行！).

Then you can skip constructing keys, obviously (but still skip the first row!).

这篇关于从.docx文件解析表的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

从.docx文件解析表 [英] Parsing of table from .docx file

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

从.docx文件解析表 [英] Parsing of table from .docx file

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭