How to read the contents of a table in an MS Word file using Python?
Question
How can I read and process the contents of every cell of a table in a DOCX file?
I am using Python 3.2 on Windows 7 with PyWin32 to access the MS Word document.
I am a beginner, so I don't know the proper way to reach the table cells. So far I have only done this:
import win32com.client as win32
word = win32.gencache.EnsureDispatch('Word.Application')
word.Visible = False
doc = word.Documents.Open("MyDocument")
Recommended answer
Here is what works for me in Python 2.7:
import win32com.client as win32
word = win32.Dispatch("Word.Application")
word.Visible = 0
word.Documents.Open("MyDocument")
doc = word.ActiveDocument
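A minimal sketch of opening the file by absolute path, since Word may resolve a relative name such as "MyDocument" against its own working directory rather than the script's. The helper name `open_document` is hypothetical; it relies only on `Documents.Open`, which returns the opened document, so `ActiveDocument` is not needed.

```python
import os


def open_document(word, path):
    """Open `path` in a Word.Application COM object and return the document.

    `word` is assumed to be the object returned by
    win32com.client.Dispatch("Word.Application").
    """
    # Convert to an absolute path so the result does not depend on
    # which directory Word considers current.
    full_path = os.path.abspath(path)
    return word.Documents.Open(full_path)


# Typical use on Windows with PyWin32 installed (not run here):
#   import win32com.client as win32
#   word = win32.Dispatch("Word.Application")
#   word.Visible = 0
#   doc = open_document(word, "MyDocument.docx")
#   ...
#   doc.Close(False)   # close without saving changes
#   word.Quit()
```

Closing the document and quitting Word when finished keeps a hidden Word process from lingering in the background.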
To see how many tables your document has:
doc.Tables.Count
Then you can select the table you want by its index. Note that, unlike Python, COM indexing starts at 1:
table = doc.Tables(1)
To select a cell:
table.Cell(Row=1, Column=1)
To get its contents:
table.Cell(Row=1, Column=1).Range.Text
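One thing to watch for: the text Word returns for a table cell ends with an end-of-cell marker, a carriage return followed by the BEL character ("\r\x07"). A small sketch of stripping it is below; the names `CELL_END_MARKER` and `clean_cell_text` are made up for illustration.

```python
# Word appends this two-character marker to the text of every table cell.
CELL_END_MARKER = "\r\x07"


def clean_cell_text(raw_text):
    """Return the cell text without Word's trailing end-of-cell marker."""
    if raw_text.endswith(CELL_END_MARKER):
        return raw_text[:-len(CELL_END_MARKER)]
    return raw_text
```

With a real table this would be called as clean_cell_text(table.Cell(Row=1, Column=1).Range.Text). Paragraph breaks inside the cell are left untouched; only the final marker is removed.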
Hope this helps.
An example of a function that returns the column index based on its heading:
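def Column_index(header_text):
    # COM indexing is 1-based, so iterate from 1 to Columns.Count inclusive.
    for i in range(1, table.Columns.Count + 1):
        # Cell text ends with Word's end-of-cell marker ("\r\x07"); strip it before comparing.
        if table.Cell(Row=1, Column=i).Range.Text.rstrip("\r\x07") == header_text:
            return i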
Then you can access the cell you want like this, for example:
table.Cell(Row=1, Column=Column_index("The Column Header")).Range.Text
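To process every cell, as the question asks, the calls above can be combined into a loop over rows and columns. This is only a sketch: `table_to_rows` is a hypothetical helper, and it assumes a regular table without merged cells (asking for a cell that does not exist in a merged table raises a COM error).

```python
def table_to_rows(table):
    """Return the table's cell texts as a list of rows (lists of strings).

    `table` is assumed to be a Word Table COM object, e.g. doc.Tables(1).
    """
    rows = []
    # COM collections are 1-based, so both ranges start at 1.
    for r in range(1, table.Rows.Count + 1):
        row = []
        for c in range(1, table.Columns.Count + 1):
            text = table.Cell(Row=r, Column=c).Range.Text
            # Drop Word's trailing end-of-cell marker ("\r\x07").
            row.append(text.rstrip("\r\x07"))
        rows.append(row)
    return rows
```

The result is an ordinary nested list, so rows[0] holds the header row and rows[1:] the data rows, indexed from 0 in the usual Python way.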