How can I get the text by color from a Word document with win32com?
Question
I have a Word document with several tables. Each table contains text in two colors, black and red.
I'd like to get the text from the cells of a Word table according to its color. I found a way, but I think it's very inefficient.
The following code gets the text from a Word table cell and prints each word with its color.
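import os
import win32com.client

path = os.path.join(os.getcwd(), "../files/tests2.docx")
word = win32com.client.Dispatch("Word.Application")
word.Visible = 1
doc = word.Documents.Open(path)

for table in doc.Tables:
    f = 2  # row index of the cell to inspect
    c = 2  # column index of the cell to inspect
    wc = table.Cell(f, c).Range.Words.Count
    # Words is 1-based; range(1, wc) stops before the final item.
    for i in range(1, wc):
        w = table.Cell(f, c).Range.Words(i)
        # Print the word's text together with its font color value.
        print("%s %s" % (w.Text, w.Font.Color))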
Do you know any other (better) way to achieve this?
Thanks.
Recommended answer
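Here is an approach using python-docx. Note that this code selects runs by their highlight color (w:highlight), not by their font color: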
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from docx import opendocx

document = opendocx(r'test.docx')
# Select every run (w:r) in the document body.
words = document.xpath('//w:r', namespaces=document.nsmap)

WPML_URI = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
tag_rPr = WPML_URI + 'rPr'
tag_highlight = WPML_URI + 'highlight'
tag_val = WPML_URI + 'val'
tag_t = WPML_URI + 't'

for word in words:
    for rPr in word.findall(tag_rPr):
        for hi in rPr.findall(tag_highlight):
            # Keep only runs highlighted in yellow.
            if hi.attrib[tag_val] == 'yellow':
                t = word.find(tag_t)
                # A run may contain no text element (e.g. only a tab or break).
                if t is not None and t.text:
                    print(t.text.lower())
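Since the question asks about font color rather than highlighting, the same run-by-run idea can be pointed at the w:color element instead. The sketch below is not part of the original answer and uses only the Python standard library (zipfile and xml.etree.ElementTree) rather than python-docx or win32com. The function names runs_with_color and docx_runs_with_color are hypothetical, and it assumes the red text carries an explicit w:color value such as FF0000 on the run itself; colors inherited from a style or theme would not be found this way.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's "{uri}" notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def runs_with_color(document_xml, wanted):
    """Return the text of every run whose explicit font color equals `wanted`.

    `document_xml` is the content of word/document.xml (str or bytes).
    `wanted` is a hex RGB string such as "FF0000" (assumed value for red).
    """
    root = ET.fromstring(document_xml)
    hits = []
    for run in root.iter(W + "r"):
        # The font color lives in <w:rPr><w:color w:val="RRGGBB"/></w:rPr>.
        color = run.find(W + "rPr/" + W + "color")
        if color is None:
            continue  # no explicit color on this run
        if (color.get(W + "val") or "").upper() != wanted.upper():
            continue
        # A run can hold several <w:t> elements; join them.
        text = "".join(t.text or "" for t in run.iter(W + "t"))
        if text:
            hits.append(text)
    return hits


def docx_runs_with_color(docx_file, wanted):
    """Read word/document.xml from a .docx (path or file object) and filter it."""
    with zipfile.ZipFile(docx_file) as zf:
        return runs_with_color(zf.read("word/document.xml"), wanted)
```

For example, docx_runs_with_color("test.docx", "FF0000") would return a list of the red runs' text, assuming the file exists and marks red with that exact value. Because the document is parsed once, this avoids a COM round trip per word, at the cost of ignoring inherited formatting.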