读取docx文件，识别并存储斜体文本 [英] Reading docx files, recognizing and storing italicized text

查看：63 发布时间：2021/5/2 20:07:32 python string docx

本文介绍了读取docx文件，识别并存储斜体文本的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我应该如何使用Python读取.docx文件并能够识别斜体文本并将其存储为字符串?

How should I go about reading a .docx file with Python and being able to recognize the italicized text and storing it as a string?

我看了看docx python软件包，但所看到的只是写入.docx文件的功能.

I looked at the docx python package but all I see is features for writing to a .docx file.

我非常感谢您的帮助

推荐答案

这是我的示例文档 TestDocument.docx 的样子.

Here's what my example document, TestDocument.docx, looks like.

注意:斜体"一词在斜体中，但是强调"使用的样式是强调".

Note: The word "Italic" is in Italics, but "Emphasis" uses the style, Emphasis.

如果安装 python-docx 模块.这是一个相当简单的练习.

If you install the python-docx module. This is a fairly simple exercise.

>>> from docx import Document
>>> document = Document('TestDocument.docx')
>>> for p in document.paragraphs:
...     for run in p.runs:
...             print run.text, run.italic, run.bold
... 
Test Document None None
Italics True None
Emp None None
hasis None None
>>> [[run.text for run in p.runs if run.italic] for p in document.paragraphs]
[[], ['Italics'], []]

Run.italic 属性捕获文本是否设置为斜体格式，但是它不知道文本块是否具有以斜体呈现的样式，但是可以通过检查来检测Run.style.name(如果您知道文档中的样式以斜体显示.

The Run.italic attribute captures whether the text is formatted as Italic, but it doesn't know if a text block has a Style that is rendered in Italic, but it can be detected by checking Run.style.name (if you know what styles in your document are rendered in Italics.

>>> [[run.text for run in p.runs if run.style.name=='Emphasis'] for p in document.paragraphs]
[[], [], ['Emp', 'hasis']]

这篇关于读取docx文件，识别并存储斜体文本的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

读取docx文件，识别并存储斜体文本 [英] Reading docx files, recognizing and storing italicized text

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

读取docx文件，识别并存储斜体文本 [英] Reading docx files, recognizing and storing italicized text

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭