How to extract text from an existing docx file using python-docx

Published: 2021/6/25 19:46:46 | Tags: python, python-2.7, python-3.x, python-docx

Problem description

I'm trying to use the python-docx module (pip install python-docx), but it is confusing: the test sample in the GitHub repo uses an opendocx function, while the readthedocs documentation uses the Document class. The documentation also only shows how to add text to a docx file, not how to read an existing one.

The first one (opendocx) does not work and may be deprecated. For the second, I tried:

from docx import Document

document = Document('test_doc.docx')
print(document.paragraphs)

It returned a list of <docx.text.Paragraph object at 0x... > entries.

Then I did:

for p in document.paragraphs:
    print(p.text)

It returned all the text, but a few things were missing. None of the URLs (CTRL+CLICK to go to URL) appeared in the text printed to the console.

What is the issue? Why are the URLs missing?
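
One plausible explanation, offered as an assumption rather than something confirmed in this thread: Word stores a hyperlink's display text in runs nested inside a w:hyperlink element and keeps the target URL in a separate relationships part, so code that only reads runs sitting directly under a paragraph can skip both. The sketch below avoids python-docx entirely and reads the package with the standard library. The function name extract_hyperlinks is hypothetical, and the part paths word/document.xml and word/_rels/document.xml.rels are the conventional locations, assumed here rather than looked up from the package's root relationships.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by WordprocessingML and by package relationships.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def extract_hyperlinks(path):
    """Return (display_text, target) pairs for hyperlinks in the main body.

    Hypothetical helper: assumes the conventional part names inside the .docx.
    """
    with zipfile.ZipFile(path) as zf:
        body = ET.fromstring(zf.read("word/document.xml"))
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))

    # Map relationship ids (such as "rId5") to their targets.
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter("{%s}Relationship" % REL)
    }

    links = []
    for link in body.iter("{%s}hyperlink" % W):
        # The visible text is spread over the w:t elements inside the link.
        text = "".join(t.text or "" for t in link.iter("{%s}t" % W))
        rid = link.get("{%s}id" % R)
        # target is None when the link has no r:id (an internal anchor).
        links.append((text, targets.get(rid)))
    return links
```

Calling extract_hyperlinks('test_doc.docx') on the file from the question would list each link's visible text next to its stored target, which makes it easy to check whether the missing URLs are in the document but outside the runs that p.text reads.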

How can I get the complete text without iterating in a loop (something like open().read())?
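
As far as I know, python-docx has no single call equivalent to open().read(), so the closest option is to hide the loop inside a join. The sketch below is one way to do that; the helper names join_paragraph_text and read_docx_text are hypothetical, and only Document and the paragraphs and text attributes come from the question's own code. Note that document.paragraphs covers body-level paragraphs only, so text inside tables is not included.

```python
def join_paragraph_text(paragraphs):
    """Join the .text of each paragraph-like object with newlines."""
    return "\n".join(p.text for p in paragraphs)


def read_docx_text(path):
    """Return the body text of a .docx file as one string (hypothetical helper)."""
    # python-docx is imported lazily so join_paragraph_text has no dependency.
    from docx import Document

    return join_paragraph_text(Document(path).paragraphs)
```

With python-docx installed, print(read_docx_text('test_doc.docx')) would print the same text as the loop in the question, one paragraph per line.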
