Extract images from a Word document using Python

Views: 63 | Published: 2021/6/26 19:32:06 | Tags: python, python-3.x, python-2.7

Problem description

How can I extract images/logos from a Word document using Python and store them in a folder? The following code converts the docx to HTML, but it doesn't extract the images from the HTML. Any pointers or suggestions would be a great help.

    import mammoth

    profile_path = <file path>
    # Note: profile_html (the output HTML path) is not defined in the original snippet

    # Convert the .docx to HTML and write the result out as UTF-8
    with open(profile_path, 'rb') as f:
        document = mammoth.convert_to_html(f)
    with open(profile_html, 'wb') as b:
        b.write(document.value.encode('utf8'))

Recommended answer

You can use the docx2txt library. It reads your .docx document and exports the images to a directory you specify (which must already exist).

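    # Notebook syntax; from a shell, run "pip install docx2txt" instead
    !pip install docx2txt

    import docx2txt

    # Returns the document text and writes the embedded images to the given directory
    text = docx2txt.process("/path/your_word_doc.docx", '/home/example/img/')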

After execution, the images will be in /home/example/img/ and the variable text will hold the document text. The images will be named image1.png ... imageN.png, in order of appearance.
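
If installing a third-party package is not an option, the same result can be approximated with the standard library, because a .docx file is a ZIP archive and Word normally stores embedded pictures under word/media/. The following is a minimal sketch under that assumption, not how docx2txt is necessarily implemented; the helper name extract_docx_images and the list of image extensions are hypothetical choices made for illustration.

```python
import os
import zipfile

# Hypothetical helper (not part of docx2txt): copy embedded images out of a .docx archive.
# The extension list is an assumption; extend it if your documents use other formats.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}


def extract_docx_images(docx_path, out_dir):
    """Write every image found under word/media/ to out_dir and return the written paths."""
    os.makedirs(out_dir, exist_ok=True)  # create the target folder if it is missing
    written = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            ext = os.path.splitext(name)[1].lower()
            if name.startswith("word/media/") and ext in IMAGE_EXTENSIONS:
                target = os.path.join(out_dir, os.path.basename(name))
                with open(target, "wb") as dst:
                    dst.write(archive.read(name))
                written.append(target)
    return written
```

Calling extract_docx_images("/path/your_word_doc.docx", "/home/example/img/") would then return the list of files it wrote. Unlike the docx2txt call above, this sketch creates the output directory itself and does not return the document text.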

Note: The Word document must be in .docx format.
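
Because a legacy .doc file is a different binary format rather than a ZIP archive, it can help to check the input before trying to extract anything. The sketch below is one possible check, assuming the main document part sits at the usual word/document.xml location; the function name looks_like_docx is hypothetical.

```python
import zipfile


def looks_like_docx(path):
    """Return True if path is a ZIP archive that contains word/document.xml."""
    if not zipfile.is_zipfile(path):
        return False  # e.g. a legacy .doc file, which is not a ZIP archive
    with zipfile.ZipFile(path) as archive:
        # Assumption: the main document part uses the conventional location
        return "word/document.xml" in archive.namelist()
```

A file that fails this check would need to be converted to .docx first, for example by re-saving it from Word.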
