How to get an image (InlineShape) from a paragraph with python-docx


Question

I want to read a docx document paragraph by paragraph and, if a paragraph contains a picture (InlineShape), process that picture together with the text around it. Document.inline_shapes gives the list of all inline shapes in the whole document, but I want only the one that appears in the current paragraph, if there is one.

Code example:

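from docx import Document

doc = Document("test.docx")

# Follow the first inline shape down to its blip, which carries the relationship id
blip = doc.inline_shapes[0]._inline.graphic.graphicData.pic.blipFill.blip
rID = blip.embed

# Resolve the relationship id to the image part stored in the package
document_part = doc.part
image_part = document_part.related_parts[rID]

# Write the raw image bytes to disk; the file is closed even if the write fails
with open("test.png", "wb") as fr:
    fr.write(image_part.blob)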

(This is how I want to save these pictures.)

Recommended answer

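Assuming your paragraph object is par, you can use the following code to find the images in it. The function returns the r:embed relationship ids, which can then be looked up in doc.part.related_parts as in the question.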

import xml.etree.ElementTree as ET

def hasImage(par):
    """Get the relationship ids of all inline images in a paragraph.

    :param par: a paragraph object from docx
    :return: a list of r:embed ids (empty if the paragraph has no image)
    """
    ids = []
    root = ET.fromstring(par._p.xml)
    namespace = {
        'a': "http://schemas.openxmlformats.org/drawingml/2006/main",
        'r': "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
        'wp': "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing",
    }
    embed_attr = '{{{0}}}embed'.format(namespace['r'])

    inlines = root.findall('.//wp:inline', namespace)
    for inline in inlines:
        imgs = inline.findall('.//a:blip', namespace)
        for img in imgs:
            # Append inside the inner loop so that every blip is collected,
            # and skip blips that have no r:embed attribute (e.g. linked images)
            rid = img.attrib.get(embed_attr)
            if rid is not None:
                ids.append(rid)

    return ids
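
To connect the answer back to the question, the two pieces can be combined: find the r:embed ids in each paragraph, then resolve them through doc.part.related_parts as the question's snippet does. The following is only a minimal sketch under some assumptions. The names embed_ids_from_xml and iter_paragraph_images are hypothetical helpers introduced here for illustration and are not part of python-docx. The sketch also assumes every image is embedded rather than linked, and it only visits doc.paragraphs, so paragraphs inside tables are not covered.

```python
import xml.etree.ElementTree as ET

# Namespaces used by inline pictures in a WordprocessingML paragraph
NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
    "wp": "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing",
}
EMBED_ATTR = "{%s}embed" % NS["r"]


def embed_ids_from_xml(paragraph_xml):
    """Return the r:embed ids of all inline images in a paragraph's XML string.

    Hypothetical helper: it takes the XML text (for example par._p.xml)
    instead of a paragraph object, so it can be tried without a .docx file.
    """
    root = ET.fromstring(paragraph_xml)
    return [
        blip.attrib[EMBED_ATTR]
        for inline in root.iter("{%s}inline" % NS["wp"])
        for blip in inline.iter("{%s}blip" % NS["a"])
        if EMBED_ATTR in blip.attrib  # ignore blips without an embedded image
    ]


def iter_paragraph_images(docx_path):
    """Yield (paragraph_text, rId, image_bytes) for each inline image.

    Hypothetical helper: python-docx is imported lazily, so the XML helper
    above stays usable even where the package is not installed.
    """
    from docx import Document

    doc = Document(docx_path)
    for par in doc.paragraphs:
        for rid in embed_ids_from_xml(par._p.xml):
            # Same lookup as in the question: relationship id -> image part
            yield par.text, rid, doc.part.related_parts[rid].blob
```

A caller could loop over iter_paragraph_images("test.docx") and write each image_bytes value to its own file, with the paragraph text available in the same iteration. Passing the XML string rather than the paragraph object is a design choice made here so that the parsing step can be checked on its own.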
