How can I copy equations from docx to a specific location in another docx?

Question

Hello, I am currently trying to write code that combines docx files. These files may contain text, images, tables, or equations. The code aims to copy these objects and append them to a base docx. I am able to copy and merge text, images, and tables by using the docx module's 'add_picture' and 'add_paragraph' methods, but I cannot do this for Word equations. I decided to dig into the XML of the docx and copy the equation section from there. I am able to append equations to my base document, but when I continue to append pictures, text, and tables, these equations show up at the end of the docx. My questions are: why does this happen if I loop through the appended objects in the order I want them to appear, and is there a way to keep the code from putting the equations at the end of the docx?

Here is an outline of the code:

  1. Create the base document:

    document = Document('basedoc.docx')

For each block item of the sub-document, I categorize its type and style, and record whether an equation is present:

    if isinstance(block, Paragraph):
        if "r:embed" in block._element.xml:
            append content, style, and equation arrays, content being a drawing/image
        elif "m:oMathPara" in block._element.xml:
            append content, style, and equation arrays, content being an equation
            equationXml.append(block._element.xml)
        elif 'w:br w:type="page"' in block._element.xml:
            append content, style, and equation arrays, content being a page break
        else:
            append content, style, and equation arrays, content being text
    else:
        append content, style, and equation arrays, content being a table

  • Once I have my arrays of content and style, I loop through the content array and append tables, drawings, page breaks, and text:

    if equationXml[i] == '0':  # the content is an image, table, text, or page break
        if "Table" in str(contentStyle[i]):
            insert table and caption
        else:
            if "drawing" in content[i]:
                insert image and caption
            elif "pageBreak" in content[i]:
                document.add_page_break()
            else:
                insert text
    else:  # an equation is present
        document = EquationInsert.AddEquation(document, equationXml[i])
    

  • My EquationInsert file has a function called 'AddEquation', where I basically rewrite my document object (UpdateableZipFile is code I found online that quickly updates a file inside a zip file):

    def AddEquation(self,document,equationContent):
        document.save('temp.docx')
        z = zipfile.ZipFile('temp.docx')
        tree=etree.parse(z.open('word/document.xml'))
        nmspcDict = tree.getroot().iter().next().nsmap
    
        for key in nmspcDict:
            ET.register_namespace(key, nmspcDict[key])
        tree2=etree.ElementTree(etree.fromstring(equationContent))
        xmlRoot2=tree2.getroot()
        xmlRoot=tree.getroot()
        xmlRoot[1].append(xmlRoot2) #note that [1] had to be used bc [0] was a comment. need to see if general case or not
    
    
        tree.write("document.xml",encoding="utf-8", xml_declaration=True, standalone="yes", pretty_print=True)
    
        with UpdateableZipFile.UpdateableZipFile("temp.docx","a") as o:
            o.write("document.xml","word/document.xml")
    
        document = Document('temp.docx') 
        os.remove('document.xml')
        z.close()
        os.remove('temp.docx')
        return document
    

This code adds the equation, but as the main code continues to loop through the sub-document items, the equations somehow end up pushed to the end of the base document. I've tried returning a docx from the equation-insert function and creating a new document from it, but that made no difference. Any advice on how to keep the equations from going to the end of the file would be much appreciated. Otherwise I'll have to look into converting these equations into images =/ or something else that docx can handle. I'm open to solutions, suggestions, and comments. Thanks!

Recommended answer

I'm sure you'll find your answer in the XML. You can conveniently browse an XML "part" in a .docx "package" using opc-diag.

The paragraphs and tables in a Word document are located in the document.xml part, as child elements under the <w:body> element. The last element in <w:body> is a section properties element (<w:sectPr>, if I recall correctly). If you're appending your equations after that element, they will keep floating to the bottom, because new paragraphs and tables are added above that sectPr element.
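The placement rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the asker's code or a python-docx API: the helper name `insert_before_sectpr` is hypothetical, and the sketch uses the standard library's ElementTree on a tiny hand-written `<w:body>` rather than a real document. The same `insert(index, element)` call should also be available on the lxml elements that python-docx exposes.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def insert_before_sectpr(body, new_element):
    """Hypothetical helper: place new_element ahead of the body-level w:sectPr."""
    sect_tag = "{%s}sectPr" % W
    for index, child in enumerate(list(body)):
        if child.tag == sect_tag:
            # Insert at the sectPr's position, which shifts sectPr down by one.
            body.insert(index, new_element)
            return
    # No sectPr among the direct children: a plain append keeps the order.
    body.append(new_element)


# A tiny stand-in for the w:body of a real document.xml.
body = ET.fromstring('<w:body xmlns:w="%s"><w:p/><w:sectPr/></w:body>' % W)

# A stand-in for the copied equation paragraph.
equation = ET.fromstring(
    '<w:p xmlns:w="%s" xmlns:m="%s"><m:oMathPara/></w:p>' % (W, M)
)

insert_before_sectpr(body, equation)
tags = [child.tag.split("}")[1] for child in body]
print(tags)  # ['p', 'p', 'sectPr']
```

Because the equation paragraph now sits before `<w:sectPr>`, paragraphs and tables added later land after it instead of being slotted in above it.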

I would work with the shortest possible test document and examine the XML produced by your code, comparing it to a document that looks the way you want, perhaps one created by hand in Word. That should quickly point out any element sequencing problems in your code.
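One way to do that inspection without extra tools is to list the direct children of `<w:body>` in order. The sketch below is only an illustration: `body_child_tags` is a hypothetical helper name, and the in-memory zip stands in for a real .docx file so the example runs on its own. Passing a path such as 'temp.docx' should work the same way.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def body_child_tags(docx_file):
    """Hypothetical helper: local tag names of w:body's direct children, in order."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    body = root.find("{%s}body" % W)
    return [child.tag.split("}")[-1] for child in body]


# Build a minimal stand-in package whose body has a paragraph AFTER sectPr,
# which is the sequencing problem described in the answer.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr(
        "word/document.xml",
        '<w:document xmlns:w="%s"><w:body>'
        "<w:p/><w:tbl/><w:sectPr/><w:p/>"
        "</w:body></w:document>" % W,
    )
buffer.seek(0)

tags = body_child_tags(buffer)
print(tags)  # ['p', 'tbl', 'sectPr', 'p']
if tags and tags[-1] != "sectPr":
    print("content found after w:sectPr")
```

Running this on the document your code produces and on a hand-made reference document gives two short tag lists that are easy to compare by eye.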
