Clear new lines in docx


Problem description

I have a docx file that contains many blank lines between sections, and I need to remove the extras wherever more than one appears consecutively. I unzip the file using:

z = zipfile.ZipFile('File.docx','a')
z.extractall()

Inside the word directory there is a file, document.xml, that contains all the data, but I can't tell how a new line is represented in the XML.

I know that extracting the archive is not the solution (I do it here only to show where the file is). I think I can use:

z.write('Document.xml')

Can someone help me?

Recommended answer

The code from tlewis finds a particular piece of text in the docx and replaces it. Your case needs something different: detect the new lines and check whether several of them occur in a row. In Word, a new line is just a paragraph (a <w:p> tag) without any text inside.

I have added some comments that show how to use the zip module.

import zipfile  # Read and write zip archives (a docx is a zip file)
from lxml import etree  # Parse XML bytes into a tree and serialize the tree back to bytes

templateDocx = zipfile.ZipFile("C:/Template.docx")  # Path to the file you want to read
newDocx = zipfile.ZipFile("C:/NewDocument.docx", "w")  # Path of the output file; "w" starts a fresh archive

# Extract document.xml (the file that contains the content) and read it as bytes,
# because lxml rejects str input that carries an XML encoding declaration
with open(templateDocx.extract("word/document.xml", "C:/"), "rb") as tempXmlFile:
    tempXmlStr = tempXmlFile.read()

tempXmlXml = etree.fromstring(tempXmlStr)  # Parse the bytes into an XML tree
############
# Algorithm detailed at the bottom.
# Write the code here that selects all <w:p> tags and checks whether each contains a <w:t> tag.
############

tempXmlStr = etree.tostring(tempXmlXml, pretty_print=True)  # Serialize the changed XML (returns bytes)

with open("C:/temp.xml", "wb") as tempXmlFile:
    tempXmlFile.write(tempXmlStr)  # Write the changed file

for file in templateDocx.infolist():
    if file.filename != "word/document.xml":
        newDocx.writestr(file.filename, templateDocx.read(file))  # Copy every file except the changed one into the new archive

newDocx.write("C:/temp.xml", "word/document.xml")  # Add the changed document.xml

templateDocx.close()  # Close both the template and the new docx
newDocx.close()
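
The copy-and-replace step can also be done without writing a temporary file to disk. The following is a minimal sketch and not part of the original answer: `rewrite_docx_part` and its `transform` parameter are hypothetical names, and the helper assumes each archive member fits in memory.

```python
import zipfile


def rewrite_docx_part(src, dst, part_name, transform):
    """Copy the archive at src to dst, replacing part_name with transform(old_bytes).

    src and dst may be file paths or binary file-like objects.
    transform is a hypothetical callback: it takes bytes and returns bytes.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == part_name:
                data = transform(data)  # Only the requested part is changed
            zout.writestr(item.filename, data)  # Every other member is copied as-is
```

Assuming a function `clean` that takes the XML as bytes and returns the modified bytes (hypothetical), a call could look like `rewrite_docx_part("C:/Template.docx", "C:/NewDocument.docx", "word/document.xml", clean)`.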

How to write the algorithm to remove the multiple new lines

Here is a sample document I created:

[Image: sample document]

Here is the corresponding markup in document.xml:

      <w:p w:rsidR="006C517B" w:rsidRDefault="00761A87">
         <w:bookmarkStart w:id="0" w:name="_GoBack" />
         <w:bookmarkEnd w:id="0" />
         <w:r>
            <w:t>First Line</w:t>
         </w:r>
      </w:p>
      <w:p w:rsidR="00761A87" w:rsidRDefault="00761A87" />
      <w:p w:rsidR="00761A87" w:rsidRDefault="00761A87">
         <w:proofErr w:type="spellStart" />
         <w:r>
            <w:t>Third</w:t>
         </w:r>
         <w:proofErr w:type="spellEnd" />
         <w:r>
            <w:t xml:space="preserve"> Line</w:t>
         </w:r>
      </w:p>
      <w:p w:rsidR="00761A87" w:rsidRDefault="00761A87" />
      <w:p w:rsidR="00761A87" w:rsidRDefault="00761A87" />
      <w:p w:rsidR="00761A87" w:rsidRDefault="00761A87">
         <w:r>
            <w:t>Six Line</w:t>
         </w:r>
      </w:p>
      <w:p w:rsidR="00761A87" w:rsidRDefault="00761A87" />
      <w:p w:rsidR="00761A87" w:rsidRDefault="00761A87" />
      <w:p w:rsidR="00761A87" w:rsidRDefault="00761A87" />
      <w:p w:rsidR="00761A87" w:rsidRDefault="00761A87">
         <w:proofErr w:type="spellStart" />
         <w:r>
            <w:t>Ten</w:t>
         </w:r>
         <w:proofErr w:type="spellEnd" />
         <w:r>
            <w:t xml:space="preserve"> Line</w:t>
         </w:r>
      </w:p>
      <w:p w:rsidR="00761A87" w:rsidRDefault="00761A87">
         <w:proofErr w:type="spellStart" />
         <w:r>
            <w:t>Eleven</w:t>
         </w:r>
         <w:proofErr w:type="spellEnd" />
         <w:r>
            <w:t xml:space="preserve"> Line</w:t>
         </w:r>
      </w:p>

As you can see, a new line is an empty <w:p>, like this one:

<w:p w:rsidR="00761A87" w:rsidRDefault="00761A87" />
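
A check for such an empty paragraph might look like the sketch below. It uses the standard library's `xml.etree.ElementTree` so that it runs without extra installs; lxml elements expose the same `iter` call. The helper name `is_empty_paragraph` is an assumption for illustration, and so is the definition of "empty" as "no `<w:t>` with text": a paragraph that holds only an image or a page break would also count as empty under this test, so a real document may need a stricter check.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "w" prefix in document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def is_empty_paragraph(p):
    """Return True if the <w:p> element holds no <w:t> element with text."""
    return not any(t.text for t in p.iter(f"{{{W_NS}}}t"))


# Two paragraphs: one with text, one empty
sample = (
    f'<w:body xmlns:w="{W_NS}">'
    "<w:p><w:r><w:t>First Line</w:t></w:r></w:p>"
    "<w:p/>"
    "</w:body>"
)
body = ET.fromstring(sample)
flags = [is_empty_paragraph(p) for p in body]
print(flags)
```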

To remove the multiple new lines, find each run of consecutive empty <w:p> elements and remove all but the first in the run.
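
That rule can be sketched as follows. This is one possible implementation under stated assumptions, not the answer author's code: the function name `collapse_empty_paragraphs` is hypothetical, "empty" means the paragraph has no `<w:t>` with text, and only paragraphs that are direct children of the element passed in are examined (paragraphs nested in tables are left alone). The sketch uses the standard library's `xml.etree.ElementTree`; the `iter` and `remove` calls are also available on lxml elements. In a real document.xml the paragraphs sit under `<w:body>` inside the `<w:document>` root.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_P = f"{{{W_NS}}}p"
W_T = f"{{{W_NS}}}t"


def collapse_empty_paragraphs(body):
    """Remove each empty <w:p> that directly follows another empty <w:p>.

    Returns the number of paragraphs removed.
    """
    removed = 0
    previous_was_empty = False
    for child in list(body):  # Iterate over a copy because body is mutated
        is_empty = child.tag == W_P and not any(t.text for t in child.iter(W_T))
        if is_empty and previous_was_empty:
            body.remove(child)
            removed += 1
        previous_was_empty = is_empty
    return removed


def make_paragraph(text):
    """Build the markup for one paragraph; an empty string gives an empty <w:p>."""
    return f"<w:p><w:r><w:t>{text}</w:t></w:r></w:p>" if text else "<w:p/>"


# Same layout as the sample document: text lines separated by 1, 2 and 3 blank lines
texts = ["First Line", "", "Third Line", "", "", "Six Line",
         "", "", "", "Ten Line", "Eleven Line"]
xml = f'<w:body xmlns:w="{W_NS}">' + "".join(map(make_paragraph, texts)) + "</w:body>"
body = ET.fromstring(xml)

removed = collapse_empty_paragraphs(body)
remaining = ["".join(t.text or "" for t in p.iter(W_T)) for p in body]
print(removed, remaining)
```

Each run of blank paragraphs is reduced to a single one, so the spacing between sections survives while the extra blank lines are dropped.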

Hope that helps!
