Find a new page in a Word document

Question

How do I identify a new page, or some identifier that denotes a page number, using python-docx? I've looked through the docs to no avail so far, and I've also tried looking for the WD_BREAK.PAGE attribute, but this feature is not yet supported. All help is appreciated, thanks.

Recommended answer

The short answer is that you can't reliably determine soft page breaks from a .docx file. You can identify hard page breaks and you may be able to detect where Word broke pages the last time it "flowed" the document.

A Word document is a "flowed" document, meaning that Word's layout engine "flows" the text of the document into a page until it runs out of room, then creates a new page into which it flows the remaining text. These "soft" page breaks are not specified in the .docx file; they are determined by Word at the time of rendering, either for display or printing. This makes sense because whenever you change, for example, the margins, the pages may break at different locations.

An implication of this is that the .docx file does not contain markup identifying where the text flows onto a new page.

A hard page break is one explicitly inserted by the document author to cause following content to flow to a new page without regard to whether the current page is full. These are implemented using a break element, within a run I believe, and can be detected.
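
As a rough illustration of detecting hard page breaks, here is a minimal sketch that reads the document XML directly with the standard library rather than through python-docx. It assumes the main document part is stored at word/document.xml (typical for files saved by Word) and that a hard page break appears as a w:br element with w:type="page". The function names hard_page_break_paragraphs and read_document_xml are made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's "{uri}tag" form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_document_xml(docx_path):
    """Return the raw XML of the main document part.

    Assumption: the part lives at word/document.xml inside the archive.
    """
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml")


def hard_page_break_paragraphs(document_xml):
    """Return indexes (document order) of paragraphs holding a hard page break."""
    root = ET.fromstring(document_xml)
    hits = []
    # root.iter() also visits paragraphs nested in tables.
    for index, paragraph in enumerate(root.iter(W + "p")):
        for br in paragraph.iter(W + "br"):
            # A w:br without w:type="page" is a line or column break.
            if br.get(W + "type") == "page":
                hits.append(index)
                break
    return hits
```

Calling hard_page_break_paragraphs(read_document_xml("example.docx")), where example.docx is a placeholder path, would list the paragraphs after which the author forced a new page. It says nothing about soft page breaks.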

As an aid to assistive technologies, like a voice reader for the visually impaired, Word may insert <w:lastRenderedPageBreak> elements. I don't know much about these or under what circumstances Word inserts them, but it might be an avenue worth exploring.
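
To explore that avenue, the sketch below counts <w:lastRenderedPageBreak> markers per paragraph and turns the running total into a rough page estimate. It is an approximation under stated assumptions: the markers only reflect where Word broke pages the last time it saved the file, files produced by other tools may contain none, and hard page breaks are deliberately not counted here to avoid any double counting. The function names rendered_break_counts and estimated_end_pages are hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's "{uri}tag" form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def rendered_break_counts(document_xml):
    """Per paragraph (document order), count <w:lastRenderedPageBreak/> markers."""
    root = ET.fromstring(document_xml)
    return [
        sum(1 for _ in paragraph.iter(W + "lastRenderedPageBreak"))
        for paragraph in root.iter(W + "p")
    ]


def estimated_end_pages(counts):
    """Estimate the page each paragraph ends on, starting from page 1.

    Assumption: every marker seen so far corresponds to one page turn.
    """
    pages = []
    page = 1
    for count in counts:
        page += count
        pages.append(page)
    return pages
```

The document_xml argument is the content of word/document.xml, which can be read from the .docx archive with the standard zipfile module. Treat the result as a hint about the last rendering, not as authoritative page numbers.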
