Page number python-docx


Question

I am trying to write a Python program that finds a specific word in a .docx file and returns the page number it occurs on. So far, looking through the python-docx documentation, I have been unable to find how to access the page number, or even the footer where the number would be located. Is there a way to do this using python-docx, or even just Python? If not, what would be the best way to do this?

Recommended answer

The short answer is no, because page breaks are inserted by the rendering engine, not determined by the .docx file itself.

However, certain clients place a <w:lastRenderedPageBreak> element in the saved XML to indicate where they broke the page the last time the document was rendered.

I don't know which clients do this (although I expect Word itself does) or how reliable it is, but that's the direction I would recommend if you want to work in Python. You could potentially use python-docx to get a reference to the lxml element you want (like w:document/w:body) and then use XPath or something similar to iterate through to a specific page. Thinking it through a bit, though, getting that to work is going to take some detailed development.
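As a rough illustration of that direction, here is a minimal sketch that uses only the standard library (zipfile and xml.etree.ElementTree) rather than python-docx. It walks word/document.xml in document order and bumps a page counter each time it meets a w:lastRenderedPageBreak marker. The function names pages_containing and pages_in_docx are made up for this example. It assumes the file was last saved by a client that writes those markers, it counts only those markers (whether explicit page breaks need separate handling depends on the producing application and is not addressed here), and it will miss a word that is split across several text runs. Treat the result as an estimate of where the pages broke at the last render, not as an authoritative page number.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TEXT_TAG = f"{{{W_NS}}}t"
BREAK_TAG = f"{{{W_NS}}}lastRenderedPageBreak"


def pages_containing(document_xml, word):
    """Return the sorted estimated page numbers whose text runs contain `word`.

    Hypothetical helper: the page counter is driven purely by
    w:lastRenderedPageBreak markers, so it is only as good as the last
    rendering recorded by whichever application saved the file.
    """
    root = ET.fromstring(document_xml)
    page = 1
    hits = set()
    # iter() visits elements in document order, so a break marker is seen
    # before the text that follows it.
    for element in root.iter():
        if element.tag == BREAK_TAG:
            page += 1
        elif element.tag == TEXT_TAG and element.text and word in element.text:
            hits.add(page)
    return sorted(hits)


def pages_in_docx(path, word):
    """Read the main document part from a .docx package and search it."""
    with zipfile.ZipFile(path) as archive:
        return pages_containing(archive.read("word/document.xml"), word)
```

Calling pages_in_docx("report.docx", "budget") (both arguments are placeholders) would return a list of estimated page numbers, or an empty list if the word is not found in any single text run. The match is a plain case-sensitive substring test on body text only; headers, footers, and footnotes live in other parts of the package and are not searched.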

If you work in the native Windows MS Office API, you might be able to get something better, since it actually runs the Word application.

If you're generating the documents with python-docx, those elements won't be present, because python-docx makes no attempt to render the document (nor is it ever likely to). We're also not likely to add support for w:lastRenderedPageBreak anytime soon; I'm not even quite sure what that would look like.

If you search on 'lastRenderedPageBreak' and/or 'python-docx page break' you'll see other questions and answers here that may give a little more information.
