Python MS Word

Question

Possible Duplicate:
Reading/Writing MS Word files in Python

I'm looking into a requirements management system (like RequisitePro, Rational Rose) and will need to read through an MS Word document, searching for specific tags, on either a Windows or an Apple OS environment. Are there any known frameworks for this (I couldn't find any), or suggested approaches?

Just to add some clarification: this would not be a one-time read. I'd review the document every time it is updated and perform CRUD operations on the requirement-specific areas.

Recommended answer

First, get it out of native Word (.doc) format.

  • Do a "Save As XML" and insist your users work with that file instead of the .doc file. They'll hardly notice the difference, except that the file is bigger.

Use lxml or ElementTree to parse the XML and find the headings, sections, paragraphs and lists.
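
As a rough illustration of the XML route, here is a minimal sketch using the standard library's ElementTree. It assumes the file came from Word 2003's "Save As XML", which uses the WordprocessingML namespace shown below. The sample document, the `iter_paragraphs` helper and the `[REQ-001]` tag format are made up for the example, and real files will contain far more markup.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace used by Word 2003 "Save As XML" (WordprocessingML).
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}

# Tiny hypothetical document: one heading and one body paragraph split into two runs.
SAMPLE = f"""<w:wordDocument xmlns:w="{W}">
  <w:body>
    <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Login</w:t></w:r></w:p>
    <w:p><w:r><w:t>[REQ-001] The user </w:t></w:r><w:r><w:t>must log in.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def iter_paragraphs(root):
    """Yield (style_name_or_None, text) for every w:p element in document order."""
    for p in root.iter(f"{{{W}}}p"):
        style_el = p.find("w:pPr/w:pStyle", NS)
        style = style_el.get(f"{{{W}}}val") if style_el is not None else None
        # A paragraph's text is spread over several w:t runs, so join them.
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        yield style, text


paragraphs = list(iter_paragraphs(ET.fromstring(SAMPLE)))
for style, text in paragraphs:
    print(style, "|", text)
```

For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`. lxml exposes a largely compatible API if XPath support or speed matters.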

You can also do a "Save As HTML" before doing your analysis. This works just as well as the XML version. The HTML version isn't as easy for users to work with, however, so do this only as a step before your analysis.

Use Beautiful Soup to parse the HTML and find the headings, sections, paragraphs and lists.
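
A minimal sketch of the HTML route, assuming the third-party `beautifulsoup4` package is installed. The sample markup and the `extract_blocks` helper are hypothetical. HTML that Word actually exports is much noisier and often represents list items as styled `<p>` elements, so the set of element names to collect would likely need adjusting.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified stand-in for Word's "Save As HTML" output.
SAMPLE_HTML = """<html><body>
<h1>Login</h1>
<p class="MsoNormal">[REQ-001] The user must log in.</p>
<ul><li>[REQ-002] Passwords expire.</li></ul>
</body></html>"""


def extract_blocks(html):
    """Return (element_name, text) pairs for headings, paragraphs and list items."""
    soup = BeautifulSoup(html, "html.parser")  # parser bundled with Python
    wanted = ["h1", "h2", "h3", "p", "li"]
    return [(el.name, el.get_text(" ", strip=True)) for el in soup.find_all(wanted)]


blocks = extract_blocks(SAMPLE_HTML)
for name, text in blocks:
    print(name, "|", text)
```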

Once you have a parse structure (XML or HTML) you can analyze the document looking for specific tags.
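
The tag search itself can be sketched with the standard library once the paragraph text has been extracted. The `[REQ-nnn]` tag format and the function names below are assumptions for illustration only, so substitute whatever tagging convention the documents actually use. Because the question mentions re-reading the document on every update, the sketch also compares two snapshots to show which requirements were created, updated or deleted.

```python
import re

# Assumption: requirements are tagged like "[REQ-001] some text" at the start of a paragraph.
TAG_RE = re.compile(r"\[(REQ-\d+)\]\s*(.*)")


def index_requirements(paragraph_texts):
    """Map each requirement tag to its text, ignoring untagged paragraphs."""
    found = {}
    for text in paragraph_texts:
        m = TAG_RE.match(text.strip())
        if m:
            found[m.group(1)] = m.group(2)
    return found


def diff_requirements(old, new):
    """Compare two tag indexes and report what changed between document versions."""
    return {
        "created": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "updated": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


old = index_requirements(
    ["Login", "[REQ-001] The user must log in.", "[REQ-002] Passwords expire."]
)
new = index_requirements(
    ["[REQ-001] The user must sign in.", "[REQ-003] Sessions time out."]
)
changes = diff_requirements(old, new)
print(changes)
```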
