How do I extract data from a doc/docx file using Python?

Question

I know there are similar questions out there, but I couldn't find one that answers mine. I need a way to read certain data from MS Word files and save it to an XML file. Reading up on python-docx did not help, as it seems to only allow writing to Word documents, not reading them. To state my task exactly (or at least how I chose to approach it): I would like to search the document, which contains tables, for a keyword or phrase and extract the text from the table in which that keyword or phrase is found. Does anybody have any ideas?

Recommended answer

It seems that pywin32 does the trick. You can iterate through all the tables in a document and through all the cells inside each table. Getting the data is slightly tricky: the last two characters of every cell's text have to be dropped, because Word appends an end-of-cell marker (a carriage return followed by the character \x07) to each entry. Otherwise, it takes about ten minutes to code. If anyone needs additional details, please say so in the comments.
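
The answer gives no code, so here is a minimal sketch of how the approach might look. It is not the answerer's actual script. It assumes Windows with Word and pywin32 installed, and the helper names (clean_cell, read_tables, tables_containing, tables_to_xml) and the XML element names (tables, table, row, cell) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Word ends each cell's Range.Text with a carriage return plus \x07.
CELL_END = "\r\x07"


def clean_cell(text):
    """Drop the two-character end-of-cell marker, if present."""
    return text[:-2] if text.endswith(CELL_END) else text


def read_tables(path):
    """Return each table in a Word file as a list of rows of cell strings.

    Needs Windows, an installed copy of Word, and pywin32.
    """
    import win32com.client  # imported here so the other helpers work without pywin32

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    # Positional arguments: FileName, ConfirmConversions=False, ReadOnly=True.
    # Word expects an absolute path.
    doc = word.Documents.Open(path, False, True)
    try:
        tables = []
        for table in doc.Tables:
            rows = {}
            # Walk the cells directly and group them by their row index.
            for cell in table.Range.Cells:
                rows.setdefault(cell.RowIndex, []).append(clean_cell(cell.Range.Text))
            tables.append([rows[i] for i in sorted(rows)])
        return tables
    finally:
        doc.Close(False)  # close without saving changes
        word.Quit()


def tables_containing(tables, phrase):
    """Keep only the tables in which some cell contains the phrase (case-insensitive)."""
    phrase = phrase.lower()
    return [t for t in tables if any(phrase in c.lower() for row in t for c in row)]


def tables_to_xml(tables):
    """Serialize tables to an XML string; the element names are arbitrary."""
    root = ET.Element("tables")
    for table in tables:
        table_el = ET.SubElement(root, "table")
        for row in table:
            row_el = ET.SubElement(table_el, "row")
            for cell in row:
                ET.SubElement(row_el, "cell").text = cell
    return ET.tostring(root, encoding="unicode")
```

A possible call would be tables_to_xml(tables_containing(read_tables(r"C:\docs\report.doc"), "invoice")), where both the path and the search phrase are placeholders. For .docx files only, recent versions of python-docx can also read tables (Document(path).tables), which avoids the need for Word, but that is a different route from the one this answer describes.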
