How do I extract text and structure from an MS Word file (.doc) without using COM?


Question

I am looking to extract the text and structure from an MS Word file (.doc) without using COM. I want to be able to refer to the file and extract the text without even opening the file. Any help would be much appreciated.

Recommended answer

The Word API is COM-based. So if your problem is unfamiliarity with COM, you could use C#/.NET and access the API through COM interop. That way you are still using COM, but you do it behind the protection and safety offered by .NET.

On the other hand, if you don't want to use the COM API that way, you would have to read the file and parse it manually. That is certainly a gigantic task, one that could take a single developer several months to accomplish.
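
To give a sense of where manual parsing starts: legacy .doc files are stored in an OLE2 (Compound File Binary) container, which begins with a fixed 8-byte signature. The sketch below only checks that signature. The function names are made up for illustration, and a match does not prove the file is a Word document, since other legacy Office formats use the same container. Everything past this point (the sector tables, the WordDocument stream, the text and formatting tables) is the part that takes months.

```python
# Signature that opens every OLE2 / Compound File Binary container,
# the container format used by legacy .doc files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def has_ole2_signature(header):
    """Return True if the given bytes start with the OLE2 signature."""
    return header[:len(OLE2_MAGIC)] == OLE2_MAGIC


def looks_like_legacy_doc(path):
    """Read the first bytes of a file and test them for the OLE2 signature.

    This is only a container check: .xls and .ppt files pass it too.
    """
    with open(path, "rb") as handle:
        return has_ole2_signature(handle.read(len(OLE2_MAGIC)))
```

A .docx file would fail this check because it is a ZIP archive and starts with the bytes `PK` instead.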


antiword seems interesting, but how do I go about installing it and incorporating it into my application?
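
On the antiword question, one possible way to incorporate it is to install the command-line tool (many Linux distributions package it under the name `antiword`) and call it from the application as a subprocess. The sketch below assumes `antiword` is on the PATH. The helper names `build_antiword_command` and `extract_text` are hypothetical, and the `-w 0` option (no line wrapping in text mode) should be checked against the man page of the installed version.

```python
import shutil
import subprocess


def build_antiword_command(doc_path, wrap_width=0):
    """Build the argument list for a plain-text antiword run.

    -w sets the output line width; 0 is assumed to disable wrapping.
    """
    return ["antiword", "-w", str(wrap_width), str(doc_path)]


def extract_text(doc_path):
    """Run antiword on doc_path and return whatever text it prints."""
    if shutil.which("antiword") is None:
        raise RuntimeError("antiword was not found on PATH")
    result = subprocess.run(
        build_antiword_command(doc_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if antiword fails
    )
    # The output encoding depends on antiword's character mapping,
    # so undecodable bytes are replaced rather than raising.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `extract_text("report.doc")` would then return the document text as a string. This yields plain text only, so it covers the "text" half of the question much better than the "structure" half.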

