How to convert .doc or .docx files to .txt

Question

I'm wondering how to convert Word .doc/.docx files to plain-text files in Java. I know Word itself can save a document as text, but I would like to be able to run something like this:

java DocConvert somedocfile.doc converted.txt

Thanks.

Recommended answer

If you're interested in a Java library that deals with Word document files, you might want to look at e.g. Apache POI. A quote from the website:

Why should I use Apache POI?

A major use of the Apache POI api is for Text Extraction applications such as web spiders, index builders, and content management systems.
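For what it's worth, here is a minimal sketch of the `DocConvert` command from the question built on POI's text extractors. It assumes the `poi`, `poi-scratchpad` (for the binary .doc format) and `poi-ooxml` (for .docx) artifacts are on the classpath. Picking the extractor by file extension and writing the output as UTF-8 are my own simplifications, not something the question or the POI site prescribes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class DocConvert {

    // Returns the plain text of a .doc or .docx file.
    // Assumption: the file extension correctly reflects the actual format.
    public static String extractText(Path input) throws IOException {
        String name = input.getFileName().toString().toLowerCase(Locale.ROOT);
        try (InputStream in = Files.newInputStream(input)) {
            if (name.endsWith(".docx")) {
                // Office Open XML (.docx) is handled by the XWPF classes.
                try (XWPFWordExtractor extractor = new XWPFWordExtractor(new XWPFDocument(in))) {
                    return extractor.getText();
                }
            }
            if (name.endsWith(".doc")) {
                // The binary Word 97-2003 format (.doc) is handled by the HWPF classes.
                try (WordExtractor extractor = new WordExtractor(in)) {
                    return extractor.getText();
                }
            }
            throw new IllegalArgumentException("Expected a .doc or .docx file: " + input);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java DocConvert <input.doc|input.docx> <output.txt>");
            System.exit(1);
        }
        String text = extractText(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), text.getBytes(StandardCharsets.UTF_8));
    }
}
```

With the POI jars on the classpath, this would be invoked as in the question: `java DocConvert somedocfile.doc converted.txt`.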


P.S.: If, on the other hand, you're simply looking for a conversion utility, Stack Overflow may not be the most appropriate place to ask for this.

If you don't want to use an existing library but do all the hard work yourself, you'll be glad to hear that Microsoft has published the required file format specifications. (The Microsoft Open Specification Promise lists the available specifications. Just google for any of them that you're interested in. In your case, you'd need e.g. the OLE2 Compound File Format, the Word 97 binary file format, and the Open XML formats.)
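To give a feel for the do-it-yourself route, here is a rough sketch for .docx only, using nothing but the JDK. A .docx file is a ZIP package whose main body lives in `word/document.xml`, where visible text sits in `w:t` elements grouped into `w:p` paragraphs. The class name `DocxToText` is made up for this illustration. The sketch ignores tabs, manual line breaks, headers, footers, footnotes and so on. The binary .doc format needs an OLE2 parser first and is not attempted here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class DocxToText {

    // Namespace of the WordprocessingML elements (w:p, w:t, ...).
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Reads word/document.xml from the .docx package and returns its text,
    // one line per paragraph.
    public static String extractText(Path docx)
            throws IOException, ParserConfigurationException, SAXException {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found; is this a .docx file?");
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DTDs so untrusted documents cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            try (InputStream in = zip.getInputStream(entry)) {
                Document xml = factory.newDocumentBuilder().parse(in);
                StringBuilder out = new StringBuilder();
                walk(xml.getDocumentElement(), out);
                return out.toString();
            }
        }
    }

    // Depth-first walk: append the content of each w:t element and
    // emit a newline at the end of each w:p paragraph.
    private static void walk(Node node, StringBuilder out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            boolean isWordElement = W_NS.equals(child.getNamespaceURI());
            String name = child.getLocalName();
            if (isWordElement && "t".equals(name)) {
                out.append(child.getTextContent());
            } else {
                walk(child, out);
                if (isWordElement && "p".equals(name)) {
                    out.append('\n');
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java DocxToText <input.docx> <output.txt>");
            System.exit(1);
        }
        String text = extractText(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), text.getBytes(StandardCharsets.UTF_8));
    }
}
```

Even this small case shows why a library is usually the better choice: everything beyond plain paragraph text has to be handled by hand.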
