Convert an MS Word file to a text file


Question



Hi,

I have to convert an MS Word (*.doc/*.docx) file to a text file (*.txt) using VC++ and MFC.

The problem is that when I convert the Word file, the resulting text file is not readable. It shows text like "´•IOÃ0…ïHü‡ÈW"¸p@5åÀr„JqvIj/ò¸Û¿gÜ%j«¶)P.‘ç½÷yœ™tfºN&àQY"³ë¬Ã0ÒÊT9û¼¤w,Á L!jk ".

The reverse conversion, from text to *.doc, works fine.

I checked that the font properties are the same. I even pasted the converted garbage text into a Word file, but it produced the same garbage.

Let me know if any further information is required.

Thanks

Recommended answer

This happens because of the way the two applications (MS Word and Notepad) store and handle data.
A file with the extension '.txt' holds plain characters (typically ANSI) without any formatting, whereas a '.doc' or '.docx' file can hold Unicode characters and, more importantly, other data such as formatting and images.
So when the raw contents of a Word file are shown as if they were a '.txt' file, the parts that plain text cannot represent end up as these junk characters.
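To check which kind of container a file really is before treating its bytes as text, one can look at the first few bytes. The following is a minimal sketch, not a converter. It assumes the well-known signatures of the OLE2 compound file format (used by legacy .doc files) and of ZIP archives (used by .docx files); the names `WordContainer`, `classifyHeader` and `classifyFile` are made up for illustration. Note that a ZIP signature only shows that the file is a ZIP archive, not that it is specifically a .docx.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>

// Container kinds that can be told apart from the first bytes of a file.
// (Hypothetical names, chosen for this sketch only.)
enum class WordContainer { Ole2Binary, ZipPackage, Unknown };

// Classify a buffer that holds the first bytes of a file.
WordContainer classifyHeader(const unsigned char* data, std::size_t size) {
    // Signature of an OLE2 compound file (legacy binary .doc).
    static const unsigned char kOle2[8] = {0xD0, 0xCF, 0x11, 0xE0,
                                           0xA1, 0xB1, 0x1A, 0xE1};
    // Local file header signature of a ZIP archive (.docx is a ZIP package).
    static const unsigned char kZip[4] = {'P', 'K', 0x03, 0x04};

    if (size >= sizeof kOle2 && std::memcmp(data, kOle2, sizeof kOle2) == 0) {
        return WordContainer::Ole2Binary;
    }
    if (size >= sizeof kZip && std::memcmp(data, kZip, sizeof kZip) == 0) {
        return WordContainer::ZipPackage;
    }
    return WordContainer::Unknown;
}

// Read up to 8 bytes from the file and classify them.
// A file that cannot be opened yields Unknown, because nothing is read.
WordContainer classifyFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    unsigned char header[8] = {};
    in.read(reinterpret_cast<char*>(header), sizeof header);
    return classifyHeader(header, static_cast<std::size_t>(in.gcount()));
}
```

If `classifyFile` reports `Ole2Binary` or `ZipPackage`, copying the bytes into a .txt file will produce unreadable output like the sample in the question, because the text is stored inside a structured (and, for ZIP, usually compressed) container rather than as plain characters.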


MS Word adds all kinds of information to the document (e.g. document properties such as date and author, MS copyright material, etc.), but it shouldn't be too hard to filter out that part.

What is difficult, however, is the additional information MS Word puts right into the text, such as formatting information, anchors, special characters, image data, or other embedded elements. In addition, the exact format of an MS Word document may change between versions, so you might need to figure out which version the document was stored with.

All that said, the effort isn't worth it: just use MS Word itself to store the document in another format, using "Save As"!

