Save Word to UTF-8 Encoded HTML

Question

I am writing some C# VSTO code that reads a Microsoft Word document and saves it as Filtered HTML. When I run this on a generic Word document, the resulting HTML file declares a Windows character set, as shown here:

<meta http-equiv=Content-Type content="text/html; charset=windows-1252">

If I open a document, go to File->Options->Advanced->Web Options, and choose UTF-8, the resulting filtered HTML output looks like this:

<meta http-equiv=Content-Type content="text/html; charset=utf-8">
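Since the only visible difference between the two outputs is the charset in that meta tag, a quick way to check a saved file without opening it by hand is to extract the declared charset with a regular expression. This is a minimal C# sketch; the class name `HtmlCharset` and the pattern are illustrative assumptions, and the regex targets only the simple tag shape Word writes above, not arbitrary HTML.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: reports the charset a saved HTML file declares in its
// <meta ... charset=...> tag, or null if no such tag is found.
public static class HtmlCharset
{
    // Word writes http-equiv=Content-Type without quotes, so the pattern only
    // anchors on "<meta" and "charset=" and tolerates an optional quote.
    private static readonly Regex CharsetPattern = new Regex(
        @"<meta[^>]*charset\s*=\s*[""']?([A-Za-z0-9_\-]+)",
        RegexOptions.IgnoreCase);

    public static string Declared(string html)
    {
        Match m = CharsetPattern.Match(html);
        // Normalize case so "UTF-8" and "utf-8" compare equal.
        return m.Success ? m.Groups[1].Value.ToLowerInvariant() : null;
    }
}
```

For example, calling `HtmlCharset.Declared(File.ReadAllText(path))` on a saved file would be expected to return "windows-1252" for the first output above and "utf-8" for the second.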

I want to write C# code that saves any Word document to filtered HTML encoded as UTF-8. After some research, I found people saying that the "SaveAs2" function does not work for this, even though Microsoft documents it as a feature. In other words, this code does not work for me:

doc.SaveAs2("C:\\Temp\\Test.htm", MsWord.WdSaveFormat.wdFormatFilteredHTML, Encoding: "65001");

(Note: I tried passing 65001 both with and without quotes. Neither throws an error, but neither works.)
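For context, 65001 is the Windows code page identifier for UTF-8, and it is also the numeric value of `Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8`, so passing that enum member is the typed way to express the same value. Word's reference for `SaveAs2` describes the `Encoding` argument as the code page for documents saved as encoded text files, which may explain why it appears to have no effect on HTML output. The sketch below only demonstrates the code page mapping with `System.Text.Encoding`; the class name `CodePages` is hypothetical, and it deliberately avoids the Office interop assemblies so it compiles on its own.

```csharp
using System.Text;

// Hypothetical illustration: 65001 is the code page number for UTF-8.
// In VSTO code the typed equivalent of the "65001" literal would be
// Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8 (not referenced here,
// so this file builds without the Office interop assemblies).
public static class CodePages
{
    public const int Utf8CodePage = 65001;

    // Returns the web (IANA) name .NET associates with a code page number.
    public static string WebNameFor(int codePage)
    {
        return Encoding.GetEncoding(codePage).WebName;
    }
}
```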

Next, I moved on to setting the web options for the document like this:

doc = app.Documents.Open("C:\\Temp\\Test.docx");
doc.WebOptions.Encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;
doc.SaveAs2(destFile, MsWord.WdSaveFormat.wdFormatFilteredHTML);

To the best of my knowledge, the code above does exactly what I do manually when I open a file, go to File->Options..., select UTF-8, and save the file as filtered HTML. Yet the output still looks like this:

<meta http-equiv=Content-Type content="text/html; charset=windows-1252">

Is there a way to force Microsoft Word to output a file to UTF-8 without having to manually configure the document first?

Recommended Answer

At the time of writing, it is unclear whether I hit a bug in my specific version of Microsoft Word (Word Online) or in the VSTO template, but here is what made this work for me.

If this code does not work:

doc = app.Documents.Open("C:\\Temp\\Test.docx");
doc.WebOptions.Encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;
doc.SaveAs2("C:\\Temp\\Test.htm", MsWord.WdSaveFormat.wdFormatFilteredHTML);

Then, change the code to refresh the document's fields, like this:

doc = app.Documents.Open("C:\\Temp\\Test.docx");

doc.Fields.Update(); // ** this is the new line of code.

doc.WebOptions.Encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;
doc.SaveAs2("C:\\Temp\\Test.htm", MsWord.WdSaveFormat.wdFormatFilteredHTML);
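Because the meta tag only states what Word intends, it can also help to confirm after saving that the file's bytes really decode as UTF-8. This is a minimal sketch using a strict `UTF8Encoding`; the class and method names are hypothetical and not part of the original answer.

```csharp
using System.IO;
using System.Text;

// Hypothetical post-save check: confirms that the bytes Word wrote really
// decode as UTF-8, independent of what the <meta> tag claims.
public static class Utf8Check
{
    // throwOnInvalidBytes: true makes the decoder reject malformed sequences
    // instead of silently substituting U+FFFD.
    private static readonly UTF8Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    public static bool IsValidUtf8(byte[] bytes)
    {
        try
        {
            StrictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    public static bool FileIsValidUtf8(string path)
    {
        return IsValidUtf8(File.ReadAllBytes(path));
    }
}
```

One caveat: a file containing only ASCII characters is valid UTF-8 and valid Windows-1252 at the same time, so this check only exposes the mismatch when the document contains non-ASCII text such as accented letters.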
