Save Word to UTF-8 Encoded HTML
Question
I am writing some C# VSTO code that reads a Microsoft Word document and saves it as Filtered HTML. When I run it on a generic Word document, the output HTML file declares a Windows charset, as shown here:
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
If I open a document and go to File -> Options -> Advanced -> Web Options, I can choose UTF-8, and the resulting filtered HTML output looks like this:
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
I want to write C# code that saves any Word document to filtered HTML with UTF-8. After some research, I found some people saying that the "SaveAs2" function does not work for this (even though Microsoft documents it as a feature). That matches what I see, because this code does not work for me:
doc.SaveAs2("C:\\Temp\\Test.htm", MsWord.WdSaveFormat.wdFormatFilteredHTML, Encoding: "65001");
(Note: I tried passing 65001 both with and without quotes. Neither throws an error, but neither works.)
Next, I moved on to setting the web options for the document, like this:
doc = app.Documents.Open("C:\\Temp\\Test.docx");
doc.WebOptions.Encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;
doc.SaveAs2(destFile, MsWord.WdSaveFormat.wdFormatFilteredHTML);
To the best of my knowledge, the above code does exactly what I do manually when I open a file, go to File -> Options..., set the encoding to UTF-8, and save the file as filtered HTML, yet the output still looks like this:
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
Is there a way to force Microsoft Word to output a file as UTF-8 without having to configure the document manually first?
Recommended answer
At the time of this writing, it is unclear whether I have encountered a bug in my specific version of Microsoft Word (Word Online) or in the VSTO template, but here is what made this work for me.
If this code does not work:
doc = app.Documents.Open("C:\\Temp\\Test.docx");
doc.WebOptions.Encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;
doc.SaveAs2("C:\\Temp\\Test.htm", MsWord.WdSaveFormat.wdFormatFilteredHTML);
Then change the code to refresh the document's fields, like this:
doc = app.Documents.Open("C:\\Temp\\Test.docx");
doc.Fields.Update(); // ** this is the new line of code.
doc.WebOptions.Encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;
doc.SaveAs2("C:\\Temp\\Test.htm", MsWord.WdSaveFormat.wdFormatFilteredHTML);
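Since the symptom in the question is only visible in the saved file's meta tag, it may help to check that tag programmatically after saving. The following is a minimal sketch, not part of the Word object model: the class name `CharsetCheck` and its methods are hypothetical helpers introduced here for illustration, and the regular expression assumes the meta tag has the shape shown in the question.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper (not a Word or VSTO API): reports the charset that an
// HTML document declares in its <meta ... charset=...> tag.
public static class CharsetCheck
{
    // Finds charset=VALUE inside a <meta ...> tag, for example:
    // <meta http-equiv=Content-Type content="text/html; charset=utf-8">
    private static readonly Regex MetaCharset = new Regex(
        @"<meta\b[^>]*\bcharset\s*=\s*[""']?([A-Za-z0-9_\-]+)",
        RegexOptions.IgnoreCase);

    // Returns the declared charset in lower case, or null if none is declared.
    public static string ReadDeclaredCharset(string html)
    {
        Match m = MetaCharset.Match(html);
        return m.Success ? m.Groups[1].Value.ToLowerInvariant() : null;
    }

    // Reads the file and checks whether its meta tag declares utf-8.
    // The meta tag itself is plain ASCII, so it stays readable even if the
    // rest of the file is in a different encoding than the reader assumes.
    public static bool FileDeclaresUtf8(string path)
    {
        return ReadDeclaredCharset(File.ReadAllText(path)) == "utf-8";
    }
}
```

For example, calling `CharsetCheck.FileDeclaresUtf8("C:\\Temp\\Test.htm")` after the `doc.SaveAs2(...)` line above would be expected to return true only if the saved file declares utf-8. This only inspects the declared charset; it does not verify how the bytes of the file are actually encoded.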