Generating docx file from HTML file using OpenXML
Problem description
I'm using this method for generating a docx file:
using System.IO;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;

public static void CreateDocument(string documentFileName, string text)
{
    using (WordprocessingDocument wordDoc =
        WordprocessingDocument.Create(documentFileName, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();
        // Minimal document.xml containing one paragraph; #REPLACE# is a placeholder.
        string docXml =
 @"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
 <w:document xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
 <w:body><w:p><w:r><w:t>#REPLACE#</w:t></w:r></w:p></w:body>
 </w:document>";
        // text is inserted as-is, so any markup or characters such as & and <
        // end up inside <w:t> and can make document.xml invalid.
        docXml = docXml.Replace("#REPLACE#", text);
        using (Stream stream = mainPart.GetStream())
        {
            byte[] buf = (new UTF8Encoding()).GetBytes(docXml);
            stream.Write(buf, 0, buf.Length);
        }
    }
}
It works like a charm:
CreateDocument("test.docx", "Hello");
But what if I want to put HTML content instead of Hello? For example:
CreateDocument("test.docx", @"<html><head></head>
<body>
<h1>Hello</h1>
</body>
</html>");
Or something like this:
CreateDocument("test.docx", @"Hello<BR>
This is a simple text<BR>
Third paragraph<BR>
Sign
");
Both cases create an invalid structure for document.xml.

Any idea? How can I generate a docx file from HTML content?
Recommended answer
You cannot just insert HTML content into "document.xml". This part expects only WordprocessingML content: in your first example the HTML elements end up inside <w:t>, which may contain only text, and in your second example the unclosed <BR> tags make the XML not even well-formed. So you'll have to convert that HTML into WordprocessingML, see this.
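As a very small illustration of such a conversion, the sketch below handles only the shape of your second example: plain text separated by <BR> tags, with each piece becoming its own paragraph. It assumes the DocumentFormat.OpenXml NuGet package; the class name BrTextToDocx is hypothetical, and any other HTML markup (headings, bold, lists) is not converted but ends up as literal text. Building the document from the SDK's Paragraph, Run and Text classes also means characters such as & and < are escaped when the XML is written, which the string Replace approach does not do.

```csharp
using System.Net;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: a sketch, NOT a general HTML-to-WordprocessingML converter.
public static class BrTextToDocx
{
    public static void CreateDocument(string documentFileName, string htmlText)
    {
        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Create(documentFileName, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();
            var body = new Body();

            // Split on <br>, <BR>, <br/> and <br /> (case-insensitive).
            foreach (string piece in Regex.Split(htmlText, @"<br\s*/?>", RegexOptions.IgnoreCase))
            {
                // Trim surrounding whitespace/newlines and decode entities such as &amp;.
                string line = WebUtility.HtmlDecode(piece.Trim());
                if (line.Length == 0)
                    continue;

                // One <w:p><w:r><w:t> per line; the SDK escapes the text on save.
                body.Append(new Paragraph(new Run(new Text(line))));
            }

            mainPart.Document = new Document(body);
            mainPart.Document.Save();
        }
    }
}
```

Called with the second example from the question, this should produce a document with four paragraphs: "Hello", "This is a simple text", "Third paragraph" and "Sign".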
Another option is the altChunk element: with it you can place an HTML file inside your DOCX file and then reference that HTML content at a specific place in your document, see this.
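A minimal sketch of the altChunk approach with the Open XML SDK, assuming the DocumentFormat.OpenXml NuGet package; the class name HtmlAltChunkDocx and the relationship id "htmlChunk1" are illustrative choices, not anything the SDK requires. The HTML is stored unchanged in its own package part and an <w:altChunk> element in the body points at it. Word imports that HTML when it opens the file, so tools that do not process altChunk may not show the content.

```csharp
using System.IO;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper illustrating altChunk; names are illustrative.
public static class HtmlAltChunkDocx
{
    public static void CreateDocumentFromHtml(string documentFileName, string html)
    {
        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Create(documentFileName, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();
            mainPart.Document = new Document(new Body());

            // Add a part that holds the raw HTML, related to the main part by this id.
            const string altChunkId = "htmlChunk1";
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.Html, altChunkId);

            // Stored as UTF-8 without a BOM; non-ASCII text may need a charset
            // declaration in the HTML for Word to decode it correctly (assumption).
            using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            {
                chunk.FeedData(htmlStream);
            }

            // <w:altChunk r:id="htmlChunk1"/> marks where the HTML is imported.
            mainPart.Document.Body.Append(new AltChunk { Id = altChunkId });
            mainPart.Document.Save();
        }
    }
}
```

With this, the first example from the question can be passed in unchanged, e.g. HtmlAltChunkDocx.CreateDocumentFromHtml("test.docx", "<html><head></head><body><h1>Hello</h1></body></html>").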
Lastly, as an alternative, with the GemBox.Document library you can do exactly what you want:
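using GemBox.Document;

public static void CreateDocument(string documentFileName, string text)
{
    // GemBox.Document expects ComponentInfo.SetLicense(...) to be called once beforehand.
    DocumentModel document = new DocumentModel();
    // Parse the string as HTML and use the result as the document's content.
    document.Content.LoadText(text, LoadOptions.HtmlDefault);
    // The output format is chosen from the file extension (.docx).
    document.Save(documentFileName);
}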
Or you could convert the HTML content directly into a DOCX file:
using System.IO;
using GemBox.Document;

public static void Convert(string documentFileName, string htmlText)
{
    HtmlLoadOptions options = LoadOptions.HtmlDefault;
    // Encode the HTML with the same encoding the loader will use to read it back.
    using (var htmlStream = new MemoryStream(options.Encoding.GetBytes(htmlText)))
    {
        DocumentModel.Load(htmlStream, options)
            .Save(documentFileName);
    }
}