Generating docx file from HTML file using OpenXML


Problem description

I'm using this method to generate a docx file:

public static void CreateDocument(string documentFileName, string text)
{
    using (WordprocessingDocument wordDoc =
        WordprocessingDocument.Create(documentFileName, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();

        string docXml =
                    @"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
                 <w:document xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
                 <w:body><w:p><w:r><w:t>#REPLACE#</w:t></w:r></w:p></w:body>
                 </w:document>";

        docXml = docXml.Replace("#REPLACE#", text);

        using (Stream stream = mainPart.GetStream())
        {
            byte[] buf = (new UTF8Encoding()).GetBytes(docXml);
            stream.Write(buf, 0, buf.Length);
        }
    }
}

It works like a charm:

CreateDocument("test.docx", "Hello");

But what if I want to pass HTML content instead of Hello? For example:

CreateDocument("test.docx", @"<html><head></head>
                              <body>
                                    <h1>Hello</h1>
                              </body>
                        </html>");

Or something like this:

CreateDocument("test.docx", @"Hello<BR>
                                    This is a simple text<BR>
                                    Third paragraph<BR>
                                    Sign
                        ");

Both cases create an invalid structure for document.xml. Any ideas? How can I generate a docx file from HTML content?

Recommended answer

You cannot just insert HTML content into "document.xml". That part expects only WordprocessingML content, so you'll have to convert the HTML into WordprocessingML, see this.
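For the second case in the question, where the input is plain text separated by <BR> markers, the conversion to WordprocessingML can be done by hand with the Open XML SDK object model. The following is only a minimal sketch under stated assumptions: the class name BrTextDocx is hypothetical, the separator is matched as the exact string "<BR>" (case-sensitive), and each segment becomes its own paragraph. It is not a general HTML converter.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class, introduced here for illustration only.
public static class BrTextDocx
{
    public static void CreateDocument(string documentFileName, string text)
    {
        // Assumption: segments are separated by the literal string "<BR>".
        string[] lines = text.Split(new[] { "<BR>" }, StringSplitOptions.None);

        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Create(documentFileName, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();

            Body body = new Body();
            foreach (string line in lines)
            {
                // One w:p / w:r / w:t per segment. The SDK escapes characters
                // such as '<' and '&' when it serializes the Text element.
                body.Append(new Paragraph(new Run(new Text(line.Trim()))));
            }

            mainPart.Document = new Document(body);
            mainPart.Document.Save();
        }
    }
}
```

Building the body from SDK elements instead of string replacement also avoids the invalid XML that results when the inserted text contains markup characters.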

Another option is the altChunk element. With it you can place an HTML file inside your DOCX file and then reference that HTML content at a specific place in your document, see this.
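The altChunk approach can be sketched with the Open XML SDK roughly as follows. This is a minimal sketch, not a definitive implementation: the class name HtmlDocx, the method name CreateDocumentFromHtml, and the relationship id "htmlChunk1" are hypothetical choices, and the HTML is assumed to be encodable as UTF-8.

```csharp
using System.IO;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class, introduced here for illustration only.
public static class HtmlDocx
{
    public static void CreateDocumentFromHtml(string documentFileName, string html)
    {
        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Create(documentFileName, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();
            mainPart.Document = new Document(new Body());

            // Arbitrary id that links the altChunk element to the embedded part.
            const string altChunkId = "htmlChunk1";

            // Store the HTML as a separate part inside the DOCX package.
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.Html, altChunkId);

            using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            {
                chunk.FeedData(htmlStream);
            }

            // Reference the embedded HTML at this position in the body.
            mainPart.Document.Body.Append(new AltChunk { Id = altChunkId });
            mainPart.Document.Save();
        }
    }
}
```

Note that the HTML stays embedded as-is. The application that opens the file performs the conversion, so consumers other than Word may not display the imported content.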

Finally, as an alternative, with the GemBox.Document library you can accomplish exactly what you want:

public static void CreateDocument(string documentFileName, string text)
{
    DocumentModel document = new DocumentModel();
    document.Content.LoadText(text, LoadOptions.HtmlDefault);
    document.Save(documentFileName);
}

Or you could convert HTML content directly into a DOCX file:

public static void Convert(string documentFileName, string htmlText)
{
    HtmlLoadOptions options = LoadOptions.HtmlDefault;
    using (var htmlStream = new MemoryStream(options.Encoding.GetBytes(htmlText)))
        DocumentModel.Load(htmlStream, options)
                     .Save(documentFileName);
}
