How do I convert malformed HTML to PDF with iText and XMLWorker?

Problem description

I am trying to convert HTML (with external CSS) into PDF using iText's XMLWorkerHelper, and I get a run-time exception whenever XMLWorkerHelper parses malformed HTML. For example:

The HTML below has an input tag that is not closed; XMLWorkerHelper cannot parse it and throws a run-time exception.

If I try with a properly closed HTML input tag, it works fine.
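
The difference between the two cases can be sketched without iText at all. This is a minimal illustration, assuming (as the behavior described above suggests) that `ParseXHtml` needs XML-style well-formed markup. The two markup strings are hypothetical stand-ins, since the original example is not shown, and `System.Xml.Linq` is used only to demonstrate well-formedness; it is not the parser XMLWorker uses.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class WellFormedCheck
{
    // Returns true when the markup parses as well-formed XML.
    public static bool IsWellFormed(string markup)
    {
        try
        {
            XDocument.Parse(markup);
            return true;
        }
        catch (XmlException)
        {
            // Thrown for unclosed or mismatched tags, among other errors.
            return false;
        }
    }

    public static void Main()
    {
        // Hypothetical markup: valid HTML, but the input tag is never closed.
        string unclosed = "<p><input type=\"text\" name=\"q\"></p>";
        // The same markup with the input tag self-closed (XHTML style).
        string closed = "<p><input type=\"text\" name=\"q\" /></p>";

        Console.WriteLine(IsWellFormed(unclosed)); // False
        Console.WriteLine(IsWellFormed(closed));   // True
    }
}
```

A check like this can be run on the input before handing it to XMLWorkerHelper to see whether a cleanup step is needed first.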

How can I convert malformed or complex HTML (along with CSS) to PDF using iText?

Below is my code:

var test_html = File.ReadAllText("C:/Desking _ Lender Program - Dealertrack.html");
var test_css = File.ReadAllText("C:/login.css");
using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(test_css)))
{
    using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(test_html)))
    {
        // Parse the HTML
        try
        {
            iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
        }
        catch { }
    }
}

Recommended answer

It's a bit unclear whether you've decided to use iText7 or iTextSharp (5.x.x), but here's a simple example of the latter using HtmlAgilityPack to clean up malformed HTML:

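using System.IO;
using System.Text;
using HtmlAgilityPack;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

// Sample input with improperly nested tags, an unclosed <hr>, and an unclosed table.
var malformedHtml = @"
<h1>Malformed HTML</h1>
<p>A paragraph <b><span>with improperly nested tags</b></span></p><hr>
<table><tr><td>Cell 1, row 1</td><td>Cell 1, row 2";
// Let HtmlAgilityPack fix tag nesting and write empty elements in self-closed form.
HtmlDocument h = new HtmlDocument()
{
    OptionFixNestedTags = true, OptionWriteEmptyNodes = true
};
h.LoadHtml(malformedHtml);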

string css = @"
h1 { font-size:1.4em; }
hr { margin-top: 4em; margin-bottom: 2em; color: #ddd; }
table { border-collapse: collapse; }
table, td { border: 1px solid black; }
td { padding: 4px; }
span { color: red; }";

using (var stream = new MemoryStream())
{
    using (var document = new Document())
    {
        PdfWriter writer = PdfWriter.GetInstance(document, stream);
        document.Open();
        using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(h.DocumentNode.WriteTo())))
        {
            using (var cssStream = new MemoryStream(Encoding.UTF8.GetBytes(css)))
            {
                XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, htmlStream, cssStream);
            }
        }
    }
    // OUTPUT is assumed to be a string holding the destination file path, defined elsewhere.
    File.WriteAllBytes(OUTPUT, stream.ToArray());
}
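
The same cleanup can be applied to HTML and CSS read from files, as in the question. What follows is a rough sketch under stated assumptions rather than a definitive implementation: the class `HtmlFileToPdf`, the method `Convert`, and the parameters `htmlPath`, `cssPath`, and `pdfPath` are hypothetical names introduced for illustration, and both input files are assumed to be UTF-8. A complex page saved from a browser may still contain markup that XMLWorker rejects even after this step.

```csharp
using System.IO;
using System.Text;
using HtmlAgilityPack;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

public static class HtmlFileToPdf
{
    // Hypothetical helper: cleans the HTML file with HtmlAgilityPack,
    // then renders it with XMLWorkerHelper and writes the PDF to pdfPath.
    public static void Convert(string htmlPath, string cssPath, string pdfPath)
    {
        var html = new HtmlDocument
        {
            OptionFixNestedTags = true,   // repair improperly nested tags
            OptionWriteEmptyNodes = true  // write empty elements as <tag />
        };
        html.Load(htmlPath, Encoding.UTF8);

        // Serialize the repaired document back to a string.
        string cleanedHtml = html.DocumentNode.WriteTo();
        string css = File.ReadAllText(cssPath, Encoding.UTF8);

        using (var output = new MemoryStream())
        {
            using (var document = new Document())
            {
                PdfWriter writer = PdfWriter.GetInstance(document, output);
                document.Open();
                using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(cleanedHtml)))
                using (var cssStream = new MemoryStream(Encoding.UTF8.GetBytes(css)))
                {
                    XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, htmlStream, cssStream);
                }
            }
            // The document is closed at this point, so the PDF bytes are complete.
            File.WriteAllBytes(pdfPath, output.ToArray());
        }
    }
}
```

Unlike the empty `catch { }` in the question, this sketch lets exceptions propagate, so any markup that still fails to parse is reported instead of silently producing no output.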

PDF output: [image not preserved]
