iText7 Performance Issue Compared With iTextSharp


Question

I have tested iTextSharp and iText7 for HTML-to-PDF conversion. iTextSharp takes 3 minutes to create 10,000 PDFs, but iText7 takes 17 minutes to create 10,000 PDFs. Since iText7 is the newer version, I decided to use it for commercial purposes, but its performance is much lower. How can I improve the performance of HTML-to-PDF conversion in iText7?

Testing in iText7

For i As Integer = 0 To 10000
    HTML = ReadFile '=> Read HTML file from particular location
    'HTML = Replace(HTML) => To Replace the content dynamically
    ' FileName (the output path) is not defined in this snippet
    Dim writer As PdfWriter
    Dim array() As Byte = System.Text.Encoding.ASCII.GetBytes("a")
    writer = New PdfWriter(FileName, New WriterProperties().SetStandardEncryption(array, array, EncryptionConstants.ALLOW_PRINTING,
                           EncryptionConstants.ENCRYPTION_AES_256))
    HtmlConverter.ConvertToPdf(HTML, writer)
Next

Testing In iTextSharp

Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.pdfa
Imports System.IO
Imports iTextSharp.text.html.simpleparser
Imports System.Text
Imports iTextSharp.tool.xml.html
Imports iTextSharp.tool.xml
Imports iTextSharp.tool.xml.pipeline.html

For i As Integer = 0 To 10000
    HTML = ReadFile '=> Read HTML file from particular location
    'HTML = Replace(HTML) => To Replace the content dynamically
    Dim bPDF As Byte()
    Dim ms As New MemoryStream
    Dim doc As Document
    doc = New Document(PageSize.A4, 25, 25, 25, 25)
    Dim txtReader As New StringReader(HTML)
    Dim oPdfWriter As PdfWriter
    oPdfWriter = PdfWriter.GetInstance(doc, ms)
    oPdfWriter.SetEncryption(iTextSharp.text.pdf.PdfWriter.ENCRYPTION_AES_128, "q", "a", 2)
    Dim htmlWorker As New HTMLWorker(doc)
    doc.Open()
    htmlWorker.StartDocument()
    htmlWorker.Parse(txtReader)
    htmlWorker.EndDocument()
    htmlWorker.Close()
    doc.Close()
    bPDF = ms.ToArray()
    ' "mm" is minutes; the original format string used "MM" (month) after the hours
    Dim FileName As String = "D:\ItextSharp_" & Now.ToString("ddMMyyyyHHmmssffffff") & ".pdf"
    File.WriteAllBytes(FileName, bPDF)
Next



Function ReadFile() As String
    ' Read the whole HTML template into a string; Using closes the reader afterwards
    Using objReader As New System.IO.StreamReader("D:\AS1-Revamp\TestHTML\test.html")
        Return objReader.ReadToEnd()
    End Using
End Function

I used the code above to test performance. iText7 takes more time than iTextSharp to write the PDF files to the specified path.

Edit: copy/paste of the HTML from that other question:

Following up on my question "iText7 Performance Issue Compared With iTextSharp", I have sent the HTML file to Mr. Amedee Van Gasse. Please tell me how to improve the performance of iText7.

<div id = "headerdiv" style="width:540px; float:left; background:#ededed; padding:30px; overflow:hidden;">
<br>
<br>
<br>
<div>
<img border='0' src='D:\AS1-Revamp\TestHTML\newlog.bmp' width='100' height='40'>
</div>
<p style="color:Red;align=center;" >                         Details</p>
<br>
<br>
<table >
<tr  border='0'>
<td  bgcolor='Green'>
<font size="3" color="white">
SDetails
</font>
</td>
</td>
</tr>
<tr border='0'>
<td>
<div id="dvKYC">
<table  border='1'>

<tr>
<td><#lsName#></td>
<td>No:<#lsno#></td>
</tr> 

<tr  border='1'>
<td width=500><#lsAddess#></td>
<td></td>
</tr>

<tr>
<td><#lsContacts#></td>
<td> </td>
</tr> 
</table>
</div>
</td>
</tr>
</table>

<br>

<div >
<table >
<tr  border='0'>
<td  bgcolor='Green'>
<font size="3" color="white">
Status
</font>
</td>
</td>
</tr>
</table>
<table style="width:100%;">
<tr  bgcolor=gray >
<td style="width:30%;text-align: left; font-weight: bold;">UUH  </td>
<td style="width:20%;text-align: left; font-weight: bold;">PN</td>
<td style="width:20%;text-align: left; font-weight: bold;">KC </td>
<td style="width:20%;text-align: left; font-weight: bold;">CC</td>
</tr>
<tr>
<td  style"width:200px;"><#lsHs#></td>
<td ><#lsPN#></td>
<td><#lsKC#></td>
<td><#lsCC#></td>
</tr>
</table>
 </div>



<div >
<table >
<tr  border='0'>
<td  bgcolor='Green'>
<font size="3" color="white">
STD
</font>
</td>
</td>
</tr>
</table>


 <##TT##>


</div>

After I applied the following code, two errors are reported for ConverterProperties:

1. setCreateAcroForm is not a member of iText.Html2pdf.ConverterProperties

2. setOutlineHandler is not a member of iText.Html2pdf.ConverterProperties

Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
    Dim converterProperties As ConverterProperties = New ConverterProperties
    With converterProperties
        .SetBaseUri(".")
        .setCreateAcroForm(False)
        .SetCssApplierFactory(New DefaultCssApplierFactory())
        .SetFontProvider(New DefaultFontProvider())
        .SetMediaDeviceDescription(MediaDeviceDescription.CreateDefault())
        .setOutlineHandler(New OutlineHandler())
        .SetTagWorkerFactory(New DefaultTagWorkerFactory())
    End With
    Dim HTML = ReadFile("Input_Template")
    For i = 0 To 10000
        LicenseKey.LoadLicenseFile("C:\iText7\itextkey-0.xml")
        Dim PDF = "E:\iText\testpdf " & i & ".pdf"
        Dim m As New MemoryStream
        Dim writer As PdfWriter
        Dim array() As Byte = System.Text.Encoding.ASCII.GetBytes("a")
        writer = New PdfWriter(PDF, New WriterProperties().SetStandardEncryption(array, array, EncryptionConstants.ALLOW_PRINTING,
                               EncryptionConstants.ENCRYPTION_AES_256))
        HtmlConverter.ConvertToPdf(HTML, writer, converterProperties)
    Next
End Sub

If I comment out those two lines and run my program, an error occurs on the conversion line, i.e. HtmlConverter.ConvertToPdf(HTML, writer, converterProperties).

The Error is:"Pdf indirect object belongs to other PDF document. Copy object to current pdf document."

This error occurs because converterProperties is created outside the loop. If I put all of these properties inside the loop, it works fine, but is that the right approach performance-wise?

Please help me with these three errors.

Recommended answer

The answer to your question is simple: at iText Group, we are constantly improving the iText software, and there is certainly room for improving the performance. However, we won't ever be able to make the pdfHTML add-on as fast as the obsolete HTMLWorker. The reason is simple: HTMLWorker didn't support CSS, HTMLWorker only supported a small selection of tags, and so on... HTMLWorker was very simple and was only to be used for simple needs.

We have created the pdfHTML add-on to support CSS (including functionality to add headers, footers, page numbers, etc...). We support plenty of HTML tags that weren't supported in HTMLWorker. We support absolute positioning of elements in pdfHTML. All of this functionality comes with a cost. That cost is CPU.

It is intellectually unfair of you to compare the CPU use by HTMLWorker with the CPU use by pdfHTML.
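
For scale, the timings from the question can be turned into an average cost per document. This is a minimal sketch in plain Java; the class and method names are made up for illustration, and only the 3-minute, 17-minute and 10,000-document figures come from the question:

```java
public class PerDocumentCost {
    // Average wall-clock milliseconds spent on one document.
    public static double millisPerDocument(long totalMinutes, int documents) {
        long totalMillis = totalMinutes * 60_000L;
        return totalMillis / (double) documents;
    }

    public static void main(String[] args) {
        double htmlWorker = millisPerDocument(3, 10_000);   // 18.0 ms per PDF
        double pdfHtml = millisPerDocument(17, 10_000);     // 102.0 ms per PDF
        System.out.println("HTMLWorker: " + htmlWorker + " ms per PDF");
        System.out.println("pdfHTML:    " + pdfHtml + " ms per PDF");
    }
}
```

On those numbers the gap is 84 ms per document on average.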

This being said: you can already save plenty of time by using ConverterProperties. Right now, you don't provide any ConverterProperties. This means that iText has to instantiate the default properties for every PDF you are creating. If you would create the ConverterProperties up-front, and reuse them, you could already save plenty of time, but you have to understand that the extra functionality provided by pdfHTML comes with a cost in CPU.

This is how you create a ConverterProperties instance:

ConverterProperties converterProperties = new ConverterProperties()
    .setBaseUri(".")
    .setCreateAcroForm(false)
    .setCssApplierFactory(new DefaultCssApplierFactory())
    .setFontProvider(new DefaultFontProvider())
    .setMediaDeviceDescription(MediaDeviceDescription.createDefault())
    .setOutlineHandler(new OutlineHandler())
    .setTagWorkerFactory(new DefaultTagWorkerFactory());

As you can see, we create plenty of default objects: the default CSS Applier factory, the default font provider, the default media description, the default outline handler, and the default tag worker factory. The creation of all of these objects costs a tiny little bit of time, but when you multiply that time by 10,000 because you create 10,000 documents, the CPU needed to create those default objects can become significant, and that is what happens when you convert an HTML file to PDF like this:

HtmlConverter.convertToPdf(
    new FileInputStream("resources/test.html"),
    new FileOutputStream("results/test.pdf"));

Since you are not adding a ConverterProperties parameter, iText will create a new instance of ConverterProperties internally for every document that you convert. All the default components of the ConverterProperties will be null, which means that for every document you create, new instances of the CSS Applier factory, the font provider, and so on need to be created.

It will save you some time (but not that much) if you create a ConverterProperties up-front (only once), as well as all the components. It is then important that you reuse that object when converting HTML to PDF:

HtmlConverter.convertToPdf(
    new FileInputStream("resources/test.html"),
    new FileOutputStream("results/test.pdf"),
    converterProperties);
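
The "create once, reuse for every document" pattern can be sketched without iText at all. In the block below, Settings and convert are hypothetical stand-ins (they are not the iText ConverterProperties class or HtmlConverter), and the counter only shows how many times the expensive defaults would be built in each variant:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReuseSketch {
    // Hypothetical stand-in for an expensive-to-build settings object.
    // It plays the role of ConverterProperties but is NOT the iText class.
    public static final class Settings {
        static final AtomicInteger created = new AtomicInteger();

        Settings() {
            created.incrementAndGet(); // count every construction
        }
    }

    // Hypothetical stand-in for a conversion call; it does no real conversion.
    static int convert(String html, Settings settings) {
        return html.length();
    }

    // Builds a new Settings object for every document.
    public static int runWithFreshSettings(int documents) {
        Settings.created.set(0);
        for (int i = 0; i < documents; i++) {
            convert("<p>doc " + i + "</p>", new Settings());
        }
        return Settings.created.get();
    }

    // Builds one Settings object up-front and passes it to every call.
    public static int runWithSharedSettings(int documents) {
        Settings.created.set(0);
        Settings shared = new Settings();
        for (int i = 0; i < documents; i++) {
            convert("<p>doc " + i + "</p>", shared);
        }
        return Settings.created.get();
    }

    public static void main(String[] args) {
        System.out.println("fresh per document: " + runWithFreshSettings(10_000) + " constructions");
        System.out.println("shared instance:    " + runWithSharedSettings(10_000) + " constructions");
    }
}
```

The first variant constructs the settings once per document and the second constructs them once in total. How much time that saves in a real conversion depends on how costly the real defaults are to build, which this sketch does not measure.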
