iTextSharp parsing HTML with images in it: it parses correctly but won't show images


Question


I am trying to generate a .pdf from HTML using the iTextSharp library. I am able to create the PDF, with the HTML text converted to PDF text/paragraphs.

My problem: the PDF does not show my images (the img elements from the HTML). None of the img elements in my HTML get displayed in the PDF. Is it possible for iTextSharp to parse HTML and display images? I really hope so, otherwise I am stuffed :(

I am linking to the correct directory where the images are (using IMG_BASEURL), but they are just not showing.

My code:

// mainContents variable is a string containing my HTML
var document = new Document(PageSize.A4, 50, 50, 80, 100);
var output = new MemoryStream();
var writer = PdfWriter.GetInstance(document, output);
document.Open();

Hashtable providers = new Hashtable();
providers.Add("img_baseurl","C:/users/xx/VisualStudio/Projects/myproject/");
var parsedHtmlElements = HTMLWorker.ParseToList(new StringReader(mainContents), null, providers);
foreach (var htmlElement in parsedHtmlElements)
   document.Add(htmlElement as IElement);

document.Close();

Answer

Every time I've encountered this, the problem was that the image was too large for the canvas. More specifically, even a naked IMG tag gets wrapped internally in a Chunk, which in turn gets wrapped in a Paragraph. I think the image is overflowing the Paragraph, but I'm not 100% sure.

The two easy fixes are to either enlarge the canvas or specify image dimensions on the HTML IMG tag. A third, more complex route is to use an additional provider, IMG_PROVIDER. To do this you need to implement the IImageProvider interface. Below is a very simple version of one:

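    //Namespaces as found in iTextSharp 5.1.x
    using System;
    using System.Collections.Generic;
    using iTextSharp.text;
    using iTextSharp.text.html.simpleparser;

    public class ImageThing : IImageProvider {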
        //Store a reference to the main document so that we can access the page size and margins
        private Document MainDoc;
        //Constructor
        public  ImageThing(Document doc) {
            this.MainDoc = doc;
        }
        Image IImageProvider.GetImage(string src, IDictionary<string, string> attrs, ChainedProperties chain, IDocListener doc) {
            //Prepend the src tag with our path. NOTE, when using HTMLWorker.IMG_PROVIDER, HTMLWorker.IMG_BASEURL gets ignored unless you choose to implement it on your own
            src = Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + @"\" + src;
            //Get the image. NOTE, this will attempt to download/copy the image, you'd really want to sanity check here
            Image img = Image.GetInstance(src);
            //Make sure we got something
            if (img == null) return null;
            //Determine the usable area of the canvas. NOTE, this doesn't take into account the current "cursor" position so this might create a new blank page just for the image
            float usableW = this.MainDoc.PageSize.Width - (this.MainDoc.LeftMargin + this.MainDoc.RightMargin);
            float usableH = this.MainDoc.PageSize.Height - (this.MainDoc.TopMargin + this.MainDoc.BottomMargin);
            //If the downloaded image is bigger than either width and/or height then shrink it
            if (img.Width > usableW || img.Height > usableH) {
                img.ScaleToFit(usableW, usableH);
            }
            //return our image
            return img;
        }
    }

To use this provider just add it to the provider collection like you did with HTMLWorker.IMG_BASEURL:

providers.Add(HTMLWorker.IMG_PROVIDER, new ImageThing(doc));
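
For intuition about the shrink step inside GetImage, here is a minimal sketch of the arithmetic in plain C#, with no iTextSharp reference. It assumes that ScaleToFit scales the image uniformly so that it fits inside the given box while keeping its aspect ratio. FitSketch and Fit are hypothetical names used only for illustration, and 595 x 842 points is the rounded A4 size that PageSize.A4 uses.

```csharp
using System;

public static class FitSketch {
    // Returns the (width, height) an image would have after being shrunk to fit
    // inside maxW x maxH while preserving its aspect ratio.
    // Images that already fit are returned unchanged.
    public static Tuple<float, float> Fit(float imgW, float imgH, float maxW, float maxH) {
        if (imgW <= maxW && imgH <= maxH) return Tuple.Create(imgW, imgH);
        // The smaller ratio is the one that makes both dimensions fit
        float scale = Math.Min(maxW / imgW, maxH / imgH);
        return Tuple.Create(imgW * scale, imgH * scale);
    }

    public static void Main() {
        // A4 (595 x 842 pt) with the margins used in this thread: 50, 50, 80, 100
        float usableW = 595f - (50f + 50f);
        float usableH = 842f - (80f + 100f);
        Tuple<float, float> size = Fit(1000f, 500f, usableW, usableH);
        Console.WriteLine(size.Item1 + " x " + size.Item2);
    }
}
```

With those margins the usable area is 495 x 662 points, so a 1000 x 500 image would come out at about 495 x 247.5 points: the width is the limiting dimension.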

Note that if you use HTMLWorker.IMG_PROVIDER, you are responsible for figuring out everything about the image. The code above assumes that every image path needs to be prepended with a constant string; you'll probably want to update this and check whether the path starts with HTTP. Also, because we're saying that we want to handle the image processing pipeline completely ourselves, the HTMLWorker.IMG_BASEURL provider is no longer needed.
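
One way to do that check is sketched below, again in plain C# without iTextSharp. ImageSourceSketch and Resolve are hypothetical names, and the rule itself (absolute http/https URLs pass through untouched, everything else is treated as relative to a base directory) is an assumption about what you would want, not something the library prescribes.

```csharp
using System;
using System.IO;

public static class ImageSourceSketch {
    // Returns src unchanged when it is an absolute http/https URL;
    // otherwise combines it with baseDir as a file path.
    public static string Resolve(string src, string baseDir) {
        Uri uri;
        if (Uri.TryCreate(src, UriKind.Absolute, out uri) &&
            (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps)) {
            return src;
        }
        // Path.Combine returns src as-is if src is already a rooted path
        return Path.Combine(baseDir, src);
    }

    public static void Main() {
        string desktop = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
        Console.WriteLine(Resolve("http://example.com/logo.png", desktop));
        Console.WriteLine(Resolve("Untitled-1.png", desktop));
    }
}
```

Inside GetImage, the resolved string could then replace the hard-coded Desktop concatenation before the call to Image.GetInstance(src).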

The main code loop would now look something like this:

        string html = @"<img src=""Untitled-1.png"" />";
        string outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "HtmlTest.pdf");
        using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
            using (Document doc = new Document(PageSize.A4, 50, 50, 80, 100)) {
                using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
                    doc.Open();
                    using (StringReader sr = new StringReader(html)) {
                        System.Collections.Generic.Dictionary<string, object> providers = new System.Collections.Generic.Dictionary<string, object>();
                        providers.Add(HTMLWorker.IMG_PROVIDER, new ImageThing(doc));

                        var parsedHtmlElements = HTMLWorker.ParseToList(sr, null,  providers);
                        foreach (var htmlElement in parsedHtmlElements) {
                            doc.Add(htmlElement as IElement);
                        }
                    }
                    doc.Close();
                }
            }
        }

One last thing: make sure to specify which version of iTextSharp you are targeting when posting here. The code above targets iTextSharp 5.1.2.0, but I think you might be using the 4.X series.
