HTML tags not accepted in iTextSharp and text out of borders


Problem description


I created a table with iTextSharp and filled it with data from my database. Everything works, but some of the data contains HTML tags, so the table shows the raw tags instead of formatted text. Some of the text also runs outside the table border.

Here is some code:

PdfPTable table4 = new PdfPTable(3);
PdfPCell cell8 = new PdfPCell(new Phrase("Protocol", new Font(FontFactory.GetFont("Helvetica", 12f, Font.BOLD, new BaseColor(0, 0, 0)))));
cell8.BackgroundColor = new BaseColor(242, 242, 242);
table4.AddCell(cell8);
PdfPCell cell9 = new PdfPCell(new Phrase("Port", new Font(FontFactory.GetFont("Helvetica", 12f, Font.BOLD, new BaseColor(0, 0, 0)))));
cell9.BackgroundColor = new BaseColor(242, 242, 242);
table4.AddCell(cell9);
PdfPCell cell10 = new PdfPCell(new Phrase("Service", new Font(FontFactory.GetFont("Helvetica", 12f, Font.BOLD, new BaseColor(0, 0, 0)))));
cell10.BackgroundColor = new BaseColor(242, 242, 242);
table4.AddCell(cell10);

// Each item in myprotocol exposes Protocol, Port and Service
foreach (var t in myprotocol)
{
    table4.AddCell(t.Protocol);
    table4.AddCell(t.Port.ToString());
    table4.AddCell(t.Service);
}
document.Add(table4);

Solution

When you manually add content, whether it's a Table, a Paragraph, a Chunk or something else, iTextSharp will always insert the content exactly as is. This means that it does not parse HTML.

If all you want to do is strip out the HTML tags then see this post and either use a RegEx (no dependencies but a few edge cases could break) or the HtmlAgilityPack (in my opinion a lot of unnecessary overhead) to remove the tags.
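As a rough illustration of the RegEx route, here is a minimal sketch. The `HtmlStripper` class and its `StripTags` method are hypothetical names made up for this example, not part of iTextSharp. The pattern is deliberately naive and will break on edge cases such as a `>` inside an attribute value.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // Hypothetical helper (not an iTextSharp API): removes anything that looks like a tag.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Drop everything between '<' and the next '>' (naive; see the caveat above)
        string text = Regex.Replace(html, "<[^>]+>", string.Empty);

        // Turn entities such as &amp; back into plain characters
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace (returns, tabs) into single spaces
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

In the question's loop, this would be used as `table4.AddCell(HtmlStripper.StripTags(t.Service));`.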

If you want to interpret the tags (for instance bolding when <strong> is encountered) then you're going to have to look at the HTMLWorker object. Here's a post that goes into a little detail on it.
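To make the HTMLWorker route a little more concrete, here is a minimal sketch. It assumes an iTextSharp 5.x release where `HTMLWorker.ParseToList` returns a `List<IElement>`. The `HtmlCellBuilder` class and `BuildHtmlCell` method are hypothetical names for this example only. Newer 5.x releases mark HTMLWorker as obsolete, so treat this as a starting point rather than a recommended implementation, and note that how well it copes with malformed markup is not something this sketch verifies.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

public static class HtmlCellBuilder
{
    // Hypothetical helper (not an iTextSharp API): parses an HTML fragment
    // and places the resulting elements in a single table cell.
    public static PdfPCell BuildHtmlCell(string html)
    {
        PdfPCell cell = new PdfPCell();

        using (StringReader reader = new StringReader(html ?? string.Empty))
        {
            // Second argument is an optional StyleSheet; null means default styling
            List<IElement> elements = HTMLWorker.ParseToList(reader, null);

            foreach (IElement element in elements)
            {
                // AddElement switches the cell to composite mode so each parsed element keeps its formatting
                cell.AddElement(element);
            }
        }

        return cell;
    }
}
```

In the question's loop, the Service column could then be added with `table4.AddCell(HtmlCellBuilder.BuildHtmlCell(t.Service));`.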

EDIT

Below is sample code that tries to overflow the table's boundaries but doesn't on my test machine. It creates 4 table rows, the 3rd and 4th of which have some convoluted attempts at breaking the table's boundaries but don't. (You'll see the convoluted part where I inject some returns, tabs and special Unicode spaces.)

(This code must be run completely and not cherry picked for it to work correctly and it targets iTextSharp 5.1.1.0.)

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        //Sample object that mimics the OP's structure
        public class SampleObject
        {
            public string Protocol { get; set; }
            public int Port { get; set; }
            public string Service { get; set; }
        }
        private void Form1_Load(object sender, EventArgs e)
        {
            //Create some sample data that mimics the OP's data structure and include some edge cases that could (but don't) cause things to blow up
            List<SampleObject> myprotocol = new List<SampleObject>();
            //General text
            myprotocol.Add(new SampleObject { Protocol = "Short text", Port = 80, Service = "This is a test" });
            //Long text w/ HTML
            myprotocol.Add(new SampleObject { Protocol = "Long HTML text", Port = 81, Service = string.Format("<p>{0}{0}<p>{1}Configure the database server to only allow acces to trusted systems.{0}{1}For Example, the PCI DSS standard requires you the place the database in an{0}{1}internal network zone, segregated from the DMZ.{0}</p>", "\r\n", "\t") });
            //Long text w/ spaces replaced by Unicode FEFF which is a zero-width non-breaking space
            myprotocol.Add(new SampleObject { Protocol = "Long HTML text with zero width no-break space", Port = 82, Service = string.Format("<p>{0}{0}<p>{1}Configure the database server to only allow acces to trusted systems.{0}{1}For Example, the PCI DSS standard requires you the place the database in an{0}{1}internal network zone, segregated from the DMZ.{0}</p>", "\r\n", "\t").Replace(" ", "\uFEFF") });
            //Long text w/ spaces replaced by Unicode 0020. Note: U+0020 is the ordinary space, so this replacement is a no-op (the non-breaking space is U+00A0)
            myprotocol.Add(new SampleObject { Protocol = "Long HTML text with non-breaking space", Port = 83, Service = string.Format("<p>{0}{0}<p>{1}Configure the database server to only allow acces to trusted systems.{0}{1}For Example, the PCI DSS standard requires you the place the database in an{0}{1}internal network zone, segregated from the DMZ.{0}</p>", "\r\n", "\t").Replace(" ", "\u0020") });

            using (iTextSharp.text.Document Doc = new iTextSharp.text.Document(PageSize.LETTER))
            {
                using (FileStream FS = new FileStream(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "TableTest.pdf"), FileMode.Create, FileAccess.Write, FileShare.Read))
                {
                    using (PdfWriter writer = PdfWriter.GetInstance(Doc, FS))
                    {
                        Doc.Open();

                        Doc.NewPage();

                        PdfPTable table4 = new PdfPTable(3);
                        table4.SetWidths(new float[] { 0.9f, 1f, 1.2f });

                        PdfPCell cell8 = new PdfPCell(new Phrase("Protocol", new iTextSharp.text.Font(FontFactory.GetFont("Helvetica", 12.0f, iTextSharp.text.Font.BOLD, new BaseColor(0, 0, 0)))));
                        cell8.BackgroundColor = new BaseColor(242, 242, 242);
                        table4.AddCell(cell8);

                        PdfPCell cell9 = new PdfPCell(new Phrase("Port", new iTextSharp.text.Font(FontFactory.GetFont("Helvetica", 12f, iTextSharp.text.Font.BOLD, new BaseColor(0, 0, 0)))));
                        cell9.BackgroundColor = new BaseColor(242, 242, 242);
                        table4.AddCell(cell9);

                        PdfPCell cell10 = new PdfPCell(new Phrase("Service", new iTextSharp.text.Font(FontFactory.GetFont("Helvetica", 12f, iTextSharp.text.Font.BOLD, new BaseColor(0, 0, 0)))));
                        cell10.BackgroundColor = new BaseColor(242, 242, 242);
                        table4.AddCell(cell10);

                        foreach (SampleObject t in myprotocol)
                        {
                            table4.AddCell(t.Protocol);
                            table4.AddCell(t.Port.ToString());
                            table4.AddCell(t.Service);
                        }

                        Doc.Add(table4);

                        Doc.Close();
                    }
                }
            }

            this.Close();
        }
    }
}
