Display Unicode characters when converting HTML to PDF

Question

I am using the iTextSharp DLL to convert HTML to PDF.

The HTML has some Unicode characters such as α and β. When I try to convert the HTML to PDF, the Unicode characters are not shown in the PDF.

My function:

Document doc = new Document(PageSize.LETTER);

using (FileStream fs = new FileStream(Path.Combine("Test.pdf"), FileMode.Create, FileAccess.Write, FileShare.Read))
{
    PdfWriter.GetInstance(doc, fs);

    doc.Open();

    doc.NewPage();

    string arialuniTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts),
                                      "ARIALUNI.TTF");

    BaseFont bf = BaseFont.CreateFont(arialuniTff, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

    Font fontNormal = new Font(bf, 12, Font.NORMAL);

    List<IElement> list = HTMLWorker.ParseToList(new StringReader(stringBuilder.ToString()),
                                                 new StyleSheet());
    Paragraph p = new Paragraph {Font = fontNormal};

    foreach (var element in list)
    {
        p.Add(element);
        doc.Add(p);
    }

    doc.Close();
}


Recommended answer

When dealing with Unicode characters and iTextSharp, there are a couple of things you need to take care of. The first one you have already done: getting a font that supports your characters. The second is that you need to actually register the font with iTextSharp so that it is aware of it.

//Path to our font
string arialuniTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
//Register the font with iTextSharp
iTextSharp.text.FontFactory.Register(arialuniTff);

Now that we have a font, we need to create a StyleSheet object that tells iTextSharp when and how to use it.

//Create a new stylesheet
iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet();
//Set the default body font to our registered font's internal name
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.FACE, "Arial Unicode MS");
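
The face name passed to LoadTagStyle has to match a name that iTextSharp registered for the font file, so it can help to print what was registered. The following is a minimal sketch, assuming iTextSharp 5.x is referenced and ARIALUNI.TTF is installed in the system Fonts folder; the class name FontNameCheck is only illustrative.

```csharp
using System;
using System.IO;
using iTextSharp.text;

class FontNameCheck
{
    static void Main()
    {
        // Assumption: the font file exists in the system Fonts folder
        string fontPath = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
        FontFactory.Register(fontPath);

        // Print every font name iTextSharp currently knows about
        foreach (string name in FontFactory.RegisteredFonts)
        {
            Console.WriteLine(name);
        }

        // Check the exact face name that will be given to the stylesheet
        Console.WriteLine(FontFactory.IsRegistered("Arial Unicode MS"));
    }
}
```

If the last line prints False, the name used in the stylesheet probably needs to be changed to one of the listed names.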

The one non-HTML part that you also need to do is set a special encoding parameter. This encoding is specific to iTextSharp, and in your case you want it to be Identity-H. If you don't set it, it defaults to Cp1252 (WINANSI).

//Set the default encoding to support Unicode characters
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.ENCODING, BaseFont.IDENTITY_H);
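
To see the effect of the encoding without the HTML parser involved, the same registered font can be requested directly with Identity-H. This is a rough sketch rather than part of the original answer; it assumes iTextSharp 5.x, an installed ARIALUNI.TTF, a source file saved as UTF-8, and a made-up output file name EncodingTest.pdf.

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

class DirectFontExample
{
    static void Main()
    {
        // Assumption: the font file exists in the system Fonts folder
        string fontPath = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
        FontFactory.Register(fontPath);

        // Ask for the registered font with the Identity-H encoding, embedded, 12pt
        Font unicodeFont = FontFactory.GetFont(
            "Arial Unicode MS", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, 12);

        Document doc = new Document(PageSize.LETTER);
        using (FileStream fs = new FileStream("EncodingTest.pdf", FileMode.Create, FileAccess.Write, FileShare.Read))
        {
            PdfWriter.GetInstance(doc, fs);
            doc.Open();
            // Greek letters are outside Cp1252, so they depend on the Identity-H font
            doc.Add(new Paragraph("This is a test: α, β", unicodeFont));
            doc.Close();
        }
    }
}
```

If the Greek letters show up in this PDF but not in the HTML-based one, the stylesheet settings are the likely culprit rather than the font itself.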

Lastly, we need to pass our stylesheet to the ParseToList method:

//Parse our HTML using the stylesheet created above
List<IElement> list = HTMLWorker.ParseToList(new StringReader(stringBuilder.ToString()), ST);

Putting that all together, from open to close you'd have:

doc.Open();

//Sample HTML
StringBuilder stringBuilder = new StringBuilder();
stringBuilder.Append(@"<p>This is a test: <strong>α,β</strong></p>");

//Path to our font
string arialuniTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF");
//Register the font with iTextSharp
iTextSharp.text.FontFactory.Register(arialuniTff);

//Create a new stylesheet
iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet();
//Set the default body font to our registered font's internal name
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.FACE, "Arial Unicode MS");
//Set the default encoding to support Unicode characters
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.ENCODING, BaseFont.IDENTITY_H);

//Parse our HTML using the stylesheet created above
List<IElement> list = HTMLWorker.ParseToList(new StringReader(stringBuilder.ToString()), ST);

//Loop through each element, don't bother wrapping in P tags
foreach (var element in list) {
    doc.Add(element);
}

doc.Close();

EDIT

In your comment you show HTML that specifies an override font. iTextSharp does not spider the system for fonts, and its HTML parser doesn't use font fallback techniques. Any fonts specified in HTML/CSS must be manually registered.

string lucidaTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "l_10646.ttf");
iTextSharp.text.FontFactory.Register(lucidaTff);
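
When the HTML can name several fonts, the registration step can be wrapped in a small helper that skips files that are not installed. The sketch below is only one possible way to do it; the class HtmlFontRegistration and the method RegisterSystemFonts are hypothetical names, not part of iTextSharp.

```csharp
using System;
using System.IO;
using iTextSharp.text;

static class HtmlFontRegistration
{
    // Hypothetical helper: registers each named font file found in the system Fonts folder
    public static void RegisterSystemFonts(params string[] fileNames)
    {
        string fontsDir = Environment.GetFolderPath(Environment.SpecialFolder.Fonts);
        foreach (string fileName in fileNames)
        {
            string path = Path.Combine(fontsDir, fileName);
            if (File.Exists(path))
            {
                // Make the font available to the HTML parser
                FontFactory.Register(path);
            }
            else
            {
                // Report the missing file instead of passing a bad path to iTextSharp
                Console.Error.WriteLine("Font file not found, skipping: " + path);
            }
        }
    }

    static void Main()
    {
        // Register the two fonts mentioned in this answer before parsing any HTML
        RegisterSystemFonts("ARIALUNI.TTF", "l_10646.ttf");
    }
}
```

Calling the helper once at startup, before HTMLWorker.ParseToList, keeps the registration in one place.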
