Create a PDF from a Persian HTML file with iTextSharp


Question

Hi all,

I use the iTextSharp library to convert HTML to PDF.
My users write Persian sentences in their HTML files, and the library cannot convert the Persian words.

To solve this and the right-to-left problem, I use the code below:

<pre lang="cs">Document document = new Document(PageSize.A4, 80, 50, 30, 65);
PdfWriter.GetInstance(document, new FileStream(strPDFpath, FileMode.Create));
document.Open();
ArrayList objects;
document.NewPage();

var stream = new StreamReader(strHTMLpath, Encoding.Default).ReadToEnd();
objects = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(
    new StreamReader(strHTMLpath, Encoding.UTF8), styles);
BaseFont bf = BaseFont.CreateFont("c:\\windows\\fonts\\Tahoma.ttf",
                                  BaseFont.IDENTITY_H, true);
for (int k = 0; k < objects.Count; k++)
{
    PdfPTable table = new PdfPTable(1);
    table.RunDirection = PdfWriter.RUN_DIRECTION_RTL;
    var els = (IElement)objects[k];
    foreach (Chunk el in els.Chunks)
    {
        #region set persian font
        iTextSharp.text.Font f2 = new iTextSharp.text.Font(bf, el.Font.Size,
                                                           el.Font.Style, el.Font.Color);
        el.Font = f2;
        #endregion set persian font
        #region Set right to left for persian words
        PdfPCell cell = new PdfPCell(new Phrase(10, el.Content, el.Font));
        cell.BorderWidth = 0;
        table.AddCell(cell);
        #endregion Set right to left for persian words
    }
    //document.Add((IElement)objects[k]);
    document.Add(table);
}
document.Close();
Response.Write(strPDFpath);
Response.ClearContent();
Response.ClearHeaders();
Response.AddHeader("Content-Disposition", "attachment; filename=" + strPDFpath);
Response.ContentType = "application/octet-stream";
Response.WriteFile(strPDFpath);
Response.Flush();
Response.Close();
if (File.Exists(strPDFpath))
{
    File.Delete(strPDFpath);
}</pre>



My right-to-left and Persian conversion problems are resolved, but there is another problem.

My algorithm cannot parse and convert the content of table tags used in the HTML file.

For example, here is an HTML file whose content is in Persian:

<pre lang="xml"><html>
<head>
<meta name="charset" content="utf-8" />
</head>
<body>

<p style="text-align: right;"><span style="font-family: tahoma;">سلام<br />
<br />
نامه شماره 1<br />
<br />
<br />
<table cellspacing="1" cellpadding="1" align="center">
    <tbody>
        <tr>
            <td>شماره شناسنامه SHSH</td>
            <td>نام خانوادگيFamily</td>
            <td>نامName</td>
        </tr>
        <tr>
            <td>123456789</td>
            <td>حيدربزرگHeidarbozorg</td>
            <td>سعيدSaeed</td>
        </tr>
        <tr>
            <td>258</td>
            <td>رضاييRezaee</td>
            <td>عليAli</td>
        </tr>
        <tr>
            <td>654987</td>
            <td>علي مردان خانAliMardanKhan</td>
            <td>رضاReza</td>
        </tr>
    </tbody>
</table>
<br />
<br />
مشخصات بالا را دريافت کردم</span></p>

</body></html>
</pre>




Now the question is: how can I parse an HTML file that has table, div, and paragraph tags containing Persian sentences, and convert it to PDF?

Recommended answer

There are a few things to check.

What is the charset in the HTML? It should be something like this:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />



A text-file BOM matching this charset is not mandatory, but does your file have one? (Your text editor should offer options such as "Save as UTF-8" or "Save as Unicode"; see the Unicode standard for BOMs.) The constructor of the class System.IO.StreamReader has a parameter detectEncodingFromByteOrderMarks; if it is true, the reader looks at the BOM at the beginning of the file.

Why do you read this stream with the default encoding? Look at your line:

var stream = new StreamReader(strHTMLpath, Encoding.Default).ReadToEnd();



This could be a mistake.

Persian is covered by Unicode just like most other languages, so processing Persian text usually does not cause any problems.

—SA
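
A minimal sketch of the BOM-aware read that the answer describes, assuming the file is UTF-8 when it carries no BOM. The names `HtmlReader` and `ReadHtml` are hypothetical and used only for illustration; the `StreamReader` constructor with `detectEncodingFromByteOrderMarks` is the one the answer mentions.

```csharp
using System;
using System.IO;
using System.Text;

static class HtmlReader
{
    // Hypothetical helper: reads the whole file as text.
    // UTF-8 is the fallback encoding; when the file starts with a BOM,
    // the reader switches to the encoding that the BOM indicates.
    public static string ReadHtml(string path)
    {
        using (var reader = new StreamReader(path, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling `HtmlReader.ReadHtml(strHTMLpath)` would replace the `Encoding.Default` read quoted above, so Persian text saved as UTF-8 is not decoded with the machine's ANSI code page.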


Thank you for your response.
I changed my code to this:

<pre lang="cs">var stream = new StreamReader(strHTMLpath, Encoding.UTF8).ReadToEnd();</pre>



and added this header to my HTML file:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />



But it still does not work :((

My problem is: the data inside the table tag is not parsed and converted to the PDF.
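
One possible cause, offered as a guess rather than a confirmed diagnosis: the loop in the question rebuilds every parsed element from `els.Chunks`, which flattens it into runs of text, so any table structure the parser produced would be lost at that step. The sketch below keeps each parsed element whole and places it in a single right-to-left cell. It assumes iTextSharp 5.x with the legacy `HTMLWorker`, the Tahoma path from the question, and that the parser honors the `"face"` and `"encoding"` style keys for font lookup. `PersianHtmlToPdf` and `Convert` are hypothetical names, and the code has not been run against the sample HTML above.

```csharp
using System.IO;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

static class PersianHtmlToPdf
{
    // Hypothetical helper: converts one HTML file to one PDF file.
    public static void Convert(string htmlPath, string pdfPath, string fontPath)
    {
        // Register the font file so the parser can resolve "tahoma" by name.
        FontFactory.Register(fontPath);

        // Assumption: these style keys make the parser pick an embedded
        // Identity-H (Unicode) font for all text inside <body>.
        var styles = new StyleSheet();
        styles.LoadTagStyle("body", "face", "tahoma");
        styles.LoadTagStyle("body", "encoding", BaseFont.IDENTITY_H);

        var document = new Document(PageSize.A4, 80, 50, 30, 65);
        using (var output = new FileStream(pdfPath, FileMode.Create))
        using (var reader = new StreamReader(htmlPath, Encoding.UTF8))
        {
            PdfWriter.GetInstance(document, output);
            document.Open();

            // One borderless RTL cell holds every parsed element unchanged.
            var cell = new PdfPCell();
            cell.Border = Rectangle.NO_BORDER;
            cell.RunDirection = PdfWriter.RUN_DIRECTION_RTL;

            foreach (IElement element in HTMLWorker.ParseToList(reader, styles))
            {
                // Tables produced by the parser get their own RTL direction.
                var nested = element as PdfPTable;
                if (nested != null)
                {
                    nested.RunDirection = PdfWriter.RUN_DIRECTION_RTL;
                }
                cell.AddElement(element);
            }

            var wrapper = new PdfPTable(1);
            wrapper.AddCell(cell);
            document.Add(wrapper);
            document.Close();
        }
    }
}
```

A call such as `PersianHtmlToPdf.Convert(strHTMLpath, strPDFpath, "c:\\windows\\fonts\\Tahoma.ttf")` would stand in for the parsing loop in the question. Note that the sample HTML nests a `<table>` inside a `<p>`, which is not valid HTML, so moving the table outside the paragraph may also be needed before any parser treats it as a table.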


Hi, I have exactly the same problem.
Could you help me if your problem is solved?
I'm from Iran.

