Arabic characters in HTML to PDF using iText

Question

I've already gone through every iText topic related to Arabic characters on Stack Overflow, but didn't find an answer for this one. I need to convert an HTML file to PDF, but the HTML contains both English and Arabic characters. When I open the HTML in Notepad++ or in any browser there is no problem and the Arabic characters display properly. When I use the following program to convert it to PDF, however, I can't find a way to display the Arabic characters; I only get "?" instead:

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import org.apache.commons.io.IOUtils;
import com.itextpdf.text.Document;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;


public class Test2 {

    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            FileInputStream in = new FileInputStream(new File(
                    "C:\\Test\\test_arabic.html"));
            String k = IOUtils.toString(in, Charset.forName("UTF-8"));
            OutputStream file = new FileOutputStream(new File("C:\\Test\\Test.pdf"));
            Document document = new Document();
            PdfWriter writer = PdfWriter.getInstance(document, file);
            InputStream htmlIn = new ByteArrayInputStream(k.getBytes());
            document.open();
            XMLWorkerHelper helper = XMLWorkerHelper.getInstance();
            FontFactory.getFontImp().registerDirectory("C:\\Windows\\Fonts");
            FontFactory.getFontImp().defaultEncoding = BaseFont.IDENTITY_H;
            helper.parseXHtml(writer, document, htmlIn, in, Charset.forName("UTF-8"),
                    FontFactory.getFontImp());
            document.close();
            file.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Here is my sample HTML file:

<html>
<head>
  <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
  <meta name="language" content="ar-SA" />
  <title>My arabic html</title>
</head>

<body>
<font size="1">

<table width="700" style='font-family:Verdana; font-size:20px; color:blue'>
  <tr>
    <td align="left">ADVICE</td>
    <td dir="rtl" lang="ar-SA"><p align='right' style='font-family:Traditional Arabic;'> إشعار </p></td>
  </tr>
</table>

<table width="700" style='font-size:16px; color:white; background-color:gray'>
  <tr>
    <td align="left">Foreign Exchange</td>
    <td dir="rtl" lang="ar-SA"><p align='right' style='font-family:Traditional Arabic;'> تبادل العملات الأجنبية </p></td>
  </tr>
</table>
</font>
</body>
</html>

Does anyone know how to do that? I also tried converting my HTML into a byte array using a w3c Document and ITextRenderer, but without success.

Edit: I now use the code provided by Vahidn (thanks a lot again). A small addition, because I'm still struggling with the alignment: it seems that align="left" does not work with Arabic and an RTL run direction. Here is my sample HTML:

<html>
   <head>
      <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
      <meta name="language" content="ar-SA" />
      <title>Confirmation Notice</title>
   </head>
   <body>
      <font size="1">
         <table width="700" style="font-family:Verdana; font-size:20px; color:white; background-color:blue">
            <tr>
               <td width="350" align="right">ADVICE</td>
               <td width="350" align="left" dir="rtl" lang="ar-SA">
                  <p style="font-family:traditional arabic;">
                     <b>إشعار</b>
                  </p>
               </td>
            </tr>
            <tr>
               <td width="350" align="right">Islamic Return Account</td>
               <td width="350" dir="rtl" lang="ar-SA" align="left">
                  <p style="font-family:traditional arabic;">
                     <b>حساب العائد الإسلامي</b>
                  </p>
               </td>
            </tr>
         </table>
      </font>
   </body>
</html>

But the Arabic column is never aligned to the left. align="center" works, though. Any ideas?

Thanks a lot for your help.

Recommended answer

I solved this issue using iTextSharp (the C# version of iText). You can find the sample here: http://www.dotnettips.info/file/userfile?name=XMLWorkerRTLsample.cs

The attached sample also needs a small modification:

public void Add(IWritable htmlElement)
{
    var writableElement = htmlElement as WritableElement;
    if (writableElement == null)
        return;

    foreach (var element in writableElement.Elements())
    {
        var div = element as PdfDiv;
        if (div != null)
        {
            foreach (var divChildElement in div.Content)
            {
                fixNestedTablesRunDirection(divChildElement);
                _paragraph.Add(divChildElement);
            }
        }
        else
        {
            fixNestedTablesRunDirection(element);
            _paragraph.Add(element);
        }
    }
}
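
One detail in the question's Java code may also contribute to the "?" output, independently of the RTL handling above: `k.getBytes()` is called without a charset, so the string is encoded with the JVM's default charset, while `parseXHtml` is told the stream is UTF-8. Before Java 18 the default charset followed the OS locale, so on a Western Windows machine (an assumption about the asker's setup) Arabic letters would be replaced by `?` at that step. The sketch below illustrates the effect with the standard library only; the class and method names are made up for the illustration, and ISO-8859-1 stands in for whatever non-UTF-8 default the asker's JVM may have had.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    /** Encodes text with the given charset, then decodes the bytes as UTF-8. */
    public static String encodeThenDecodeUtf8(String text, Charset encodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The word from the sample HTML, written with escapes so the
        // source file's own encoding does not matter.
        String arabic = "\u0625\u0634\u0639\u0627\u0631";

        // A single-byte Western charset cannot represent Arabic letters,
        // so getBytes substitutes '?' for each of the five characters.
        String lossy = encodeThenDecodeUtf8(arabic, StandardCharsets.ISO_8859_1);

        // Encoding explicitly as UTF-8 preserves the text.
        String intact = encodeThenDecodeUtf8(arabic, StandardCharsets.UTF_8);

        System.out.println(lossy.equals("?????"));   // true
        System.out.println(intact.equals(arabic));   // true
    }
}
```

If this is part of the problem, calling `k.getBytes(StandardCharsets.UTF_8)` in the question's code, or passing the file's `InputStream` to the parser directly, should avoid the mismatch. It would not by itself fix the right-to-left layout, which is what the answer's code addresses.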
