Arabic characters in HTML to PDF using iText
Question
I've already gone through every iText topic on Stack Overflow related to Arabic characters, but didn't find an answer for this one. I need to convert an HTML file into PDF, and this HTML contains both English and Arabic characters. When I display the HTML in Notepad++ or in any browser there is no problem: the Arabic characters show up correctly. But when I convert it to PDF with the following program, I can't find a way to display the Arabic characters; I only get "?" instead:
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import org.apache.commons.io.IOUtils;
import com.itextpdf.text.Document;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class Test2 {

    /**
     * Converts C:\Test\test_arabic.html to C:\Test\Test.pdf with XML Worker.
     *
     * @param args unused
     */
    public static void main(String[] args) {
        try {
            // Read the whole HTML file into a String, decoding it as UTF-8
            FileInputStream in = new FileInputStream(new File(
                    "C:\\Test\\test_arabic.html"));
            String k = IOUtils.toString(in, Charset.forName("UTF-8"));
            OutputStream file = new FileOutputStream(new File("C:\\Test\\Test.pdf"));
            Document document = new Document();
            PdfWriter writer = PdfWriter.getInstance(document, file);
            // getBytes() without an argument encodes with the platform default charset
            InputStream htmlIn = new ByteArrayInputStream(k.getBytes());
            document.open();
            XMLWorkerHelper helper = XMLWorkerHelper.getInstance();
            // Make the Windows system fonts available and default to Identity-H (Unicode) encoding
            FontFactory.getFontImp().registerDirectory("C:\\Windows\\Fonts");
            FontFactory.getFontImp().defaultEncoding = BaseFont.IDENTITY_H;
            // The fourth argument is the CSS input stream; `in` has already been read to the end here
            helper.parseXHtml(writer, document, htmlIn, in, Charset.forName("UTF-8"),
                    FontFactory.getFontImp());
            document.close();
            file.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
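A likely contributor to the "?" output is the line `InputStream htmlIn = new ByteArrayInputStream(k.getBytes());`. `String.getBytes()` with no argument encodes with the platform default charset. Before Java 18 (JEP 400 made UTF-8 the default), that is typically windows-1252 on a Western-locale Windows machine (an assumption about the asker's setup), and such a single-byte Western charset has no Arabic letters, so each one is replaced with '?' before XML Worker ever sees it. The standalone sketch below reproduces the effect with ISO-8859-1, another single-byte Western charset; the class and method names are illustrative only and not part of the original program.

```java
import java.nio.charset.StandardCharsets;
import java.nio.charset.Charset;

public class CharsetRoundTrip {

    // "إشعار" written with escapes so the result does not depend on the source file encoding
    static final String ARABIC = "\u0625\u0634\u0639\u0627\u0631";

    // Encodes text with the given charset and decodes the bytes back with the same charset.
    // String.getBytes(Charset) replaces unmappable characters with the charset's
    // replacement bytes, which is '?' for ISO-8859-1 (and for windows-1252).
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // A Western single-byte charset cannot represent Arabic letters
        System.out.println(roundTrip(ARABIC, StandardCharsets.ISO_8859_1)); // prints ?????
        // UTF-8 round-trips the text unchanged
        System.out.println(roundTrip(ARABIC, StandardCharsets.UTF_8).equals(ARABIC)); // prints true
    }
}
```

Passing a charset explicitly, for example `k.getBytes(StandardCharsets.UTF_8)` so that it matches the `Charset.forName("UTF-8")` handed to `parseXHtml`, avoids this particular loss. It does not by itself give right-to-left layout of the Arabic text, which the answer below deals with.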
Here is my sample HTML file:
<html>
  <head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
    <meta name="language" content="ar-SA" />
    <title>My arabic html</title>
  </head>
  <body>
    <font size="1">
      <table width="700" style='font-family:Verdana; font-size:20px; color:blue'>
        <tr>
          <td align="left">ADVICE</td>
          <td dir="rtl" lang="ar-SA"><p align='right' style='font-family:Traditional Arabic;'> إشعار </p></td>
        </tr>
      </table>
      <table width="700" style='font-size:16px; color:white; background-color:gray'>
        <tr>
          <td align="left">Foreign Exchange</td>
          <td dir="rtl" lang="ar-SA"><p align='right' style='font-family:Traditional Arabic;'> تبادل العملات الأجنبية </p></td>
        </tr>
      </table>
    </font>
  </body>
</html>
Does anyone know how to do that? I also tried converting my HTML into a byte array using a W3C document and ITextRenderer, but with no success.
Edit: I now use the code provided by Vahidn (thanks a lot again). A small follow-up, because I'm still struggling with the alignment: it seems that align="left" does not work with Arabic text when the run direction is RTL. Here is my sample HTML:
<html>
  <head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
    <meta name="language" content="ar-SA" />
    <title>Confirmation Notice</title>
  </head>
  <body>
    <font size="1">
      <table width="700" style="font-family:Verdana; font-size:20px; color:white; background-color:blue">
        <tr>
          <td width="350" align="right">ADVICE</td>
          <td width="350" align="left" dir="rtl" lang="ar-SA">
            <p style="font-family:traditional arabic;">
              <b>إشعار</b>
            </p>
          </td>
        </tr>
        <tr>
          <td width="350" align="right">Islamic Return Account</td>
          <td width="350" dir="rtl" lang="ar-SA" align="left">
            <p style="font-family:traditional arabic;">
              <b>حساب العائد الإسلامي</b>
            </p>
          </td>
        </tr>
      </table>
    </font>
  </body>
</html>
But the Arabic column is never aligned to the left, although align="center" does work. Any idea?
Many thanks for your help.
Recommended answer
I solved this issue using iTextSharp (the C# version of iText). You can find the sample here: http://www.dotnettips.info/file/userfile?name=XMLWorkerRTLsample.cs
The attached sample also needs a small modification:
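public void Add(IWritable htmlElement)
{
    // Only WritableElement instances carry the iText elements produced by XMLWorker
    var writableElement = htmlElement as WritableElement;
    if (writableElement == null)
        return;

    foreach (var element in writableElement.Elements())
    {
        // Unwrap PdfDiv containers so that each child element is added
        // to the paragraph individually
        var div = element as PdfDiv;
        if (div != null)
        {
            foreach (var divChildElement in div.Content)
            {
                // fixNestedTablesRunDirection and _paragraph come from the linked XMLWorkerRTLsample.cs
                fixNestedTablesRunDirection(divChildElement);
                _paragraph.Add(divChildElement);
            }
        }
        else
        {
            fixNestedTablesRunDirection(element);
            _paragraph.Add(element);
        }
    }
}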
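The linked sample appears to apply a right-to-left run direction to the XMLWorker output (for nested tables, via `fixNestedTablesRunDirection`). Since the question's tables mix English and Arabic cells, one might want to decide per piece of text whether RTL is appropriate at all. The JDK's `java.text.Bidi` can do that by taking the base direction from the first strong character, as in the Unicode Bidirectional Algorithm. This is a minimal, iText-independent sketch: the class and the `isRightToLeft` helper are hypothetical, and wiring the result into iText (for example into a cell's run direction) is not shown.

```java
import java.text.Bidi;

public class RunDirectionGuess {

    // Hypothetical helper: true if the text's base direction, determined by its
    // first strong character, is right-to-left. Text with no strong characters
    // falls back to left-to-right because of DIRECTION_DEFAULT_LEFT_TO_RIGHT.
    static boolean isRightToLeft(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    public static void main(String[] args) {
        String arabic = "\u0625\u0634\u0639\u0627\u0631"; // "إشعار" from the sample HTML
        System.out.println(isRightToLeft("ADVICE"));            // prints false
        System.out.println(isRightToLeft(arabic));              // prints true
        System.out.println(isRightToLeft(" " + arabic + " "));  // prints true: spaces are neutral
        System.out.println(isRightToLeft("ADVICE " + arabic));  // prints false: first strong character is Latin
    }
}
```

Relying on the first strong character keeps the decision consistent with how browsers pick a paragraph's direction when `dir="auto"` is used, which is why it is used here rather than, say, counting Arabic letters.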