Arabic characters in html to pdf using iText
Question
I've already gone through every iText topic related to Arabic characters on Stack Overflow, but didn't find an answer for this one. I need to convert an html file into pdf, and this html contains both English and Arabic characters. When I display the html in Notepad++ or in any browser there is no problem and the Arabic characters show properly, but when I use the following program to convert it into pdf, I can't figure out a way to display the Arabic characters; I only get "?" instead:
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;

import org.apache.commons.io.IOUtils;

import com.itextpdf.text.Document;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class Test2 {

    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            FileInputStream in = new FileInputStream(new File(
                    "C:\\Test\\test_arabic.html"));
            String k = IOUtils.toString(in, Charset.forName("UTF-8"));
            OutputStream file = new FileOutputStream(new File("C:\\Test\\Test.pdf"));
            Document document = new Document();
            PdfWriter writer = PdfWriter.getInstance(document, file);
            InputStream htmlIn = new ByteArrayInputStream(k.getBytes());
            document.open();
            XMLWorkerHelper helper = XMLWorkerHelper.getInstance();
            FontFactory.getFontImp().registerDirectory("C:\\Windows\\Fonts");
            FontFactory.getFontImp().defaultEncoding = BaseFont.IDENTITY_H;
            helper.parseXHtml(writer, document, htmlIn, in, Charset.forName("UTF-8"),
                    FontFactory.getFontImp());
            document.close();
            file.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
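One detail in the program above that may be worth checking: `k.getBytes()` is called without a charset, so the string is re-encoded with the platform default charset, while the parser is then told the stream is UTF-8. If the default charset has no Arabic letters (as is common on Windows machines with a Western locale), each Arabic character becomes a "?" byte before iText ever sees it. The sketch below only illustrates that effect with the standard library; it is not taken from the question or the answer, the class and method names (`CharsetRoundTrip`, `roundTrip`) are made up for illustration, and ISO-8859-1 is used as an assumed stand-in for such a default charset. Font selection and run direction may still need separate handling.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original program.
class CharsetRoundTrip {

    // Encode the text with one charset, then decode the bytes as UTF-8,
    // mimicking a parser that has been told the stream is UTF-8.
    public static String roundTrip(String text, Charset encodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The word from the sample html, written with escapes so that the
        // source file's own encoding cannot affect the demo.
        String arabic = "\u0625\u0634\u0639\u0627\u0631";

        // Assumption: ISO-8859-1 stands in for a Western platform default.
        // It cannot represent Arabic, so every letter is replaced by '?'.
        String lossy = roundTrip(arabic, StandardCharsets.ISO_8859_1);

        // Encoding explicitly as UTF-8 keeps the text intact.
        String safe = roundTrip(arabic, StandardCharsets.UTF_8);

        System.out.println("Platform default charset: " + Charset.defaultCharset());
        System.out.println("ISO-8859-1 round trip: " + lossy);
        System.out.println("UTF-8 round trip intact: " + safe.equals(arabic));
    }
}
```

In the program above, the corresponding change would be to pass the charset explicitly, for example `k.getBytes(Charset.forName("UTF-8"))`, so that the bytes match the charset handed to `parseXHtml`.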
Here is my sample html file:
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<meta name="language" content="ar-SA" />
<title>My arabic html</title>
</head>
<body>
<font size="1">
<table width="700" style='font-family:Verdana; font-size:20px; color:blue'>
<tr>
<td align="left">ADVICE</td>
<td dir="rtl" lang="ar-SA"><p align='right' style='font-family:Traditional Arabic;'> إشعار </p></td>
</tr>
</table>
<table width="700" style='font-size:16px; color:white; background-color:gray'>
<tr>
<td align="left">Foreign Exchange</td>
<td dir="rtl" lang="ar-SA"><p align='right' style='font-family:Traditional Arabic;'> تبادل العملات الأجنبية </p></td>
</tr>
</table>
</font>
</body>
</html>
Does anyone know how to do that? I also tried converting my html into a byte array using a w3c document and iTextRender, but with no success.
Edit: I now use the code provided by Vahidn (thanks a lot again). A small addition, because I'm still struggling with the alignment: it seems that align="left" does not work with Arabic text and runDirection RTL. Here is my sample html:
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<meta name="language" content="ar-SA" />
<title>Confirmation Notice</title>
</head>
<body>
<font size="1">
<table width="700" style="font-family:Verdana; font-size:20px; color:white; background-color:blue">
<tr>
<td width="350" align="right">ADVICE</td>
<td width="350" align="left" dir="rtl" lang="ar-SA">
<p style="font-family:traditional arabic;">
<b>إشعار</b>
</p>
</td>
</tr>
<tr>
<td width="350" align="right">Islamic Return Account</td>
<td width="350" dir="rtl" lang="ar-SA" align="left">
<p style="font-family:traditional arabic;">
<b>حساب العائد الإسلامي</b>
</p>
</td>
</tr>
</table>
</font>
</body>
</html>
But the Arabic column is never aligned to the left. align="center" works, though... Any idea?
Many thanks.
Thank you for your help.
Recommended answer
I solved this issue using iTextSharp (the C# version). You can find the sample here: http://www.dotnettips.info/file/userfile?name=XMLWorkerRTLsample.cs
The attached sample needs a small modification as well:
public void Add(IWritable htmlElement)
{
    var writableElement = htmlElement as WritableElement;
    if (writableElement == null)
        return;

    foreach (var element in writableElement.Elements())
    {
        var div = element as PdfDiv;
        if (div != null)
        {
            foreach (var divChildElement in div.Content)
            {
                fixNestedTablesRunDirection(divChildElement);
                _paragraph.Add(divChildElement);
            }
        }
        else
        {
            fixNestedTablesRunDirection(element);
            _paragraph.Add(element);
        }
    }
}