Writing Arabic with PDFBox using the correct character presentation forms, without the characters being separated
Problem description
I'm trying to generate a PDF that contains Arabic text using Apache PDFBox, but the text comes out as separated characters, because the given Arabic string is parsed as a sequence of general "official" Unicode characters, which are equivalent to the isolated forms of the Arabic characters.
Here is an example:
Target text to write (the expected output in the PDF file) -> جملة بالعربي
What I get in the PDF file ->
I tried several methods without success. Here are some of them:
1. Converting the String to a stream of bits and trying to extract the right values
2. Treating the String as a sequence of bytes in UTF-8 and UTF-16 and extracting values from them
One approach looks very promising for getting the Unicode value of each character, but it produces the general "official" Unicode value. Here is what I mean:
System.out.println( Integer.toHexString( (int)(new String("كلمة").charAt(1))) );
The output is 644, but fee0 is the expected output: this character is in the middle of the word, so I should get its medial form, Unicode fee0.
So what I want is a method that generates the correct contextual Unicode value, not just the official one.
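To make the expectation concrete, here is a minimal JDK-only sketch of contextual shaping. It is a toy, not a real shaper: the class name `ToyArabicShaper` and its four-letter table are assumptions made up for illustration, and it ignores ligatures, diacritics and every letter outside the table. It only shows why the lam at index 1 of كلمة should map to fee0 once its neighbours are taken into account.

```java
import java.util.Map;

public class ToyArabicShaper {
    // Presentation forms per base letter, in the order: isolated, final, initial, medial.
    // Teh marbuta never connects to the following letter, so it only has the first two.
    // Hypothetical, deliberately tiny table: just the letters of the sample word.
    private static final Map<Character, char[]> FORMS = Map.of(
        '\u0643', new char[] {'\uFED9', '\uFEDA', '\uFEDB', '\uFEDC'}, // kaf
        '\u0644', new char[] {'\uFEDD', '\uFEDE', '\uFEDF', '\uFEE0'}, // lam
        '\u0645', new char[] {'\uFEE1', '\uFEE2', '\uFEE3', '\uFEE4'}, // meem
        '\u0629', new char[] {'\uFE93', '\uFE94'}                      // teh marbuta
    );

    // A letter connects to the next one only if it has initial and medial forms.
    private static boolean joinsForward(char c) {
        char[] forms = FORMS.get(c);
        return forms != null && forms.length == 4;
    }

    public static String shape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char[] forms = FORMS.get(c);
            if (forms == null) {          // not in the toy table: copy unchanged
                out.append(c);
                continue;
            }
            boolean joinPrev = i > 0 && joinsForward(text.charAt(i - 1));
            boolean joinNext = forms.length == 4
                    && i + 1 < text.length()
                    && FORMS.containsKey(text.charAt(i + 1));
            // 0 = isolated, 1 = final, 2 = initial, 3 = medial
            int index = joinPrev ? (joinNext ? 3 : 1) : (joinNext ? 2 : 0);
            out.append(forms[index]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String word = "\u0643\u0644\u0645\u0629"; // the word from the question, in logical order
        System.out.println(Integer.toHexString(word.charAt(1)));        // 644
        System.out.println(Integer.toHexString(shape(word).charAt(1))); // fee0
    }
}
```

For real text, a complete shaper such as the ICU4J class used in the answer is the safer choice, since the full joining rules cover far more than this table.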
The leftmost column of the first table at the following link shows the general Unicode values:
Arabic Unicode Tables (Wikipedia)
Recommended answer
Here is code that works. Download a sample font, e.g. trado.ttf.
Make sure the pdfbox-app and icu4j jar files are on your classpath.
import java.io.File;
import java.io.IOException;

import com.ibm.icu.text.ArabicShaping;
import com.ibm.icu.text.ArabicShapingException;
import com.ibm.icu.text.Bidi;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

public class Main {
    public static void main(String[] args) throws IOException {
        File fontFile = new File("trado.ttf");
        PDDocument doc = new PDDocument();
        PDPage page = new PDPage();
        doc.addPage(page);
        PDPageContentStream writer = new PDPageContentStream(doc, page);
        writer.beginText();
        // Load and embed the TrueType font; it must contain glyphs for the Arabic presentation forms.
        writer.setFont(PDType0Font.load(doc, fontFile), 20);
        writer.newLineAtOffset(0, 700);
        String s = "جملة بالعربي لتجربة الكلاس اللذي يساعد علي وصل الحروف بشكل صحيح";
        writer.showText(bidiReorder(s));
        writer.endText();
        writer.close();
        doc.save(new File("File_Test.pdf"));
        doc.close();
    }

    // Shapes the letters into their contextual presentation forms, then
    // reorders the text from logical order to visual order.
    private static String bidiReorder(String text) {
        try {
            String shaped = new ArabicShaping(ArabicShaping.LETTERS_SHAPE).shape(text);
            Bidi bidi = new Bidi(shaped, Bidi.DIRECTION_DEFAULT_RIGHT_TO_LEFT);
            bidi.setReorderingMode(Bidi.REORDER_DEFAULT);
            return bidi.writeReordered(Bidi.DO_MIRRORING);
        } catch (ArabicShapingException e) {
            // Fall back to the unshaped text if shaping fails.
            return text;
        }
    }
}
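The bidiReorder helper above does two separate things: it shapes the letters, and it reorders the string from logical (typing) order into visual (left-to-right drawing) order before the text is handed to showText. As a rough illustration of the second step only, here is a sketch that uses the JDK's own java.text.Bidi instead of ICU4J. The class name `VisualOrderSketch` is made up for this example, and the sketch is simplified on purpose: it does no mirroring of brackets and would misplace combining marks, so it is not a drop-in replacement for writeReordered.

```java
import java.text.Bidi;

public class VisualOrderSketch {
    // Converts a single line from logical order to visual order.
    // Simplified: no character mirroring, no special handling of combining marks.
    public static String toVisualOrder(String logical) {
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_RIGHT_TO_LEFT);
        int runCount = bidi.getRunCount();
        String[] runs = new String[runCount];
        byte[] levels = new byte[runCount];
        for (int i = 0; i < runCount; i++) {
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            levels[i] = (byte) bidi.getRunLevel(i);
            // Odd embedding level means a right-to-left run: reverse its characters.
            runs[i] = (levels[i] % 2 == 1)
                    ? new StringBuilder(run).reverse().toString()
                    : run;
        }
        // Put the runs themselves into visual order according to their levels.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        return String.join("", runs);
    }

    public static void main(String[] args) {
        String word = "\u0643\u0644\u0645\u0629"; // Arabic sample word in logical order
        String visual = toVisualOrder(word);
        for (char c : visual.toCharArray()) {
            System.out.print(Integer.toHexString(c) + " "); // 629 645 644 643
        }
        System.out.println();
    }
}
```

A purely Arabic string simply comes out reversed, which is why unshaped, unreordered text appears both disconnected and backwards in the PDF. Shaping has to happen on the logical-order string, as in the answer's code, because the joining rules depend on each letter's logical neighbours.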