Writing Arabic with PDFBox with correct character presentation forms, without the letters being separated


Problem description


I'm trying to generate a PDF that contains Arabic text using Apache PDFBox, but the text is rendered as separated characters. PDFBox treats the given Arabic string as a sequence of general ("official") Unicode characters, which are equivalent to the isolated forms of the Arabic letters.

Here is an example:
Target text to write to the PDF (the expected output in the PDF file) -> جملة بالعربي
What I get in the PDF file -> [image not preserved: the same text drawn as separated letters]

I tried several methods without success. Here are some of them:
1. Converting the String to a stream of bits and trying to extract the right values
2. Treating the String as a sequence of bytes in UTF-8 and UTF-16 and extracting values from them

One approach seemed very promising for getting the Unicode value of each character, but it produces the general "official" Unicode value. Here is what I mean:

System.out.println( Integer.toHexString( (int)(new String("كلمة").charAt(1))) );  

The output is 644, but fee0 was the expected output: this character sits in the middle of the word, so I should get the medial-form code point fee0.

So what I want is a method that generates the correct contextual Unicode value, not just the official one.

The leftmost column of the first table at the following link shows the general Unicode values:
Arabic Unicode Tables Wikipedia
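
To make the 644 versus fee0 observation concrete, here is a minimal JDK-only sketch. It shows that a Java string stores the base letter U+0644, while the medial form U+FEE0 is a separate presentation-form code point that normalizes back to the base letter. The class and method names (`LamForms`, `hex`, `baseOf`) are hypothetical and chosen only for illustration.

```java
import java.text.Normalizer;

final class LamForms {
    // Hex value of a single UTF-16 char, as in the question's println call
    static String hex(char c) {
        return Integer.toHexString(c);
    }

    // Maps a presentation-form character back to its base letter via NFKC
    static String baseOf(char presentationForm) {
        return Normalizer.normalize(String.valueOf(presentationForm), Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // The word from the question, written with escapes: kaf, lam, meem, teh marbuta
        String word = "\u0643\u0644\u0645\u0629";
        // The stored character is the base letter, whatever its position in the word
        System.out.println(hex(word.charAt(1)));

        // The medial form lives in the Arabic Presentation Forms-B block
        char medialLam = '\uFEE0';
        System.out.println(Character.getName(medialLam));
        // Normalizing the presentation form yields the base letter again
        System.out.println(hex(baseOf(medialLam).charAt(0)));
    }
}
```

In other words, `charAt` is behaving correctly; picking the contextual form is a separate shaping step that has to happen before the text is drawn.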

Solution

First, I would like to thank Tilman and M.Prokhorov for pointing me to the library that made writing Arabic with Apache PDFBox possible.


This answer is divided into two sections:

  1. Downloading the library and installing it
  2. How to use the library


Downloading the library and installing it

We are going to use the ICU library.
ICU stands for International Components for Unicode. It is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.

To download the library, go to the ICU downloads page and choose the latest version of ICU4J.

You will be taken to another page with a box of direct links to the needed components. Download the following three files:

  1. icu4j-docs.jar
  2. icu4j-src.jar
  3. icu4j.jar

The following steps explain how to create and add a library in the NetBeans IDE:

  1. Navigate to the toolbar and click Tools
  2. Choose Libraries
  3. At the bottom left you will find the New Library button; create your library
  4. Navigate to the library that you created in the libraries list
  5. Select it and add the JAR files as follows
  6. Add icu4j.jar under Classpath
  7. Add icu4j-src.jar under Sources
  8. Add icu4j-docs.jar under Javadoc
  9. View your open projects at the far right
  10. Expand the project in which you want to use the library
  11. Right-click the Libraries folder and choose Add Library
  12. Finally, choose the library that you just created.

Now you are ready to use the library. Just import what you need, like this:

import com.ibm.icu.What_You_Want_To_Import;
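
Before writing any PDF code, it can help to confirm that the JAR really ended up on the project's classpath. The following is a small sketch using only the JDK; the class name `IcuClasspathCheck` and the method `isOnClasspath` are hypothetical, and the only ICU name it mentions is `com.ibm.icu.text.ArabicShaping`, the class used later in this answer.

```java
final class IcuClasspathCheck {
    // Returns true if the named class can be located without initializing it
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, IcuClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String icuClass = "com.ibm.icu.text.ArabicShaping";
        if (isOnClasspath(icuClass)) {
            System.out.println(icuClass + " found");
        } else {
            System.out.println(icuClass + " NOT found; check the library setup");
        }
    }
}
```

If the class is reported as missing, the library was most likely not added to the project in steps 10 to 12.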


How to use the library

With the ArabicShaping class and a reversed String, we can write a correctly joined Arabic line.
Here is the code; note the comments in it:

import com.ibm.icu.text.ArabicShaping;
import com.ibm.icu.text.ArabicShapingException;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

public class Main {
    public static void main(String[] args) throws IOException, ArabicShapingException {
        // Placeholder: path to a TrueType (.ttf) font file that contains Arabic glyphs
        File f = new File("Arabic Font File of format.ttf");
        PDDocument doc = new PDDocument();
        PDPage page = new PDPage();
        doc.addPage(page);
        PDPageContentStream writer = new PDPageContentStream(doc, page);
        writer.beginText();
        writer.setFont(PDType0Font.load(doc, f), 20);
        writer.newLineAtOffset(0, 700);
        // The trick is in the showText call below, but first a few notes:
        // - We have to reverse the string because PDFBox writes from the left, while Arabic is an RTL language.
        // - The output will be perfect, except that every line is justified to the left (not hard to resolve).
        // - So we have to write the Arabic string to the PDF line by line, like this:
        String s = "جملة بالعربي لتجربة الكلاس اللذي يساعد علي وصل الحروف بشكل صحيح";
        // Shape the letters into their contextual forms; shape() throws ArabicShapingException
        String shaped = new ArabicShaping(ArabicShaping.LETTERS_SHAPE).shape(s);
        // reverseNumbersInString is the helper from the update further down; add it to this class
        writer.showText(new StringBuilder(reverseNumbersInString(shaped)).reverse().toString());
        writer.endText();
        writer.close();
        doc.save(new File("File_Test.pdf"));
        doc.close();
    }
}

Here is the output: [output image not preserved]

I hope I have covered everything.
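
The comments in the code note that every line ends up justified to the left. One possible way to resolve this is to compute the starting x position of each line from its width. The sketch below is plain arithmetic under stated assumptions: the class `RightAlign` and its methods are hypothetical, and the inputs are expected to come from PDFBox, for example from `font.getStringWidth(text)` (which reports widths in units of 1/1000 of text space) and from the width of the page's media box.

```java
final class RightAlign {
    // Converts a width in 1/1000 text-space units into points at the given font size
    static float widthInPoints(float widthInThousandths, float fontSize) {
        return widthInThousandths / 1000f * fontSize;
    }

    // x coordinate at which a line must start so that it ends at the right margin
    static float startX(float pageWidth, float rightMargin, float textWidth) {
        return pageWidth - rightMargin - textWidth;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: a 612 pt wide page, a 52 pt margin, a 20 pt font
        float textWidth = widthInPoints(8000f, 20f);
        System.out.println(startX(612f, 52f, textWidth));
    }
}
```

The result would be passed as the first argument of `newLineAtOffset` for each line, instead of the fixed 0 used in the code above.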

Update: after reversing the string, make sure to reverse the numbers again so that they read in their proper order.
Here are a couple of functions that can help:

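public static boolean isInt(String input)
{
    try { Integer.parseInt(input); return true; }
    catch (NumberFormatException e) { return false; }
}

public static String reverseNumbersInString(String input)
{
    char[] separated = input.toCharArray();
    StringBuilder result = new StringBuilder();
    StringBuilder hold = new StringBuilder();
    int i = 0;
    while (i < separated.length)
    {
        if (isInt(separated[i] + ""))
        {
            // Collect the whole number run: digits plus '.' and '-'
            while (i < separated.length && (isInt(separated[i] + "") || separated[i] == '.' || separated[i] == '-'))
            {
                hold.append(separated[i]);
                i++;
            }
            // Reverse the run so that it reads correctly once the whole string is reversed
            result.append(hold.reverse());
            hold.setLength(0);
        }
        else
        {
            // Copy other characters unchanged; i advances only here, so the
            // character right after a number run is not skipped
            result.append(separated[i]);
            i++;
        }
    }
    return result.toString();
}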
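
As an alternative sketch of the same idea, the full reversal and the number fix can be combined in one helper that uses a regular expression. This is not the answer's own code: the class `VisualOrder` and the method `toVisualOrder` are hypothetical names, and matching all Unicode decimal digits with `\p{Nd}` is an assumption made here so that Arabic-Indic digits are handled as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VisualOrder {
    // A number run: a decimal digit followed by digits, '.' or '-'
    private static final Pattern NUMBER_RUN = Pattern.compile("\\p{Nd}[\\p{Nd}.\\-]*");

    // Reverses the whole string, then reverses each number run back into reading order
    static String toVisualOrder(String text) {
        String reversed = new StringBuilder(text).reverse().toString();
        Matcher m = NUMBER_RUN.matcher(reversed);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String restored = new StringBuilder(m.group()).reverse().toString();
            // quoteReplacement keeps any '$' or '\' in the run literal
            m.appendReplacement(out, Matcher.quoteReplacement(restored));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Latin letters are used here only to make the effect easy to read
        System.out.println(toVisualOrder("ab 12.5 cd")); // prints "dc 12.5 ba"
    }
}
```

In the PDF code, the shaped string returned by `ArabicShaping.shape` would be passed to `toVisualOrder` and the result handed to `showText`.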
