哪个 PDF Generation API (Java) 支持 Gujarati 字体? [英] Which PDF Generation API (Java) supports Gujarati Font?

查看:16
本文介绍了哪个 PDF Generation API (Java) 支持 Gujarati 字体?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我尝试过 iText、PDFBox 和甲骨文表格.在 iText 的情况下,我也成功生成了古吉拉特语 PDF 文档.但是,不幸的是它没有在古吉拉特语 (UTF-8) 语言中生成正确的字体.

我在 jdk 1.4 & 中有我的项目这是必须使用的.所以,我需要支持 Gujarati 字体的旧版 API.

请建议是否有任何可用的选项.

示例代码:

public void GeneratePDFusingiText(String lStrGujaratidata){尝试{BaseFont bf = BaseFont.createFont("C:\\Windows\\Fonts\\Shruti.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);Font font = new Font(bf, 12);文档文档 = 新文档();PdfWriter.getInstance(document, new FileOutputStream("D:/GeneratePDFusingiText.pdf"));文档.open();document.add(new Paragraph(lStrGujaratidata, font));文档.close();}捕获(例外 e){System.out.println("生成PDF时出现异常");e.printStackTrace();}}

编辑 1:

也许图像没有显示出来.它在.也许您可以自己创建一个类似的?(!)

之前好像出现过这种情况:

原答案

Kem chho,

我相信 iText 显示了正确的字符,但是在您将字符串转换为 unicode 点之前,您输入的前 2 个字符已被翻转".所以,问题发生在数据甚至到达 iText 之前.

潜在的问题是第一个"字符是一个前基"字符,它是一种 变音符号.它有点像欧洲文本中的口音",它不能单独存在,其目的是修饰另一个字符.在这种情况下,它将Ba"(બ)变成Bi".

您将在 Unicode 代码页中看到,第一个字符 (િ) 确实是代码点 \u0ABF,第二个 (બ) 是 \u0AAC : http://en.wikipedia.org/wiki/Gujar%C4%81ti_script#Unicode

因此,在 Google Transliterate 和您的代码点表示之间的某个地方,这些字符被翻转了.所以,你需要回顾一下你是如何进行翻译的.

您是如何将这些字符转换为代码点的?

表面上,一些口译员将前基"放在主辅音之后,而不是之前:

  • 请注意,当您将这些字符粘贴到 (Linux) 终端时,前 2 个字符从后到前出现.我相信某事类似的事情也发生在你身上.
  • 您还会注意到,当您尝试在谷歌音译中编辑这个词,你不能将光标放在前 2 个字符,当你按下退格键时,左边的删除右边之前的字符.

因此,如果您能找出这种翻转"发生的位置,那么希望您的解决方案会出现.

希望能帮到你

I have tried iText, PDFBox & Oracle Forms. And I also succed in case of iText to generate Gujarati PDF Document. But, unfortunately it is not generating proper Font in Gujarati (UTF-8) language.

I have my project in jdk 1.4 & that is mandatory to use. So, I need older version of API that support Gujarati Font.

Please suggest if any option is available.

Sample Code:

public void GeneratePDFusingiText(String lStrGujaratidata)
  {
    try
    {

      BaseFont bf = BaseFont.createFont("C:\\Windows\\Fonts\\Shruti.ttf",  BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
      Font font = new Font(bf, 12);
      Document document = new Document();
      PdfWriter.getInstance(document, new FileOutputStream("D:/GeneratePDFusingiText.pdf"));
      document.open();
      document.add(new Paragraph(lStrGujaratidata, font));
      document.close();
    }
    catch(Exception e)
    {
      System.out.println("Exception while generating PDF");
      e.printStackTrace();
    }
   } 

EDIT 1:

Perhaps the image is not getting displayed. It is uploaded here.

EDIT 2:

Step-1) I type a gujarati String Google Transliterate.

Step-2) I convert it into unicode using BableMap Software to use it using Resourse Bundle.

Issue: Let me have a String: બિલાડી (Biladi)

It's unicode will be : \u0AAC \u0ABF\u0AB2\u0ABE\u0AA1\u0AC0

Check the Bold Unicode character above. That is where I am getting the problem. Now if I change this unicode to \u0ABF\u0AAC\u0AB2\u0ABE\u0AA1\u0AC0 , it prints proper output in PDF.

At the same time it prints wrong output in HTML i.e. : િબલાડી

I have to manage in between them.

I have tried using "gu" & "gu.UTF-8" & "UTF-8". But, everytime I am getting same output.

解决方案

Updated Answer

After your comment I realised that I was wrong, i.e. the diacritic character should appear second in the byte sequence, even though it should be rendered left of the main character.

So, it turns out, iText doesn't support this type of rendering on Indic charactersets. Roughly speaking, iText uses awt's Graphics2D to render non-Latin unicode characters, one-by-one, as images in the PDF. (I guess this is because appropriate fonts are not necessarily be installed on everyone's computer). This feature doesn't take this special ordering into account.

iText does support similar behaviour for Arabic, using a class contributed by another developer. See com.itextpdf.text.pdf.ArabicLigaturizer. Perhaps you could create a similar one yourself? (!)

It looks like this has come up before:

Original Answer

Kem chho,

I believe that iText is displaying the correct characters, but the first 2 characters of your input have been 'flipped' before you translated the string into unicode points. So, the problem occurred before the data even gets to iText.

The underlying issue is that the 'first' character is a 'pre-base' character, which is a type of Diacritic. It's a bit like an 'accent' in European texts, in that it can't exist on its own, and its purpose is to embellish another character. In this case it turns a 'Ba' (બ) into a 'Bi'.

You'll see int the the Unicode Codepage, that the first character (િ) is indeed codepoint \u0ABF, and the second (બ) is \u0AAC : http://en.wikipedia.org/wiki/Gujar%C4%81ti_script#Unicode

So, somewhere between Google Transliterate and your codepoint representation, these characters got flipped. So, you need to review how you did that translation.

How did you convert these characters into codepoints?

Seemingly, some interpreters place the 'pre-base' after the main consonant, instead of before it:

  • Note that when you paste those characters into a (Linux) terminal, the first 2 characters come out back-to-front. I believe something similar happened for you too.
  • You'll also notice that when you try editing this word in Google Transliterate, you can't place the cursor between the first 2 characters, and when you hit backspace, the left character is deleted before the right.

So, if you can work out where this 'flipping' occured, then hopefully your solution will present itself.

Hope this helps

这篇关于哪个 PDF Generation API (Java) 支持 Gujarati 字体?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆