Write Arabic characters with PDFBOX


Question


  1. Update 1

I'm trying to write some Arabic characters in a PDF document using PDFBox, but the output contains strange characters. Below is the code snippet I used for my test. Note that the same code prints Latin characters without any problem.

public static void main(String[] args) throws Exception {

    PDDocument document = new PDDocument();

    PDPage page = new PDPage(PDPage.PAGE_SIZE_A4);
    document.addPage(page);

    PDPageContentStream stream = new PDPageContentStream(document, page,true, true);

    //Use of a unicode font
    PDFont font = PDTrueTypeFont.loadTTF(document,"C:/arialuni.ttf");

    font.setFontEncoding(new WinAnsiEncoding());

    stream.setFont(font, 12);
    stream.beginText();

    stream.moveTextPositionByAmount(40, 600);

    stream.drawString("سي ججس ححسيب حسججسيبنم حح ");
    stream.endText();
    stream.close();
    document.save("c:\\resultpdf.pdf");
    document.close();

}

Thanks for your help. I also tried a Unicode font downloaded from the Microsoft website, but I still get the same result.

  1. Update 2

By using the methods 'drawUnicodeString' and 'loadTTF' that I got from PDFBOX-922, I was able to write Arabic characters, but they are disconnected and ordered from left to right. Here are the two methods 'drawUnicodeString' and 'loadTTF':

public void drawUnicodeString(String text) throws IOException {
    COSString string = new COSString();
    for (int i = 0; i < text.length(); i++) {
        char c = text.charAt(i);
        string.append(c >> 8);
        string.append(c & 0xff);
    }
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    string.writePDF(buffer);
    appendRawCommands(buffer.toByteArray());
    appendRawCommands(32);
    appendRawCommands(getISOBytes("Tj\n"));
}


public static PDType0Font loadTTF(PDDocument doc, InputStream is)
        throws IOException {
    /* Load the font which we will convert to Type0 font. */
    PDTrueTypeFont pdTtf = PDTrueTypeFont.loadTTF(doc, is);

    TrueTypeFont ttf = pdTtf.getTTFFont();
    CMAPEncodingEntry unicodeMap = null;
    for (CMAPEncodingEntry candidate : ttf.getCMAP().getCmaps()) {
        if (candidate.getPlatformId() == CMAPTable.PLATFORM_WINDOWS
                && candidate.getPlatformEncodingId() == CMAPTable.ENCODING_UNICODE) {
            unicodeMap = candidate;
            break;
        }
    }
    if (unicodeMap == null) {
        throw new RuntimeException(
                "To use as CIDFont, the TTF must have a Windows platform Unicode encoding");
    }
    float scaling = 1000f / ttf.getHeader().getUnitsPerEm();

    MyPDCIDFontType2Font pdCidFont2 = new MyPDCIDFontType2Font();
    pdCidFont2.setBaseFont(pdTtf.getBaseFont());
    pdCidFont2.setFontDescriptor((PDFontDescriptorDictionary) pdTtf
            .getFontDescriptor());
    /* Fixme -- should determine the minimum and maximum charcode in the map */
    int[] cid2gid = new int[65536];
    List<Float> widths = new ArrayList<Float>();
    int[] widthValues = ttf.getHorizontalMetrics().getAdvanceWidth();
    for (int i = 0; i < cid2gid.length; i++) {
        int glyph = unicodeMap.getGlyphId(i);
        cid2gid[i] = glyph;
        widths.add((float) i);
        widths.add((float) i);
        widths.add(widthValues[glyph] * scaling);
    }
    pdCidFont2.setCidToGid(cid2gid);
    pdCidFont2.setWidths(widths);
    pdCidFont2.setDefaultWidth(widths.get(0).longValue());

    /* Now construct the type0 font that we actually return */
    myType0Font pdFont0 = new myType0Font();
    pdFont0.setDescendantFont(pdCidFont2);
    pdFont0.setDescendantFonts(new COSObject(pdCidFont2.getCOSObject()));
    pdFont0.setEncoding(COSName.IDENTITY_H);

    pdFont0.setBaseFont(pdTtf.getBaseFont());

    // pdfont0.setToUnicode(COSName.IDENTITY_H); XXX how to express identity
    // mapping as ToUnicode program? */
    return pdFont0;
}

And here are the characters printed:

I don't know why these characters are disconnected.

Solution

Arabic can be written by applying both PDFBOX-922 and PDFBOX-1287 (the diff files are attached to the issue descriptions). I hope that the patches will be applied in version 2.0.
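
The patches themselves are not reproduced here. As a rough illustration of the left-to-right ordering symptom described in the question, the sketch below uses only the JDK's java.text.Bidi class to turn a logical-order string into visual order before it would be passed to a drawing call. This is an assumption about one way to handle ordering, not a description of what PDFBOX-922 or PDFBOX-1287 do; the class name VisualOrderSketch and the method toVisualOrder are hypothetical. It performs no contextual shaping, so on its own it does not join the letters.

```java
import java.text.Bidi;

// Hypothetical helper, not part of PDFBox: reorders text from logical to visual order.
public class VisualOrderSketch {

    static String toVisualOrder(String logical) {
        // Base direction is taken from the first strong character, defaulting to LTR.
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];
        for (int i = 0; i < runCount; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // Odd embedding levels are right-to-left: reverse the characters of the run.
            runs[i] = (levels[i] & 1) == 1
                    ? new StringBuilder(run).reverse().toString()
                    : run;
        }
        // Put the runs themselves into visual (left-to-right display) order.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        StringBuilder visual = new StringBuilder();
        for (String run : runs) {
            visual.append(run);
        }
        return visual.toString();
    }

    public static void main(String[] args) {
        String visual = toVisualOrder("\u0633\u064A"); // SEEN, YEH in logical order
        // Print code points instead of glyphs to avoid console encoding issues.
        for (int i = 0; i < visual.length(); i++) {
            System.out.printf("U+%04X ", (int) visual.charAt(i));
        }
        System.out.println();
    }
}
```

The result of toVisualOrder would then be passed to the drawing method in place of the original string. Joining the letters additionally requires contextual shaping (choosing the initial, medial, final or isolated glyph of each letter), which this sketch leaves out.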
