pdfbox: ... is not available in this font's encoding

Question

I'm having problems with pdfbox 2.0.2 writing a pdf document from elements of a previously read document (https://www.dropbox.com/s/ttxiv0dq3abh5kj/Test.pdf?dl=0). Everything works fine, except when I call showText on a PDPageContentStream where I previously set the font with out.setFont(textState.getFont(), textState.getFontSize()) (see the INFORMATION log) and the font is ComicSansMS or ArialBlack. textState is (a clone from) the state from the previously read document. Writing text with Helvetica or Times-Roman works fine.

INFORMATION: set font PDTrueTypeFont RXNQOL+ComicSansMS,Bold/18.0 embedded    
SEVERE: error writing <w>U+0077 is not available in this font's encoding: built-in (TTF)

I suppose the problem may be caused by a missing hyphen or blank in the font name but have no clue how to fix this.

Here is the complete code:

import java.awt.Point;
import java.awt.geom.Point2D;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.graphics.image.PDImage;
import org.apache.pdfbox.pdmodel.graphics.state.PDTextState;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.Vector;

public class Test extends PDFGraphicsStreamEngine {

public static void main(String[] args) throws IOException {
    test();
}

public static void test() throws IOException {
    PDDocument document = PDDocument.load(new File("Test.pdf"));
    PDPage pageIn = document.getPage(0);
    PDDocument saveDoc = new PDDocument();
    PDPage savePage = new PDPage(pageIn.getMediaBox());
    saveDoc.addPage(savePage);
    try (PDPageContentStream out = new PDPageContentStream(saveDoc, savePage)) {
        Test test = new Test(pageIn, out);
        test.processPage(pageIn);
    }
}

private final PDPageContentStream out;

public Test(PDPage pageIn, PDPageContentStream out) {
    super(pageIn);
    this.out = out;
}

@Override
public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException {
}

@Override
public void clip(int windingRule) throws IOException {
}

@Override
public void closePath() throws IOException {
}

@Override
public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException {
}

@Override
public void drawImage(PDImage pdImage) throws IOException {
}

@Override
public void endPath() throws IOException {
}

@Override
public void fillAndStrokePath(int windingRule) throws IOException {
}

@Override
public void fillPath(int windingRule) throws IOException {
}

@Override
public Point2D getCurrentPoint() {
    return new Point(0, 0);
}

@Override
public void lineTo(float x, float y) throws IOException {
}

@Override
public void moveTo(float x, float y) throws IOException {
}

@Override
public void shadingFill(COSName shadingName) throws IOException {
}

@Override
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException {
    super.showGlyph(textRenderingMatrix, font, code, unicode, displacement);
    PDTextState textState = getGraphicsState().getTextState();
    out.beginText();
    out.setTextMatrix(getTextMatrix());
    out.setFont(textState.getFont(), textState.getFontSize());
    out.showText(unicode);
    out.endText();
}

@Override
public void strokePath() throws IOException {
}

}

Any suggestions?

Thanks, Jürgen

Recommended answer

tl;dr: That font can't be used to encode new text.

The cause of the problem is that your subsetted Comic Sans font does have a "post" (PostScript) table, but its glyphNames array is null, i.e. the font carries no glyph names. For A-Z and a-z the glyph names are the characters themselves; for "(" the glyph name is "parenleft". Because these names are missing, the second part of PDTrueTypeFont.readEncodingFromFont() builds pseudo names from the glyph ID, e.g. "90" instead of "w" for the letter "w".
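To make the mismatch concrete, here is a minimal, stdlib-only sketch of the lookup described above. It is a simplified model, not PDFBox source: the class name, the two-entry glyph list, and the code value 119 are all made up for illustration. It only shows why a lookup by glyph name fails once the font's names have been replaced by pseudo names such as "90".

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model (NOT PDFBox code): a glyph-list lookup against a font
// whose glyph names are either real ("w") or pseudo names ("90").
class GlyphNameLookupSketch {

    // Tiny stand-in for the Adobe Glyph List: Unicode character -> glyph name.
    static final Map<Character, String> GLYPH_LIST = new HashMap<>();
    static {
        GLYPH_LIST.put('w', "w");
        GLYPH_LIST.put('(', "parenleft");
    }

    // Returns the character code for c, or fails with a message shaped like
    // the one in the question's log when the glyph name is unknown to the font.
    static int encode(char c, Map<String, Integer> nameToCode) {
        String name = GLYPH_LIST.get(c);
        Integer code = (name == null) ? null : nameToCode.get(name);
        if (code == null) {
            throw new IllegalArgumentException(String.format(
                "U+%04X is not available in this font's encoding", (int) c));
        }
        return code;
    }

    public static void main(String[] args) {
        // Hypothetical font that kept its glyph names: the lookup succeeds.
        Map<String, Integer> named = new HashMap<>();
        named.put("w", 119);
        System.out.println(encode('w', named));

        // Hypothetical font whose names were replaced by glyph-ID pseudo names.
        Map<String, Integer> pseudo = new HashMap<>();
        pseudo.put("90", 119);
        try {
            encode('w', pseudo);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The glyph is physically present in both hypothetical fonts; only the name used to find it differs, which is why reading the text works while writing new text with the same font does not.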

When encoding, however, PDFBox falls back to the Adobe Glyph List because the font has no Encoding entry, and that list maps "w" to the glyph name "w", which this font does not contain. The other fonts differ: if you inspect them with PDFDebugger, e.g. R18, you'll find "Encoding: WinAnsiEncoding".

What you are apparently doing is creating a new page that contains only the text. A different way to do this is to analyse the content streams and remove every token that paints something other than text. To get started, look at the RemoveAllText example in the source code download, download the PDF 32000 specification, read its operator summary, and be careful about what you delete. For example, "Do" is used both to draw images and to draw form XObjects, which are content streams themselves.

Both solutions are wrong: the first one just pulls all images out from under the feet, and the second one is a good start but does not check whether the parameter is an image or not.
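The token-filtering idea described in this answer can be sketched without PDFBox. The following is a rough, stdlib-only model, not the RemoveAllText example and not a real content-stream parser: it assumes the stream is already split into tokens, it guesses "operand" from the first character, and the set of operators to keep is my own assumption about what counts as text-related.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Rough model: keep only text-related operators (with their operands) from a
// pre-tokenized content stream. The class and method names are hypothetical.
class TextOnlyFilterSketch {

    // Assumed keep-list: text object, text state, positioning and showing
    // operators, plus q/Q/cm so the graphics state stays balanced.
    static final Set<String> KEEP = new HashSet<>(Arrays.asList(
        "BT", "ET", "Tf", "Td", "TD", "Tm", "T*", "Tj", "TJ", "'", "\"",
        "Tc", "Tw", "Tz", "TL", "Tr", "Ts", "q", "Q", "cm"));

    // Crude operand test: numbers, names, strings, arrays and hex strings.
    // Keyword operands such as true/false/null are not handled in this model.
    static boolean isOperand(String token) {
        return !token.isEmpty() && "0123456789+-./([<".indexOf(token.charAt(0)) >= 0;
    }

    static List<String> keepTextOnly(List<String> tokens) {
        List<String> out = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        for (String token : tokens) {
            if (isOperand(token)) {
                operands.add(token);      // collect until the operator arrives
                continue;
            }
            if (KEEP.contains(token)) {
                out.addAll(operands);     // operands precede their operator
                out.add(token);
            }
            operands.clear();             // dropped operators lose their operands too
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> stream = Arrays.asList(
            "q", "0", "0", "100", "50", "re", "f", "/Im1", "Do",
            "BT", "/F1", "18", "Tf", "(w)", "Tj", "ET", "Q");
        System.out.println(keepTextOnly(stream));
    }
}
```

Note that this sketch drops every "Do", which is exactly the shortcut the answer warns about: a real implementation would have to look up the named XObject and, for a form XObject, filter its content stream recursively rather than discard it.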
