Placing an image over text by using the text position in a PDF with PDFBox


Question

The result is that the image is not placed correctly over the text. Am I getting the text positions wrong?

This is an example of how to get the x/y coordinates and size of each character in a PDF:

public class MyClass extends PDFTextStripper {

    pdocument = PDDocument.load(new File(fileName));

    stripper = new GetCharLocationAndSize();
    stripper.setSortByPosition(true);
    stripper.setStartPage(0);
    stripper.setEndPage(pdocument.getNumberOfPages());
    Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
    stripper.writeText(pdocument, dummy);

    /*
     * Override the default functionality of PDFTextStripper.writeString()
     */
    @Override
    protected void writeString(String string, List<TextPosition> textPositions) throws IOException {

        String imagePath = "image.jpg";
        PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, pdocument);

        PDPageContentStream contentStream = new PDPageContentStream(pdocument, stripper.getCurrentPage(), true, true);

        for (TextPosition text : textPositions) {
            if (text.getUnicode().equals("a")) {
                contentStream.drawImage(pdImage, text.getXDirAdj(), text.getYDirAdj(),
                        text.getWidthDirAdj(), text.getHeightDir());
            }
        }
        contentStream.close();
        pdocument.save("newdoc.pdf");
    }
}

Answer

Retrieving sensible coordinates

You use text.getXDirAdj() and text.getYDirAdj() as x and y coordinates in the content stream. This won't work because the coordinates PDFBox uses during text extraction are transformed into a coordinate system it prefers for text extraction purposes, cf. the JavaDocs:

/**
 * This will get the text direction adjusted x position of the character.
 * This is adjusted based on text direction so that the first character
 * in that direction is in the upper left at 0,0.
 *
 * @return The x coordinate of the text.
 */
public float getXDirAdj()

/**
 * This will get the y position of the text, adjusted so that 0,0 is upper left and it is
 * adjusted based on the text direction.
 *
 * @return The adjusted y coordinate of the character.
 */
public float getYDirAdj()
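To make the mismatch concrete, here is a minimal sketch of the axis flip involved. It assumes the simplest case only (an unrotated page, horizontal left-to-right text, a crop box starting at 0,0); the class and method names are made up for illustration and are not part of PDFBox.

```java
/** Illustrative helper, not part of PDFBox. */
final class YAxisFlip {

    /**
     * Converts a y value measured downwards from the top edge of the page
     * (the convention the JavaDocs above describe) into a y value measured
     * upwards from the bottom edge (the convention of a PDF content stream).
     */
    static float topDownToBottomUp(float yFromTop, float pageHeight) {
        return pageHeight - yFromTop;
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // height of a US Letter page in points
        float yFromTop = 100f;   // a value in the top-left-origin system

        // Drawing at y = 100 in the content stream would land near the bottom
        // of the page; the matching bottom-up position is much higher.
        System.out.println(topDownToBottomUp(yFromTop, pageHeight)); // prints 692.0
    }
}
```

So a value taken from the top-left-origin system and passed unchanged to drawImage ends up mirrored vertically, which matches the symptom in the question.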

For a TextPosition text you should instead use

text.getTextMatrix().getTranslateX()

and

text.getTextMatrix().getTranslateY()

But even these numbers may have to be corrected, cf. this answer, because PDFBox has multiplied the matrix by a translation making the lower left corner of the crop box the origin.

Thus, if PDRectangle cropBox is the crop box of the current page, use

text.getTextMatrix().getTranslateX() + cropBox.getLowerLeftX()

and

text.getTextMatrix().getTranslateY() + cropBox.getLowerLeftY()

(This coordinate normalization of PDFBox is a PITA for anyone who actually wants to work with the text coordinates...)
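The correction itself is plain addition. The following sketch uses bare floats in place of the PDFBox objects so that it runs on its own; the class name, method name, and sample numbers are assumptions for illustration. In real code the inputs would come from text.getTextMatrix() and the page's crop box.

```java
/** Illustrative helper, not part of PDFBox. */
final class CropBoxOffset {

    /**
     * Adds the crop box's lower left corner back onto a text matrix
     * translation, giving coordinates usable in the page content stream.
     */
    static float[] toPageCoordinates(float translateX, float translateY,
                                     float cropLowerLeftX, float cropLowerLeftY) {
        return new float[] { translateX + cropLowerLeftX, translateY + cropLowerLeftY };
    }

    public static void main(String[] args) {
        // Hypothetical values: a crop box whose lower left corner is at (36, 50)
        // and a glyph whose text matrix translation is (100, 700).
        float[] xy = toPageCoordinates(100f, 700f, 36f, 50f);
        System.out.println(xy[0] + ", " + xy[1]); // prints 136.0, 750.0
    }
}
```

For the common case of a crop box that starts at (0, 0), the offset is zero and the text matrix translation can be used directly.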

Other issues

Your code has some other issues, one of them becoming clear with the document you shared: You append to the page content stream without resetting the graphics context:

PDPageContentStream contentStream = new PDPageContentStream(pdocument,
        stripper.getCurrentPage(), true, true);

The constructor with this signature assumes you don't want to reset the context. Use the one with an additional boolean parameter and set that to true to request context resets:

PDPageContentStream contentStream = new PDPageContentStream(pdocument,
        stripper.getCurrentPage(), true, true, true);

Now the context is reset and the position is ok again.

Both these constructors are deprecated, though, and shouldn't be used for that reason. In the development branch they have been removed already. Instead use

PDPageContentStream contentStream = new PDPageContentStream(pdocument,
        stripper.getCurrentPage(), AppendMode.APPEND, true, true);

This introduces another issue, though: You create a new PDPageContentStream for each writeString call. If that is done with context reset each time, the nesting of saveGraphicsState/restoreGraphicsState pairs may become pretty deep. Thus, you should only create one such content stream per page and use it in all writeString calls for that page.
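The growth in nesting can be shown with a simplified model. It assumes that each append with context reset puts a save operator (q) in front of everything already on the page and starts the new stream with the matching restore operator (Q). It only manipulates a list of strings and does not call PDFBox; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of content stream appends, not PDFBox code. */
final class GraphicsStateNesting {

    /** Models one append with context reset. */
    static void appendWithReset(List<String> ops, String newContent) {
        ops.add(0, "q");     // save state before all existing content
        ops.add("Q");        // restore state at the start of the new stream
        ops.add(newContent); // the newly appended drawing instructions
    }

    /** Returns the deepest level of open q operators in the sequence. */
    static int maxDepth(List<String> ops) {
        int depth = 0;
        int max = 0;
        for (String op : ops) {
            if (op.equals("q")) {
                depth++;
                max = Math.max(max, depth);
            } else if (op.equals("Q")) {
                depth--;
            }
        }
        return max;
    }

    /** Nesting depth after the given number of appends to one page. */
    static int depthAfter(int appends) {
        List<String> ops = new ArrayList<>();
        ops.add("original page content");
        for (int i = 0; i < appends; i++) {
            appendWithReset(ops, "drawImage #" + i);
        }
        return maxDepth(ops);
    }

    public static void main(String[] args) {
        System.out.println(depthAfter(1));  // prints 1
        System.out.println(depthAfter(50)); // prints 50
    }
}
```

Under this model, one stream per page keeps the depth at 1 however many images are drawn, while one stream per writeString call makes the depth grow with the number of calls.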

Thus, your text stripper sub-class might look like this:

class CoverCharByImage extends PDFTextStripper {
    public CoverCharByImage(PDImageXObject pdImage) throws IOException {
        super();
        this.pdImage = pdImage;
    }

    final PDImageXObject pdImage;
    PDPageContentStream contentStream = null;

    @Override
    public void processPage(PDPage page) throws IOException {
        super.processPage(page);
        if (contentStream != null) {
            contentStream.close();
            contentStream = null;
        }
    }

    @Override
    protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
        if (contentStream == null)
            contentStream = new PDPageContentStream(document, getCurrentPage(), AppendMode.APPEND, true, true);

        PDRectangle cropBox = getCurrentPage().getCropBox();

        for (TextPosition text : textPositions) {
            if (text.getUnicode().equals("a")) {
                contentStream.drawImage(pdImage, text.getTextMatrix().getTranslateX() + cropBox.getLowerLeftX(),
                        text.getTextMatrix().getTranslateY() + cropBox.getLowerLeftY(),
                        text.getWidthDirAdj(), text.getHeightDir());
            }
        }
    }
}

(CoverCharacterByImage inner class)

and it may be used like this:

PDDocument pdocument = PDDocument.load(...);

String imagePath = ...;
PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, pdocument);

CoverCharByImage stripper = new CoverCharByImage(pdImage);
stripper.setSortByPosition(true);
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
stripper.writeText(pdocument, dummy);
pdocument.save(...);

(CoverCharacterByImage test testCoverLikeLez)

resulting in the image being drawn over each letter "a" in the document.
