How to get font color using pdfbox

Problem description

I am trying to extract text, together with all of its information, from a PDF using pdfbox. I can get everything I want except the color. I have tried several ways of getting the font color (including the approach from "Getting Text Colour with PDFBox"), but none of them worked. I have now copied the code from pdfbox's PageDrawer class, but the RGB value it gives is still not correct.

protected void processTextPosition(TextPosition text) {

    Composite com;
    Color col;
    switch (this.getGraphicsState().getTextState().getRenderingMode()) {
        case PDTextState.RENDERING_MODE_FILL_TEXT:
            // filled text: the non-stroking color is the one that matters
            com = this.getGraphicsState().getNonStrokeJavaComposite();
            int r = this.getGraphicsState().getNonStrokingColor().getJavaColor().getRed();
            int g = this.getGraphicsState().getNonStrokingColor().getJavaColor().getGreen();
            int b = this.getGraphicsState().getNonStrokingColor().getJavaColor().getBlue();
            int rgb = this.getGraphicsState().getNonStrokingColor().getJavaColor().getRGB();
            float[] cosp = this.getGraphicsState().getNonStrokingColor().getColorSpaceValue();
            PDColorSpace pd = this.getGraphicsState().getNonStrokingColor().getColorSpace();
            break;
        case PDTextState.RENDERING_MODE_STROKE_TEXT:
            // stroked text: the stroking color is the one that matters
            System.out.println(this.getGraphicsState().getStrokeJavaComposite().toString());
            System.out.println(this.getGraphicsState().getStrokingColor().getJavaColor().getRGB());
            break;
        case PDTextState.RENDERING_MODE_NEITHER_FILL_NOR_STROKE_TEXT:
            // basic support for text rendering mode "invisible"
            Color nsc = this.getGraphicsState().getStrokingColor().getJavaColor();
            float[] components = {Color.black.getRed(), Color.black.getGreen(), Color.black.getBlue()};
            Color c1 = new Color(nsc.getColorSpace(), components, 0f);
            System.out.println(this.getGraphicsState().getStrokeJavaComposite().toString());
            break;
        default:
            System.out.println(this.getGraphicsState().getNonStrokeJavaComposite().toString());
            System.out.println(this.getGraphicsState().getNonStrokingColor().getJavaColor().getRGB());
    }
}

I am using the code above. The values I get are r = 0, g = 0, b = 0; cosp contains [0.0]; inside the pd object the array is null and colorSpace is null; and the RGB value is always -16777216. Please help. Thanks in advance.

Recommended answer

I tried the code in the link you posted and it worked for me. The colors I get back are 148.92, 179.01001 and 214.965. I wish I could give you my PDF to work with; maybe I could store it somewhere external to SO? My PDF used a palish blue color, and that seems to match. It was just one page of text created in Word 2010 and exported, nothing too intense.

A couple of suggestions...

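  1. Recall that each value returned is a float between 0 and 1. If a value is accidentally cast to an int, the results will of course end up being almost all 0. The linked code multiplies by 255 to get a range of 0 to 255.
  2. As the commenter said, the most common color in a PDF file is black, which is 0 0 0.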
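
To make both points concrete, here is a minimal sketch in plain Java, independent of pdfbox. The class and method names are hypothetical, and the sample components are assumed values chosen so that scaling them lands near the numbers quoted above. It shows how an early int cast collapses a component to 0, and that the packed value -16777216 from the question is simply opaque black.

```java
// Hypothetical helper for illustration only; not part of pdfbox.
final class ColorComponentDemo {

    /** Scales a color component in [0, 1] to the 0-255 range, clamping out-of-range input. */
    static int toByte(float component) {
        float clamped = Math.max(0f, Math.min(1f, component));
        return Math.round(clamped * 255f);
    }

    /** Unpacks a packed ARGB int (as returned by java.awt.Color#getRGB()) into {alpha, red, green, blue}. */
    static int[] unpackArgb(int argb) {
        return new int[] {
            (argb >>> 24) & 0xFF,
            (argb >>> 16) & 0xFF,
            (argb >>> 8) & 0xFF,
            argb & 0xFF
        };
    }

    public static void main(String[] args) {
        float[] components = {0.584f, 0.702f, 0.843f}; // assumed sample values

        // Pitfall: casting before scaling truncates anything below 1.0 to 0.
        System.out.println((int) components[0]);   // prints 0
        // Scale first, then convert.
        System.out.println(toByte(components[0])); // prints 149

        // -16777216 is 0xFF000000: alpha 255, red 0, green 0, blue 0.
        int[] argb = unpackArgb(-16777216);
        System.out.println(argb[0] + " " + argb[1] + " " + argb[2] + " " + argb[3]); // prints 255 0 0 0
    }
}
```

So an RGB value of -16777216 together with r = g = b = 0 is consistent: it may just mean the text really is black, or that the color was set in a way the code does not pick up.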

That is all I can think of for now. Otherwise, I have version 1.7.1 of pdfbox and fontbox and, like I said, I pretty much followed the link you gave.

EDIT

Based upon my comments, here is perhaps a minimally invasive way of doing it for PDF files like color.pdf.

In PDFStreamEngine.java, inside the try block of the processOperator method, one can add:

if (operation.equals("RG")) {
   // stroking color space
   System.out.println(operation);
   System.out.println(arguments);
} else if (operation.equals("rg")) {
   // non-stroking color space
   System.out.println(operation);
   System.out.println(arguments);
} else if (operation.equals("BT")) {
   System.out.println(operation);    
} else if (operation.equals("ET")) {
   System.out.println(operation);           
}

This will show you the information; it is then up to you to process the color information for each section according to your needs. Here is a snippet from the beginning of the output of the above code when run on color.pdf ...

BT
rg
[COSInt(1),COSInt(0),CosInt(0)]
RG
[COSInt(1),COSInt(0),CosInt(0)]
ET
BT
ET
BT
rg
[COSFloat {0.573},COSFloat {0.816},COSFloat {0.314}]
RG
[COSFloat {0.573},COSFloat {0.816},COSFloat {0.314}]
ET
......

In the above output you can see an empty BT ET section; this is a section which is marked DEVICEGRAY. All the others give you [0,1] values for the R, G and B components.
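
One way to process that information per section is sketched below in plain Java, with no pdfbox dependency. Everything here is hypothetical: the TextColorTracker class, its onOperator method, and the operator sequence fed to it in main are assumptions made for illustration. The idea is to remember the most recent non-stroking color (rg for RGB, g for gray) and record it each time an ET closes a text section. It deliberately ignores q/Q graphics-state save and restore, other color spaces, and color changes in the middle of a section.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names are illustrative and not part of the pdfbox API.
final class TextColorTracker {

    // Current non-stroking (fill) color as 0-255 RGB; black until an operator changes it.
    private int[] fillRgb = {0, 0, 0};
    // Fill color in effect at the end of each BT ... ET section, in order.
    private final List<int[]> sectionColors = new ArrayList<>();

    private static int scale(float component) {
        return Math.round(Math.max(0f, Math.min(1f, component)) * 255f);
    }

    /** Feed one operator and its numeric operands, in content-stream order. */
    void onOperator(String op, float... operands) {
        switch (op) {
            case "rg": // non-stroking color, RGB operands in [0, 1]
                fillRgb = new int[] {scale(operands[0]), scale(operands[1]), scale(operands[2])};
                break;
            case "g": { // non-stroking color, single gray operand in [0, 1]
                int gray = scale(operands[0]);
                fillRgb = new int[] {gray, gray, gray};
                break;
            }
            case "ET": // end of a text section: record the fill color in effect
                sectionColors.add(fillRgb.clone());
                break;
            default:   // BT, RG and everything else are ignored in this sketch
                break;
        }
    }

    List<int[]> colors() {
        return sectionColors;
    }

    public static void main(String[] args) {
        TextColorTracker tracker = new TextColorTracker();
        // Assumed operator sequence, loosely modeled on the output shown above.
        tracker.onOperator("BT");
        tracker.onOperator("rg", 1f, 0f, 0f);
        tracker.onOperator("RG", 1f, 0f, 0f);
        tracker.onOperator("ET");
        tracker.onOperator("g", 0f); // assumption: gray is set before the empty section
        tracker.onOperator("BT");
        tracker.onOperator("ET");
        tracker.onOperator("BT");
        tracker.onOperator("rg", 0.573f, 0.816f, 0.314f);
        tracker.onOperator("RG", 0.573f, 0.816f, 0.314f);
        tracker.onOperator("ET");

        for (int[] rgb : tracker.colors()) {
            System.out.println(Arrays.toString(rgb));
        }
    }
}
```

In a real stream engine the operands would arrive as COS number objects rather than floats, so a conversion step would be needed before calling something like onOperator.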
