How to extract the color of a rectangle in a PDF with iText


Question

I'm trying to extract the color of a rectangle in a PDF with iText. The following is everything the PDF page contains:

And this is the page content extracted with iText:

q
BT
36 806 Td
0 -18 Td
/F1 12 Tf
(Option 1:)Tj
0 0 Td
0 -94.31 Td
ET
Q
q
Q
q
2 J
0 G
0.5 w
88.3 693.69 139.47 94.31 re
S
0.5 w
227.77 693.69 139.47 94.31 re
S
0.5 w
367.23 693.69 139.47 94.31 re
S
Q
BT
1 0 0 1 90.3 774 Tm
/F1 12 Tf
(A rectangle:)Tj
ET
q 1.13 0 0 1.13 229.77 695.69 cm /Xf1 Do Q
BT
1 0 0 1 369.23 774 Tm
/F1 12 Tf
(The rectangle is scaled)Tj
1 0 0 1 369.23 762 Tm
(to fit inside the cell, you)Tj
1 0 0 1 369.23 750 Tm
(see a padding.)Tj
ET
228 810 m
338 810 l
S

However, there is one thing I cannot extract from that content: the red color. If I generate the same PDF with another color instead of red, nothing changes in the page content shown above.

So my question is: how can I extract that color using methods or properties of the iText library for Java?

I'm using iText 5.5.9.

Thanks for any help you can provide!


This is the code I'm using to generate the PDF:

String dest = "C:\\TestCreation.pdf";
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
document.open();

document.add(new Paragraph("Option 1:"));
PdfPTable table = new PdfPTable(3);
table.addCell("A rectangle:");
PdfTemplate template = writer.getDirectContent().createTemplate(120, 80);
template.setColorFill(BaseColor.RED);
template.rectangle(0, 0, 120, 80);
template.fill();
writer.releaseTemplate(template);
table.addCell(Image.getInstance(template));
table.addCell("The rectangle is scaled to fit inside the cell, you see a padding.");
document.add(table);

PdfContentByte cb = writer.getDirectContent();
cb.moveTo(228, 810);
cb.lineTo(338, 810);
cb.stroke();
document.close();

You can see the PDF file here: PDF example

This is the line of code I'm using to get the page content: String pageContent = new String(reader.getPageContent(1));

I've inspected the whole reader object and was able to locate the rectangle, but not its color:

Solution

Your code shows it; this is how you create the rectangle and add it:

PdfTemplate template = writer.getDirectContent().createTemplate(120, 80);
template.setColorFill(BaseColor.RED);
template.rectangle(0, 0, 120, 80);
template.fill();
writer.releaseTemplate(template);
table.addCell(Image.getInstance(template));

An iText PdfTemplate generates a PDF form XObject. A form XObject in turn is a PDF content stream that is a self-contained description of any sequence of graphics objects (including path objects, text objects, and sampled images) (section 8.10.1 of ISO 32000-1), i.e. a separate stream of drawing instructions the content of which can be referenced from any other content stream.

In the case of your page content stream, this is the line where the form XObject is included:

q 1.13 0 0 1.13 229.77 695.69 cm /Xf1 Do Q

(The graphics state is saved, the transformation matrix is changed to scale by 1.13 and translate to (229.77, 695.69), the XObject Xf1 is drawn, and the graphics state, including the transformation matrix, is restored.)

The content stream of that XObject Xf1 is this:

1 0 0 rg
0 0 120 80 re
f

I.e. it sets the non-stroking color to RGB red, defines a 120x80 rectangle at the origin, and fills it.
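
To make the effect of that cm operator concrete, here is a minimal sketch in plain Java (no iText involved) that maps the XObject's rectangle 0 0 120 80 through the matrix 1.13 0 0 1.13 229.77 695.69. The class and method names are hypothetical, and the sketch assumes the current transformation matrix is the identity before the cm operator, which appears to be the case in this page content.

```java
import java.util.Locale;

public class CtmExample {
    /** Applies the PDF matrix [a b c d e f] to the point (x, y). */
    static double[] apply(double[] m, double x, double y) {
        return new double[] {
            m[0] * x + m[2] * y + m[4],
            m[1] * x + m[3] * y + m[5]
        };
    }

    /** Expands an "x y w h re" rectangle into four corners and maps each through m. */
    static double[] transformRectangle(double[] m, double x, double y, double w, double h) {
        double[][] corners = { {x, y}, {x + w, y}, {x + w, y + h}, {x, y + h} };
        double[] result = new double[8];
        for (int i = 0; i < corners.length; i++) {
            double[] p = apply(m, corners[i][0], corners[i][1]);
            result[2 * i] = p[0];
            result[2 * i + 1] = p[1];
        }
        return result;
    }

    public static void main(String[] args) {
        // Operands of "1.13 0 0 1.13 229.77 695.69 cm" from the page content stream.
        double[] cm = {1.13, 0, 0, 1.13, 229.77, 695.69};
        // Operands of "0 0 120 80 re" from the XObject content stream.
        double[] r = transformRectangle(cm, 0, 0, 120, 80);
        for (int i = 0; i < r.length; i += 2) {
            System.out.printf(Locale.ROOT, "(%.2f, %.2f)%n", r[i], r[i + 1]);
        }
    }
}
```

The 120x80 rectangle therefore ends up with corners at roughly (229.77, 695.69) and (365.37, 786.09) in page coordinates, i.e. inside the middle table cell with some padding.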


You mention this line of code for getting the page content:

String pageContent = new String(reader.getPageContent(1));

That line is not adequate for getting all the content details:

  1. It only returns the immediate page content but not the detailed instructions from the form XObjects and patterns used in the immediate content. Quite often one finds PDFs whose immediate page contents only reference one or more form XObjects.

  2. In spite of appearances, the page content is binary in nature, not textual. As soon as fonts with non-standard encodings are used, PDF string contents are meaningless in your Java String or (depending on your platform's default encoding) even broken.
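
The second point can be illustrated without iText. The following is a small sketch in plain Java (hypothetical class and method names) showing that decoding arbitrary bytes into a String and encoding them again is not guaranteed to give the same bytes back. new String(byte[]) uses the platform default charset; the sketch passes the charset explicitly so the behavior does not depend on the machine it runs on.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryContentExample {
    /** Returns true if decoding and re-encoding with the charset preserves every byte. */
    static boolean survivesRoundTrip(byte[] data, Charset charset) {
        byte[] back = new String(data, charset).getBytes(charset);
        return Arrays.equals(data, back);
    }

    /** A made-up content stream fragment "(..) Tj" whose string holds two non-ASCII bytes. */
    static byte[] sampleContent() {
        return new byte[] { '(', (byte) 0xE9, (byte) 0x80, ')', ' ', 'T', 'j' };
    }

    public static void main(String[] args) {
        byte[] content = sampleContent();
        // 0xE9 0x80 is an incomplete UTF-8 sequence, so decoding replaces it and the bytes change.
        System.out.println("UTF-8 round trip intact:      " + survivesRoundTrip(content, StandardCharsets.UTF_8));
        // ISO-8859-1 maps every byte to exactly one char, so the bytes survive.
        System.out.println("ISO-8859-1 round trip intact: " + survivesRoundTrip(content, StandardCharsets.ISO_8859_1));
    }
}
```

Even when the bytes survive, the resulting characters only mean something if the font's encoding happens to match the charset, which is why treating the stream as text is fragile.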

Instead one should use the iText parser framework, e.g. like this:

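// Imports needed by this snippet (iText 5.5.x package names)
import java.io.InputStream;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import com.itextpdf.text.BaseColor;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.ExtRenderListener;
import com.itextpdf.text.pdf.parser.GraphicsState;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.Matrix;
import com.itextpdf.text.pdf.parser.Path;
import com.itextpdf.text.pdf.parser.PathConstructionRenderInfo;
import com.itextpdf.text.pdf.parser.PathPaintingRenderInfo;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.TextRenderInfo;
import com.itextpdf.text.pdf.parser.Vector;

// Listener that collects path segments and prints them, with their colors, when the path is painted
ExtRenderListener extRenderListener = new ExtRenderListener()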
{
    @Override
    public void beginTextBlock()                        {   }
    @Override
    public void renderText(TextRenderInfo renderInfo)   {   }
    @Override
    public void endTextBlock()                          {   }
    @Override
    public void renderImage(ImageRenderInfo renderInfo) {   }

    @Override
    public void modifyPath(PathConstructionRenderInfo renderInfo)
    {
        pathInfos.add(renderInfo);
    }

    @Override
    public Path renderPath(PathPaintingRenderInfo renderInfo)
    {
        GraphicsState graphicsState;
        try
        {
            graphicsState = getGraphicsState(renderInfo);
        }
        catch (NoSuchFieldException | SecurityException | IllegalArgumentException | IllegalAccessException e)
        {
            e.printStackTrace();
            return null;
        }

        Matrix ctm = graphicsState.getCtm();

        if ((renderInfo.getOperation() & PathPaintingRenderInfo.FILL) != 0)
        {
            System.out.printf("FILL (%s) ", toString(graphicsState.getFillColor()));
            if ((renderInfo.getOperation() & PathPaintingRenderInfo.STROKE) != 0)
                System.out.print("and ");
        }
        if ((renderInfo.getOperation() & PathPaintingRenderInfo.STROKE) != 0)
        {
            System.out.printf("STROKE (%s) ", toString(graphicsState.getStrokeColor()));
        }

        System.out.print("the path ");

        for (PathConstructionRenderInfo pathConstructionRenderInfo : pathInfos)
        {
            switch (pathConstructionRenderInfo.getOperation())
            {
            case PathConstructionRenderInfo.MOVETO:
                System.out.printf("move to %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.CLOSE:
                System.out.printf("close %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.CURVE_123:
                System.out.printf("curve123 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.CURVE_13:
                System.out.printf("curve13 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.CURVE_23:
                System.out.printf("curve23 %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.LINETO:
                System.out.printf("line to %s ", transform(ctm, pathConstructionRenderInfo.getSegmentData()));
                break;
            case PathConstructionRenderInfo.RECT:
                System.out.printf("rectangle %s ", transform(ctm, expandRectangleCoordinates(pathConstructionRenderInfo.getSegmentData())));
                break;
            }
        }
        System.out.println();

        pathInfos.clear();
        return null;
    }

    @Override
    public void clipPath(int rule)
    {
    }

    List<Float> transform(Matrix ctm, List<Float> coordinates)
    {
        List<Float> result = new ArrayList<>();
        for (int i = 0; i + 1 < coordinates.size(); i += 2)
        {
            Vector vector = new Vector(coordinates.get(i), coordinates.get(i + 1), 1);
            vector = vector.cross(ctm);
            result.add(vector.get(Vector.I1));
            result.add(vector.get(Vector.I2));
        }
        return result;
    }

    List<Float> expandRectangleCoordinates(List<Float> rectangle)
    {
        if (rectangle.size() < 4)
            return Collections.emptyList();
        return Arrays.asList(
                rectangle.get(0), rectangle.get(1),
                rectangle.get(0) + rectangle.get(2), rectangle.get(1),
                rectangle.get(0) + rectangle.get(2), rectangle.get(1) + rectangle.get(3),
                rectangle.get(0), rectangle.get(1) + rectangle.get(3)
                );
    }

    String toString(BaseColor baseColor)
    {
        if (baseColor == null)
            return "DEFAULT";
        return String.format("%s,%s,%s", baseColor.getRed(), baseColor.getGreen(), baseColor.getBlue());
    }

    GraphicsState getGraphicsState(PathPaintingRenderInfo renderInfo) throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException
    {
        Field gsField = PathPaintingRenderInfo.class.getDeclaredField("gs");
        gsField.setAccessible(true);
        return (GraphicsState) gsField.get(renderInfo);
    }

    final List<PathConstructionRenderInfo> pathInfos = new ArrayList<>();
};

try (   InputStream resource = [RETRIEVE FILE TO PARSE AS INPUT STREAM])
{
    PdfReader pdfReader = new PdfReader(resource);

    for (int page = 1; page <= pdfReader.getNumberOfPages(); page++)
    {
        System.out.printf("\nPage %s\n====\n", page);

        PdfReaderContentParser parser = new PdfReaderContentParser(pdfReader);
        parser.processContent(page, extRenderListener);

    }
}

(ExtractPaths test method testExtractFromTestCreation)

For your sample file this results in the output

Page 1
====
STROKE (0,0,0) the path rectangle [88.3, 693.69, 227.77, 693.69, 227.77, 788.0, 88.3, 788.0] 
STROKE (0,0,0) the path rectangle [227.77, 693.69, 367.24, 693.69, 367.24, 788.0, 227.77, 788.0] 
STROKE (0,0,0) the path rectangle [367.23, 693.69, 506.7, 693.69, 506.7, 788.0, 367.23, 788.0] 
FILL (255,0,0) the path rectangle [229.77, 695.69, 365.37, 695.69, 365.37, 786.09, 229.77, 786.09] 
STROKE (DEFAULT) the path move to [228.0, 810.0] line to [338.0, 810.0] 

iText represents color values as bytes (0-255) instead of as the unit range (0.0 - 1.0) the PDF uses. Thus, you see '(255,0,0)' where the PDF selected '1 0 0 rg'.
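
As a small illustration of that mapping, here is a sketch in plain Java (hypothetical names) that converts unit-range components such as the operands of '1 0 0 rg' into 0-255 values. The clamping and round-to-nearest rule used here is an assumption made for the example, not a statement about iText's exact conversion.

```java
public class ColorScaleExample {
    /** Maps a unit-range component (0.0 - 1.0) to 0-255; rounding rule is an assumption. */
    static int toByte(double component) {
        double clamped = Math.max(0.0, Math.min(1.0, component));
        return (int) Math.round(clamped * 255);
    }

    /** Formats three unit-range components the way the listener above prints colors. */
    static String describe(double r, double g, double b) {
        return toByte(r) + "," + toByte(g) + "," + toByte(b);
    }

    public static void main(String[] args) {
        // Operands of "1 0 0 rg" from the XObject content stream.
        System.out.println(describe(1, 0, 0));
    }
}
```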
