How to get page content height using PDFBox

Question

Is it possible to get the height of the page content using PDFBox? I think I have tried everything, but each PDRectangle returns the full height of the page: 842. At first I thought this was because the page number is placed at the bottom of the page, but when I opened the PDF in Illustrator, the whole content was inside a compound element that does not extend over the whole page height. So if Illustrator can see it as a separate element and calculate its height, I guess this should also be possible in PDFBox.

Sample page:

Answer

In general

The PDF specification allows a PDF to provide a number of page boundaries, cf. this answer. Aside from those, content boundaries can only be derived from the page contents, e.g. from

  • Form XObjects:

A form XObject is a PDF content stream that is a self-contained description of any sequence of graphics objects (including path objects, text objects, and sampled images). A form XObject may be painted multiple times—either on several pages or at several locations on the same page—and produces the same results each time, subject only to the graphics state at the time it is invoked.

  • Clipping Paths:

    The graphics state shall contain a current clipping path that limits the regions of the page affected by painting operators. The closed subpaths of this path shall define the area that can be painted. Marks falling inside this area shall be applied to the page; those falling outside it shall not be.

  • ...
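As an illustration of the first source: a form XObject carries a /BBox in form space, and its extent on the page follows from mapping that box through the form's /Matrix and then through the current transformation matrix in effect when the form is painted. A minimal JDK-only sketch, assuming those three values have already been extracted (for instance with PDFBox); FormBounds and pageBounds are hypothetical names, not a PDFBox API:

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

// Hypothetical helper (not PDFBox): page-space bounds of a form XObject's /BBox,
// assuming the caller already knows the form's /Matrix and the CTM at the "Do" call.
final class FormBounds
{
    private FormBounds() { }

    static Rectangle2D pageBounds(Rectangle2D bbox, AffineTransform formMatrix, AffineTransform ctm)
    {
        // concatenate() makes points go through formMatrix first, then through ctm
        AffineTransform total = new AffineTransform(ctm);
        total.concatenate(formMatrix);
        // axis-aligned box around the transformed BBox, in default user space
        return total.createTransformedShape(bbox).getBounds2D();
    }
}
```

For a 100 x 50 BBox, a form matrix scaling by 2 and a CTM translating by (20, 700), this yields a box starting at (20, 700) with a height of 100.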

    To find either of them, one has to parse the page content, look for the appropriate operations, and calculate the resulting boundaries.
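A toy version of that parsing step, limited to the most common clipping idiom "x y w h re W n", might look like this. It is a hypothetical helper working on an already decompressed content stream string; it ignores cm and q/Q (so coordinates stay in the stream's current user space, not default user space), clip paths built with m/l/c, and string, array or inline-image operands. A real solution should use a proper content stream parser such as PDFBox's.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Pattern;

// Toy sketch, not PDFBox code: finds rectangular clips written as "x y w h re W n"
// (or "re W* n") in a decoded content stream. Transformations are NOT applied.
final class ReClipScanner
{
    private static final Pattern NUMBER = Pattern.compile("[+-]?(?:\\d+(?:\\.\\d*)?|\\.\\d+)");

    private ReClipScanner() { }

    static List<Rectangle2D> findRectangleClips(String content)
    {
        List<Rectangle2D> clips = new ArrayList<>();
        Deque<Double> operands = new ArrayDeque<>();
        Rectangle2D pendingRect = null; // rectangle from the last "re" of the current path
        for (String token : content.trim().split("\\s+"))
        {
            if (NUMBER.matcher(token).matches())
            {
                operands.push(Double.parseDouble(token));
                continue;
            }
            switch (token)
            {
                case "re":
                    pendingRect = null;
                    if (operands.size() >= 4)
                    {
                        // operands come off the stack in reverse order: h, w, y, x
                        double h = operands.pop(), w = operands.pop();
                        double y = operands.pop(), x = operands.pop();
                        // normalize negative widths/heights
                        pendingRect = new Rectangle2D.Double(Math.min(x, x + w), Math.min(y, y + h),
                                                             Math.abs(w), Math.abs(h));
                    }
                    break;
                case "W":
                case "W*":
                    if (pendingRect != null)
                        clips.add(pendingRect);
                    break;
                default:
                    pendingRect = null; // n, f, S, q, m, ... end the path we were tracking
            }
            operands.clear();
        }
        return clips;
    }
}
```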

    Each of your sample PDFs defines explicitly only one page boundary, the MediaBox. Thus, all of the other PDF page boundaries (CropBox, BleedBox, TrimBox, ArtBox) default to it. So it is no wonder that in your attempts

    each (PDRectangle) returns full height of the page: 842
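The fallback rules behind this are simple: per the PDF specification, a missing CropBox defaults to the MediaBox, and a missing BleedBox, TrimBox or ArtBox defaults to the CropBox. A minimal sketch with a hypothetical PageBoxes class (not the PDFBox API) that ignores MediaBox inheritance from the page tree and clipping against the MediaBox:

```java
import java.awt.geom.Rectangle2D;

// Hypothetical model of one page dictionary's boundary entries (null = entry absent),
// illustrating only the default rules of the PDF specification.
final class PageBoxes
{
    final Rectangle2D mediaBox;                           // required
    final Rectangle2D cropBox, bleedBox, trimBox, artBox; // optional

    PageBoxes(Rectangle2D mediaBox, Rectangle2D cropBox, Rectangle2D bleedBox,
              Rectangle2D trimBox, Rectangle2D artBox)
    {
        this.mediaBox = mediaBox;
        this.cropBox = cropBox;
        this.bleedBox = bleedBox;
        this.trimBox = trimBox;
        this.artBox = artBox;
    }

    // CropBox defaults to the MediaBox
    Rectangle2D effectiveCropBox() { return cropBox != null ? cropBox : mediaBox; }

    // BleedBox, TrimBox and ArtBox each default to the (effective) CropBox
    Rectangle2D effectiveBleedBox() { return bleedBox != null ? bleedBox : effectiveCropBox(); }
    Rectangle2D effectiveTrimBox() { return trimBox != null ? trimBox : effectiveCropBox(); }
    Rectangle2D effectiveArtBox() { return artBox != null ? artBox : effectiveCropBox(); }
}
```

With only a MediaBox of 595 x 842 (an assumed A4-like width), every effective box therefore has a height of 842.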

    Neither of them contains form XObjects, but both make use of clipping paths.

    • In case of test-pdf4.pdf:

    Start at: 28.31999969482422, 813.6799926757812
    Line to: 565.9199829101562, 813.6799926757812
    Line to: 565.9199829101562, 660.2196655273438
    Line to: 28.31999969482422, 660.2196655273438
    Line to: 28.31999969482422, 813.6799926757812
    

    (This might match the sketch in your question.)

    In case of test-pdf5.pdf:

    Start at: 23.0, 34.0
    Line to: 572.0, 34.0
    Line to: 572.0, -751.0
    Line to: 23.0, -751.0
    Line to: 23.0, 34.0
    

    Start at: 23.0, 819.0
    Line to: 572.0, 819.0
    Line to: 572.0, 34.0
    Line to: 23.0, 34.0
    Line to: 23.0, 819.0
    

    Due to the match with the sketch, I would assume that Illustrator treats everything drawn while a non-trivial clipping path is in effect as a compound element bordered by that clipping path.
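If one accepts that interpretation, the content height is simply the height of the clipping path's bounding box. A minimal sketch using only java.awt.geom; ClipBounds.boundsOf is a hypothetical helper that takes the vertices as printed in the path dumps above:

```java
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;

// Hypothetical helper (not PDFBox): bounding box of a polygonal clip path whose
// vertices are given in default user space as x0, y0, x1, y1, ...
final class ClipBounds
{
    private ClipBounds() { }

    static Rectangle2D boundsOf(double... xy)
    {
        if (xy.length < 2 || xy.length % 2 != 0)
            throw new IllegalArgumentException("need an even number of coordinates");
        Path2D.Double path = new Path2D.Double();
        path.moveTo(xy[0], xy[1]);
        for (int i = 2; i < xy.length; i += 2)
            path.lineTo(xy[i], xy[i + 1]);
        path.closePath();
        return path.getBounds2D();
    }
}
```

For the test-pdf4.pdf rectangle this gives y from about 660.22 to 813.68, i.e. a content height of about 153.46 points.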

    I used PDFBox to find the clipping paths mentioned above. I used a SNAPSHOT of version 2.0.0, which was under development at the time, because the required APIs have been much improved compared to the then-current release 1.8.8.

    I extended PDFGraphicsStreamEngine in a class ClipPathFinder:

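    import java.awt.geom.Point2D;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    
    import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.graphics.image.PDImage;
    
    /** Collects every path that is used as a clipping path (W / W* operators) on a page. */
    public class ClipPathFinder extends PDFGraphicsStreamEngine implements Iterable<Path>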
    {
        public ClipPathFinder(PDPage page)
        {
            super(page);
        }
    
        /** Parses the page content; afterwards iterate over this object to get the clip paths. */
        public void findClipPaths() throws IOException
        {
            processPage(getPage());
        }
    
        //
        // PDFGraphicsStreamEngine overrides
        //
    
        @Override
        public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException
        {
            startPathIfNecessary();
            currentPath.appendRectangle(toFloat(p0), toFloat(p1), toFloat(p2), toFloat(p3));
        }
    
        @Override
        public void drawImage(PDImage pdImage) throws IOException { }
    
        @Override
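        public void clip(int windingRule) throws IOException
        {
            // W / W* make the path built so far the (new) clip; ignore a clip without a path
            if (currentPath == null)
                return;
            currentPath.complete(windingRule);
            paths.add(currentPath);
            currentPath = null;
        }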
    
        @Override
        public void moveTo(float x, float y) throws IOException
        {
            startPathIfNecessary();
            currentPath.moveTo(x, y);
        }
    
        @Override
        public void lineTo(float x, float y) throws IOException
        {
            currentPath.lineTo(x, y);
        }
    
        @Override
        public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException
        {
            currentPath.curveTo(x1, y1, x2, y2, x3, y3);
        }
    
        @Override
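        public Point2D.Float getCurrentPoint() throws IOException
        {
            // PDFBox may ask for the current point before any path has been started
            return currentPath == null ? null : currentPath.getCurrentPoint();
        }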
    
        @Override
        public void closePath() throws IOException
        {
            currentPath.closePath();
        }
    
        @Override
        public void endPath() throws IOException
        {
            currentPath = null;
        }
    
        @Override
        public void strokePath() throws IOException
        {
            currentPath = null;
        }
    
        @Override
        public void fillPath(int windingRule) throws IOException
        {
            currentPath = null;
        }
    
        @Override
        public void fillAndStrokePath(int windingRule) throws IOException
        {
            currentPath = null;
        }
    
        @Override
        public void shadingFill(COSName shadingName) throws IOException
        {
            currentPath = null;
        }
    
        void startPathIfNecessary()
        {
            if (currentPath == null)
                currentPath = new Path();
        }
    
        Point2D.Float toFloat(Point2D p)
        {
            if (p == null || (p instanceof Point2D.Float))
            {
                return (Point2D.Float)p;
            }
            return new Point2D.Float((float)p.getX(), (float)p.getY());
        }
    
        //
        // Iterable<Path> implementation
        //
        public Iterator<Path> iterator()
        {
            return paths.iterator();
        }
    
        Path currentPath = null;
        final List<Path> paths = new ArrayList<Path>();
    }
    

    It uses this helper class to represent paths:

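    import java.awt.geom.Point2D;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    
    /** A path (as collected by ClipPathFinder) consisting of subpaths of line and curve segments. */
    public class Path implements Iterable<Path.SubPath>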
    {
        public static class Segment
        {
            Segment(Point2D.Float start, Point2D.Float end)
            {
                this.start = start;
                this.end = end;
            }
    
            public Point2D.Float getStart()
            {
                return start;
            }
    
            public Point2D.Float getEnd()
            {
                return end;
            }
    
            final Point2D.Float start, end; 
        }
    
        public class SubPath implements Iterable<Segment>
        {
            public class Line extends Segment
            {
                Line(Point2D.Float start, Point2D.Float end)
                {
                    super(start, end);
                }
    
                //
                // Object override
                //
                @Override
                public String toString()
                {
                    StringBuilder builder = new StringBuilder();
                    builder.append("    Line to: ")
                           .append(end.getX())
                           .append(", ")
                           .append(end.getY())
                           .append('\n');
                    return builder.toString();
                }
            }
    
            public class Curve extends Segment
            {
                Curve(Point2D.Float start, Point2D.Float control1, Point2D.Float control2, Point2D.Float end)
                {
                    super(start, end);
                    this.control1 = control1;
                    this.control2 = control2;
                }
    
                public Point2D getControl1()
                {
                    return control1;
                }
    
                public Point2D getControl2()
                {
                    return control2;
                }
    
                //
                // Object override
                //
                @Override
                public String toString()
                {
                    StringBuilder builder = new StringBuilder();
                    builder.append("    Curve to: ")
                           .append(end.getX())
                           .append(", ")
                           .append(end.getY())
                           .append(" with Control1: ")
                           .append(control1.getX())
                           .append(", ")
                           .append(control1.getY())
                           .append(" and Control2: ")
                           .append(control2.getX())
                           .append(", ")
                           .append(control2.getY())
                           .append('\n');
                    return builder.toString();
                }
    
                final Point2D control1, control2; 
            }
    
            SubPath(Point2D.Float start)
            {
                this.start = start;
                currentPoint = start;
            }
    
            public Point2D getStart()
            {
                return start;
            }
    
            void lineTo(float x, float y)
            {
                Point2D.Float end = new Point2D.Float(x, y);
                segments.add(new Line(currentPoint, end));
                currentPoint = end;
            }
    
            void curveTo(float x1, float y1, float x2, float y2, float x3, float y3)
            {
                Point2D.Float control1 = new Point2D.Float(x1, y1);
                Point2D.Float control2 = new Point2D.Float(x2, y2);
                Point2D.Float end = new Point2D.Float(x3, y3);
                segments.add(new Curve(currentPoint, control1, control2, end));
                currentPoint = end;
            }
    
            void closePath()
            {
                closed = true;
                currentPoint = start;
            }
    
            //
            // Iterable<Segment> implementation
            //
            public Iterator<Segment> iterator()
            {
                return segments.iterator();
            }
    
            //
            // Object override
            //
            @Override
            public String toString()
            {
                StringBuilder builder = new StringBuilder();
                builder.append("  {\n    Start at: ")
                       .append(start.getX())
                       .append(", ")
                       .append(start.getY())
                       .append('\n');
                for (Segment segment : segments)
                    builder.append(segment);
                if (closed)
                    builder.append("    Closed\n");
                builder.append("  }\n");
                return builder.toString();
            }
    
            boolean closed = false;
            final Point2D.Float start;
            final List<Segment> segments = new ArrayList<Path.Segment>();
        }
    
        public class Rectangle extends SubPath
        {
            Rectangle(Point2D.Float p0, Point2D.Float p1, Point2D.Float p2, Point2D.Float p3)
            {
                super(p0);
                lineTo((float)p1.getX(), (float)p1.getY());
                lineTo((float)p2.getX(), (float)p2.getY());
                lineTo((float)p3.getX(), (float)p3.getY());
                closePath();
            }
    
            //
            // Object override
            //
            @Override
            public String toString()
            {
                StringBuilder builder = new StringBuilder();
                builder.append("  {\n    Rectangle\n    Start at: ")
                       .append(start.getX())
                       .append(", ")
                       .append(start.getY())
                       .append('\n');
                for (Segment segment : segments)
                    builder.append(segment);
                if (closed)
                    builder.append("    Closed\n");
                builder.append("  }\n");
                return builder.toString();
            }
        }
    
        public int getWindingRule()
        {
            return windingRule;
        }
    
        void complete(int windingRule)
        {
            finishSubPath();
            this.windingRule = windingRule;
        }
    
        void appendRectangle(Point2D.Float p0, Point2D.Float p1, Point2D.Float p2, Point2D.Float p3) throws IOException
        {
            finishSubPath();
            currentSubPath = new Rectangle(p0, p1, p2, p3);
            finishSubPath();
        }
    
        void moveTo(float x, float y) throws IOException
        {
            finishSubPath();
            currentSubPath = new SubPath(new Point2D.Float(x, y));
        }
    
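        void lineTo(float x, float y) throws IOException
        {
            reopenSubPathIfNecessary();
            currentSubPath.lineTo(x, y);
        }
    
        void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException
        {
            reopenSubPathIfNecessary();
            currentSubPath.curveTo(x1, y1, x2, y2, x3, y3);
        }
    
        // After h or re the subpath is finished but the current point remains at its start;
        // a following l or c then begins a new subpath there instead of dereferencing null.
        void reopenSubPathIfNecessary()
        {
            if (currentSubPath == null && currentPoint != null)
                currentSubPath = new SubPath(currentPoint);
        }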
    
        Point2D.Float getCurrentPoint() throws IOException
        {
            return currentPoint;
        }
    
        void closePath() throws IOException
        {
            currentSubPath.closePath();
            finishSubPath();
        }
    
        void finishSubPath()
        {
            if (currentSubPath != null)
            {
                subPaths.add(currentSubPath);
                currentSubPath = null;
            }
        }
    
        //
        // Iterable<Path.SubPath> implementation
        //
        public Iterator<SubPath> iterator()
        {
            return subPaths.iterator();
        }
    
        //
        // Object override
        //
        @Override
        public String toString()
        {
            StringBuilder builder = new StringBuilder();
            builder.append("{\n  Winding: ")
                   .append(windingRule)
                   .append('\n');
            for (SubPath subPath : subPaths)
                builder.append(subPath);
            builder.append("}\n");
            return builder.toString();
        }
    
        // Also updated by the inner SubPath class, which has no currentPoint field of its own
        Point2D.Float currentPoint = null;
        SubPath currentSubPath = null;
        int windingRule = -1;
        final List<SubPath> subPaths = new ArrayList<Path.SubPath>();
    }
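Clip paths are not always rectangles: the Curve segments collected by Path are cubic Bézier curves, and the box around their control points only bounds them conservatively (by the convex hull property). A sketch of the exact extent of one coordinate of such a segment, found by solving for the zeros of the derivative; CubicExtent is a hypothetical helper, not part of the code above or of PDFBox:

```java
// Hypothetical helper: exact extent of one coordinate (x or y) of a cubic Bezier segment
// with coordinates p0 (start), p1, p2 (control points) and p3 (end).
final class CubicExtent
{
    private CubicExtent() { }

    /** Value of the coordinate at parameter t in [0, 1]. */
    static double at(double p0, double p1, double p2, double p3, double t)
    {
        double u = 1 - t;
        return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3;
    }

    /** Returns {min, max} of the coordinate over t in [0, 1]. */
    static double[] range(double p0, double p1, double p2, double p3)
    {
        double min = Math.min(p0, p3), max = Math.max(p0, p3); // endpoints
        // derivative / 3 = a t^2 + b t + c
        double d0 = p1 - p0, d1 = p2 - p1, d2 = p3 - p2;
        double a = d0 - 2 * d1 + d2, b = 2 * (d1 - d0), c = d0;
        double[] roots;
        if (Math.abs(a) < 1e-12)
            roots = Math.abs(b) < 1e-12 ? new double[0] : new double[] { -c / b };
        else
        {
            double disc = b * b - 4 * a * c;
            if (disc < 0)
                roots = new double[0];
            else
            {
                double s = Math.sqrt(disc);
                roots = new double[] { (-b + s) / (2 * a), (-b - s) / (2 * a) };
            }
        }
        for (double t : roots)
        {
            if (t > 0 && t < 1) // interior extrema only; endpoints are already covered
            {
                double v = at(p0, p1, p2, p3, t);
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        return new double[] { min, max };
    }
}
```

For example, with y coordinates 0, 10, 10, 0 the control points span 0 to 10, but the curve itself only reaches y = 7.5 (at t = 0.5).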
    

    ClipPathFinder is used like this:

    // PDFBox 2.0.x API; PDFRESOURCE (an InputStream or File) and PAGENUMBER (0-based) are placeholders
    try (PDDocument document = PDDocument.load(PDFRESOURCE))
    {
        PDPage page = document.getPage(PAGENUMBER);
        ClipPathFinder finder = new ClipPathFinder(page);
        finder.findClipPaths();
    
        for (Path path : finder)
        {
            System.out.println(path);
        }
    }
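When a page has several clipping paths, one possible heuristic (an assumption, not something this answer or Illustrator is known to do) is to clip each path's bounding box to the MediaBox and take the union of the results. A JDK-only sketch; ContentArea.estimate is hypothetical:

```java
import java.awt.geom.Rectangle2D;
import java.util.List;

// Hypothetical heuristic (not PDFBox): union of the clip-path bounding boxes, each first
// intersected with the MediaBox, as a rough estimate of where content can appear.
final class ContentArea
{
    private ContentArea() { }

    static Rectangle2D estimate(Rectangle2D mediaBox, List<? extends Rectangle2D> clipBounds)
    {
        Rectangle2D result = null;
        for (Rectangle2D clip : clipBounds)
        {
            if (!clip.intersects(mediaBox))
                continue; // nothing inside this clip can be visible on the page
            Rectangle2D visible = clip.createIntersection(mediaBox);
            result = result == null ? visible : result.createUnion(visible);
        }
        return result; // null if no clip overlaps the page
    }
}
```

For the two test-pdf5.pdf rectangles above and an assumed 595 x 842 MediaBox, this gives a box from y = 0 to y = 819; whether that matches Illustrator's compound element for that file is not established here.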
    
