Manipulate paths, color etc. in iText


Question


I need to analyze path data of PDF files and manipulate content with iText 7. Manipulations include deletion/replacement and coloring.

I can analyze the graphics alright with something like the following code:

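// Imports assumed for iText 7.0.x; the two public classes below belong in separate files.
import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import com.itextpdf.kernel.geom.IShape;
import com.itextpdf.kernel.geom.Subpath;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.PdfDocumentContentParser;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.data.PathRenderInfo;
import com.itextpdf.kernel.pdf.canvas.parser.listener.IEventListener;

public class ContentParsing {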
    public static void main(String[] args) throws IOException {
        new ContentParsing().inspectPdf("testdata/test.pdf");
    }

    public void inspectPdf(String path) throws IOException {
        File file = new File(path);
        PdfDocument pdf = new PdfDocument(new PdfReader(file.getAbsolutePath()));
        PdfDocumentContentParser parser = new PdfDocumentContentParser(pdf);
        for (int i=1; i<=pdf.getNumberOfPages(); i++) {
            parser.processContent(i, new PathEventListener());
        }
        pdf.close();
    }
}


public class PathEventListener implements IEventListener {
    public void eventOccurred(IEventData eventData, EventType eventType) {
        PathRenderInfo pathRenderInfo = (PathRenderInfo) eventData;
        for ( Subpath subpath : pathRenderInfo.getPath().getSubpaths() ) {
            for ( IShape segment : subpath.getSegments() ) {
                // Here goes some path analysis code
                System.out.println(segment.getBasePoints());
            }
        }
    }

    public Set<EventType> getSupportedEvents() {
        Set<EventType> supportedEvents = new HashSet<EventType>();
        supportedEvents.add(EventType.RENDER_PATH);
        return supportedEvents;
    }
}

Now, what's the way to go with manipulating things and writing them back to the PDF? Do I have to construct an entirely new PDF document and copy everything over (in manipulated form), or can I somehow manipulate the read PDF data directly?

Answer

> Now, what's the way to go with manipulating things and writing them back to the PDF? Do I have to construct an entirely new PDF document and copy everything over (in manipulated form), or can I somehow manipulate the read PDF data directly?

In essence, you are looking for a class that does not merely parse a PDF content stream and signal the instructions in it, as PdfCanvasProcessor does (the PdfDocumentContentParser you use is only a very thin wrapper around PdfCanvasProcessor), but that also creates the content stream anew from the instructions you forward back to it.
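To make the "parse, then re-emit" idea concrete before the real class, here is a minimal plain-Java sketch. It is only a toy model under stated assumptions: ToyStreamEditor, its write hook, and the list-of-strings instruction format are hypothetical names invented for illustration and are not iText classes. Each instruction is a list of operands whose last element is the operator, and a subclass decides per instruction whether to copy, drop, or replace it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model of a "parse, filter, re-emit" content stream editor.
 * Plain Java only: none of these types are iText classes.
 */
class ToyStreamEditor {
    /** Output buffer that plays the role of the target canvas. */
    private final StringBuilder out = new StringBuilder();

    /**
     * Hook that re-emits one instruction. In this toy model the operand
     * list carries the operator itself as its last element.
     * Subclasses override this to drop, replace, or add instructions.
     */
    protected void write(String operator, List<String> operands) {
        out.append(String.join(" ", operands)).append('\n');
    }

    /** Feeds every instruction through the hook and returns the new stream. */
    public String edit(List<List<String>> instructions) {
        out.setLength(0);
        for (List<String> operands : instructions) {
            String operator = operands.get(operands.size() - 1);
            write(operator, operands);
        }
        return out.toString();
    }

    /** Hypothetical demo: drop every text-showing instruction. */
    static String demo() {
        List<String> textShowing = Arrays.asList("Tj", "'", "\"", "TJ");
        ToyStreamEditor editor = new ToyStreamEditor() {
            @Override
            protected void write(String operator, List<String> operands) {
                if (textShowing.contains(operator)) {
                    return; // swallow the instruction instead of copying it
                }
                super.write(operator, operands);
            }
        };
        List<List<String>> page = new ArrayList<>();
        page.add(Arrays.asList("BT"));
        page.add(Arrays.asList("/F1", "12", "Tf"));
        page.add(Arrays.asList("(Hello)", "Tj"));
        page.add(Arrays.asList("ET"));
        return editor.edit(page);
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

The default hook copies every instruction unchanged, so an editor without overrides reproduces its input. All editing logic lives in the overridden hook.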

A generic content stream editor class

For iText 5.5.x a proof-of-concept for such a content stream editor class can be found in this answer (the Java version is a bit further down in the answer text).

This is a port of that proof-of-concept to iText 7:

public class PdfCanvasEditor extends PdfCanvasProcessor
{
    /**
     * This method edits the immediate contents of a page, i.e. its content stream.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void editPage(PdfDocument pdfDocument, int pageNumber) throws IOException
    {
        if ((pdfDocument.getReader() == null) || (pdfDocument.getWriter() == null))
        {
            throw new PdfException("PdfDocument must be opened in stamping mode.");
        }

        PdfPage page = pdfDocument.getPage(pageNumber);
        PdfResources pdfResources = page.getResources();
        PdfCanvas pdfCanvas = new PdfCanvas(new PdfStream(), pdfResources, pdfDocument);
        editContent(page.getContentBytes(), pdfResources, pdfCanvas);
        page.put(PdfName.Contents, pdfCanvas.getContentStream());
    }

    /**
     * This method processes the content bytes and outputs to the given canvas.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void editContent(byte[] contentBytes, PdfResources resources, PdfCanvas canvas)
    {
        this.canvas = canvas;
        processContent(contentBytes, resources);
        this.canvas = null;
    }

    /**
     * <p>
     * This method writes content stream operations to the target canvas. The default
     * implementation writes them as they come, so it essentially generates identical
     * copies of the original instructions the {@link ContentOperatorWrapper} instances
     * forward to it.
     * </p>
     * <p>
     * Override this method to achieve some fancy editing effect.
     * </p> 
     */
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        PdfOutputStream pdfOutputStream = canvas.getContentStream().getOutputStream();
        int index = 0;

        for (PdfObject object : operands)
        {
            pdfOutputStream.write(object);
            if (operands.size() > ++index)
                pdfOutputStream.writeSpace();
            else
                pdfOutputStream.writeNewLine();
        }
    }

    //
    // constructor giving the parent a dummy listener to talk to 
    //
    public PdfCanvasEditor()
    {
        super(new DummyEventListener());
    }

    //
    // Overrides of PdfCanvasProcessor methods
    //
    @Override
    public IContentOperator registerContentOperator(String operatorString, IContentOperator operator)
    {
        ContentOperatorWrapper wrapper = new ContentOperatorWrapper();
        wrapper.setOriginalOperator(operator);
        IContentOperator formerOperator = super.registerContentOperator(operatorString, wrapper);
        return formerOperator instanceof ContentOperatorWrapper ? ((ContentOperatorWrapper)formerOperator).getOriginalOperator() : formerOperator;
    }

    //
    // member holding the output canvas
    //
    protected PdfCanvas canvas = null;

    //
    // A content operator class to wrap all content operators to forward the invocation to the editor
    //
    class ContentOperatorWrapper implements IContentOperator
    {
        public IContentOperator getOriginalOperator()
        {
            return originalOperator;
        }

        public void setOriginalOperator(IContentOperator originalOperator)
        {
            this.originalOperator = originalOperator;
        }

        @Override
        public void invoke(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            if (originalOperator != null && !"Do".equals(operator.toString()))
            {
                originalOperator.invoke(processor, operator, operands);
            }
            write(processor, operator, operands);
        }

        private IContentOperator originalOperator = null;
    }

    //
    // A dummy event listener to give to the underlying canvas processor to feed events to
    //
    static class DummyEventListener implements IEventListener
    {
        @Override
        public void eventOccurred(IEventData data, EventType type)
        { }

        @Override
        public Set<EventType> getSupportedEvents()
        {
            return null;
        }
    }
}

(PdfCanvasEditor.java)

The explanations from the iText 5 answer still apply; the parsing framework has not changed much from iText 5.5.x to iText 7.0.x.

Usage examples

Unfortunately you wrote in very vague terms about how exactly you want to change the contents. Thus I simply ported some iText 5 samples which made use of the original iText 5 content stream editor class:

Watermark removal

These are ports of the use cases in this answer.

testRemoveBoldMTTextDocument

This example drops all text written in a font the name of which ends with "BoldMT":

try (   InputStream resource = getClass().getResourceAsStream("document.pdf");
        PdfReader pdfReader = new PdfReader(resource);
        OutputStream result = new FileOutputStream(new File(RESULT_FOLDER, "document-noBoldMTText.pdf"));
        PdfWriter pdfWriter = new PdfWriter(result);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {

        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();

            if (TEXT_SHOWING_OPERATORS.contains(operatorString))
            {
                if (getGraphicsState().getFont().getFontProgram().getFontNames().getFontName().endsWith("BoldMT"))
                    return;
            }

            super.write(processor, operator, operands);
        }

        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

(EditPageContent.java test method testRemoveBoldMTTextDocument)

testRemoveBigTextDocument

This example drops all text written with a large font size (here, a font size above 100):

try (   InputStream resource = getClass().getResourceAsStream("document.pdf");
        PdfReader pdfReader = new PdfReader(resource);
        OutputStream result = new FileOutputStream(new File(RESULT_FOLDER, "document-noBigText.pdf"));
        PdfWriter pdfWriter = new PdfWriter(result);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {

        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();

            if (TEXT_SHOWING_OPERATORS.contains(operatorString))
            {
                if (getGraphicsState().getFontSize() > 100)
                    return;
            }

            super.write(processor, operator, operands);
        }

        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

(EditPageContent.java test method testRemoveBigTextDocument)

Text color change

This is a port of the use case in this answer.

testChangeBlackTextToGreenDocument

This example changes the color of black text to green.

try (   InputStream resource = getClass().getResourceAsStream("document.pdf");
        PdfReader pdfReader = new PdfReader(resource);
        OutputStream result = new FileOutputStream(new File(RESULT_FOLDER, "document-blackTextToGreen.pdf"));
        PdfWriter pdfWriter = new PdfWriter(result);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {

        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();

            if (TEXT_SHOWING_OPERATORS.contains(operatorString))
            {
                if (currentlyReplacedBlack == null)
                {
                    Color currentFillColor = getGraphicsState().getFillColor();
                    if (Color.BLACK.equals(currentFillColor))
                    {
                        currentlyReplacedBlack = currentFillColor;
                        super.write(processor, new PdfLiteral("rg"), Arrays.asList(new PdfNumber(0), new PdfNumber(1), new PdfNumber(0), new PdfLiteral("rg")));
                    }
                }
            }
            else if (currentlyReplacedBlack != null)
            {
                if (currentlyReplacedBlack instanceof DeviceCmyk)
                {
                    super.write(processor, new PdfLiteral("k"), Arrays.asList(new PdfNumber(0), new PdfNumber(0), new PdfNumber(0), new PdfNumber(1), new PdfLiteral("k")));
                }
                else if (currentlyReplacedBlack instanceof DeviceGray)
                {
                    super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
                }
                else
                {
                    super.write(processor, new PdfLiteral("rg"), Arrays.asList(new PdfNumber(0), new PdfNumber(0), new PdfNumber(0), new PdfLiteral("rg")));
                }
                currentlyReplacedBlack = null;
            }

            super.write(processor, operator, operands);
        }

        Color currentlyReplacedBlack = null;

        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

(EditPageContent.java test method testChangeBlackTextToGreenDocument)
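The samples above all deal with text, while the question asks about paths. As a rough sketch of how path deletion could follow the same pattern, the plain-Java toy below replaces every path-painting operator with "n", the PDF operator that ends a path without filling or stroking it. The path construction instructions stay in place, so a clipping path set up with W or W* should keep working. ToyPathRemover and its methods are hypothetical names used only for illustration and are not part of iText.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of path removal on parsed instructions; not an iText class. */
class ToyPathRemover {
    /** Path-painting operators of the PDF specification, except the no-op "n". */
    static final Set<String> PATH_PAINTING = new HashSet<>(
            Arrays.asList("S", "s", "f", "F", "f*", "B", "B*", "b", "b*"));

    /**
     * Rewrites one instruction (operands followed by the operator as the
     * last element). Painting operators become "n"; everything else is kept.
     */
    static List<String> rewrite(List<String> instruction) {
        String operator = instruction.get(instruction.size() - 1);
        if (PATH_PAINTING.contains(operator)) {
            return Arrays.asList("n");
        }
        return instruction;
    }

    /** Applies the rewrite to a whole stream and serializes the result. */
    static String removePaths(List<List<String>> instructions) {
        StringBuilder out = new StringBuilder();
        for (List<String> instruction : instructions) {
            out.append(String.join(" ", rewrite(instruction))).append('\n');
        }
        return out.toString();
    }

    /** Hypothetical demo stream: one filled rectangle followed by some text. */
    static String demo() {
        List<List<String>> page = new ArrayList<>();
        page.add(Arrays.asList("10", "10", "100", "50", "re"));
        page.add(Arrays.asList("f"));
        page.add(Arrays.asList("BT"));
        page.add(Arrays.asList("(Hi)", "Tj"));
        page.add(Arrays.asList("ET"));
        return removePaths(page);
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

In the PdfCanvasEditor above, the equivalent decision would presumably sit in an overridden write method that checks operator.toString() against the same operator set. A real implementation would typically be more selective, for example by inspecting the current path or graphics state before deciding to drop it.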
