Create new custom COSBase objects with PDFBox?


Question

Can we create a new custom operator (such as PDFOperator{BDC}) and COSBase objects (such as COSName{P} COSName{Prop1}, where Prop1 in turn references another object), and add them to the root structure of a PDF?

I have read a list of parser tokens from an existing PDF document, and I want to tag the PDF. To do that, I will first manipulate the token list with newly created COSBase objects, and finally add them to the root structure tree. So how can I create COSBase objects? This is the code I use to extract the tokens from the PDF:

PDDocument document = PDDocument.load(new File(inputPdfFile));
for (PDPage page : document.getPages())
{
    // Parse the page's content stream into operands and operators.
    PDFStreamParser parser = new PDFStreamParser(page);
    parser.parse();
    List<Object> tokens = parser.getTokens();

    // Rebuild the token list for this page; tagging tokens would be inserted here.
    List<Object> newTokens = new ArrayList<>();
    for (Object token : tokens) {
        System.out.println(token);
        if (token instanceof Operator) {
            Operator op = (Operator) token; // inspect the operator here if needed
        }
        newTokens.add(token);
    }

    // Write the tokens to a new content stream and attach it to the page.
    PDStream newContents = new PDStream(document);
    try (OutputStream out = newContents.createOutputStream(COSName.FLATE_DECODE)) {
        ContentStreamWriter writer = new ContentStreamWriter(out);
        writer.writeTokens(newTokens);
    }
    page.setContents(newContents);
}
document.save(outputPdfFile);
document.close();

The code above creates a new PDF with all formatting and images intact. The newTokens list contains all the existing COSBase objects, so I want to add tagging COSBase objects to it. When I save the new document, it should then be tagged without my having to handle any decoding, encoding, fonts, or images.

First, will this idea work? If so, please help me with some code to create custom COSBase objects. I am very new to Java.

Recommended answer

Depending on your document format, you can insert marked content.

// Append "/P <</MCID n>> BDC" to the token list (operands first, operator last).
// mcid is assumed to be an int counter and newTokens the List<Object> being rebuilt.
newTokens.add(COSName.getPDFName("P"));
COSDictionary currentMarkedContentDictionary = new COSDictionary();
currentMarkedContentDictionary.setInt(COSName.MCID, mcid);
mcid++;
newTokens.add(currentMarkedContentDictionary);
newTokens.add(Operator.getOperator("BDC"));

// After the BDC, append your existing tokens (TJ, TD, Td, T*, ...).
newTokens.add(existing_token);

// Close the marked-content sequence with EMC.
newTokens.add(Operator.getOperator("EMC"));

// Add the marked content to the structure tree.
// currentSection is assumed to be an existing parent structure element and page the current PDPage.
PDStructureElement structureElement = new PDStructureElement(StandardStructureTypes.P, currentSection);
structureElement.setPage(page);
PDMarkedContent markedContent = new PDMarkedContent(COSName.P, currentMarkedContentDictionary);
structureElement.appendKid(markedContent);
currentSection.appendKid(structureElement);
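
To make the token ordering easier to see, here is a minimal, library-free sketch of the same rewrite. It is only a model: tokens are plain strings rather than PDFBox COSBase and Operator objects, and the class name MarkedContentSketch and the method wrapTextObjects are hypothetical names chosen for illustration. It also assumes one particular tagging choice, wrapping each whole BT ... ET text object in its own marked-content sequence, which may not match how you want to split paragraphs.

```java
import java.util.ArrayList;
import java.util.List;

// Library-free model: tokens are plain strings, not PDFBox objects.
// Class and method names are hypothetical, for illustration only.
final class MarkedContentSketch {

    /** Wraps every BT ... ET text object in "/P <</MCID n>> BDC ... EMC". */
    static List<String> wrapTextObjects(List<String> tokens) {
        List<String> result = new ArrayList<>();
        int mcid = 0;          // next marked-content ID to hand out
        boolean open = false;  // true while a BDC is waiting for its EMC
        for (String token : tokens) {
            if (token.equals("BT") && !open) {
                // Operands (tag, property dictionary) come before the operator.
                result.add("/P");
                result.add("<</MCID " + mcid + ">>");
                result.add("BDC");
                mcid++;
                open = true;
            }
            result.add(token);
            if (token.equals("ET") && open) {
                // Close the sequence right after the text object ends.
                result.add("EMC");
                open = false;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of(
                "q", "BT", "/F1", "12", "Tf", "(Hello)", "Tj", "ET", "Q",
                "BT", "(World)", "Tj", "ET");
        System.out.println(String.join(" ", wrapTextObjects(tokens)));
    }
}
```

In the real PDFBox code, each string would be replaced by the corresponding object (a COSName, a COSDictionary carrying the MCID, and Operator instances), and every MCID handed out would also need a matching structure element, as in the snippet above.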

Thanks to @Tilman Hausherr
