Tagged PDF with PDFBox


Problem description

Is it possible to create a tagged PDF (PDF/UA) with PDFBox? It looks like PDFBox has an API for that (package org.apache.pdfbox.pdmodel.documentinterchange.taggedpdf), but I can't find any tutorials or code examples.

Using the code below, I generated a PDF file containing an image, and the screen reader NVDA (in my case) recognizes it and reads '... graphic Alternate Description'. However, the accessibility checker PAC 2 shows an error: 'Image object not tagged'.

        PDDocument doc = new PDDocument();
        PDPage page = new PDPage();
        doc.addPage(page);
        PDDocumentCatalog documentCatalog = doc.getDocumentCatalog();

        PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, doc);
        PDPageContentStream contents = new PDPageContentStream(doc, page);
        contents.drawImage(pdImage, 100, 600, pdImage.getWidth() / 2, pdImage.getHeight() / 2);
        contents.close();

        PDStructureTreeRoot treeRoot = new PDStructureTreeRoot();
        PDStructureElement structureElement = new PDStructureElement(StandardStructureTypes.Figure, treeRoot);
        structureElement.setPage(page);

        PDMarkedContent markedImg = new PDMarkedContent(COSName.IMAGE, new COSDictionary());
        markedImg.addXObject(pdImage);

        structureElement.appendKid(markedImg);
        structureElement.setAlternateDescription("Alternate Description");
        treeRoot.appendKid(structureElement);
        documentCatalog.setStructureTreeRoot(treeRoot);
        // ....
        doc.save(fileName);

Can you provide some explanations and/or code examples on this subject?

Solution

I put up a working example which demonstrates creating an accessible PDF using PDFBox 2: https://github.com/martinlovell/accessible-pdfbox-example

There are a few things missing from the code in the question. The marked content needs alt text, and I believe you also need MCIDs (marked-content identifiers) for that marked content.

The example project demonstrates in more detail what you need.

It would be something like this:

PDPageContentStream contents = new PDPageContentStream(doc, page);

// the content in the stream needs an id (MCID)
int mcid = 5;
COSDictionary dictionary = new COSDictionary();
dictionary.setInt(COSName.MCID, mcid);

// wrap image drawing in marked content
contents.beginMarkedContent(COSName.IMAGE, PDPropertyList.create(dictionary));
contents.drawImage(pdImage, 100, 600, pdImage.getWidth() / 2, pdImage.getHeight() / 2);
contents.endMarkedContent();

contents.close();

PDStructureTreeRoot treeRoot = new PDStructureTreeRoot();
documentCatalog.setStructureTreeRoot(treeRoot);
PDStructureElement structureElement = new PDStructureElement(StandardStructureTypes.Figure, treeRoot);
structureElement.setPage(page);
structureElement.setAlternateDescription("Alternate Description");

// Set alt text on marked content for structure.  
// This is the dictionary with the mcid used in beginMarkedContent.
dictionary.setString(COSName.ALT, "Alternate Description");
PDMarkedContent markedImg = new PDMarkedContent(COSName.IMAGE, dictionary);
markedImg.addXObject(pdImage);
structureElement.appendKid(markedImg);
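
The snippet above hard-codes `int mcid = 5`. MCIDs need to be unique within the page they belong to, so once a page has more than one marked-content sequence it may be easier to hand them out from a counter. The following is a minimal sketch in plain Java, not part of PDFBox or of the linked example project: the class name `McidAllocator` and its method `next` are hypothetical names chosen for illustration, and the page key can be any object (for example the `PDPage` instance).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a PDFBox class): hands out marked-content IDs
 * that are unique per page, starting at 0 for each new page key.
 */
final class McidAllocator<P> {
    // Next unused MCID for each page key seen so far.
    private final Map<P, Integer> nextByPage = new HashMap<>();

    /** Returns the next unused MCID for the given page and advances the counter. */
    int next(P page) {
        int mcid = nextByPage.getOrDefault(page, 0);
        nextByPage.put(page, mcid + 1);
        return mcid;
    }
}
```

Under these assumptions, `int mcid = 5;` could become `int mcid = allocator.next(page);`, where `allocator` is a single `McidAllocator<PDPage>` shared across the document.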
