Tagged PDF with PDFBox
Problem description
Is it possible to create tagged PDF (PDF/UA) with PDFBox? It looks like PDFBox has an API for that (package org.apache.pdfbox.pdmodel.documentinterchange.taggedpdf), but I can't find any tutorials or code examples.
Using the code below, I generated a PDF file containing an image, and the screen reader NVDA (in my case) recognizes it and reads '... graphic Alternate Description'. However, the accessibility checker PAC 2 shows an error: 'Image object not tagged'.
PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
PDDocumentCatalog documentCatalog = doc.getDocumentCatalog();
PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, doc);
PDPageContentStream contents = new PDPageContentStream(doc, page);
contents.drawImage(pdImage, 100, 600, pdImage.getWidth() / 2, pdImage.getHeight() / 2);
contents.close();
PDStructureTreeRoot treeRoot = new PDStructureTreeRoot();
PDStructureElement structureElement = new PDStructureElement(StandardStructureTypes.Figure, treeRoot);
structureElement.setPage(page);
PDMarkedContent markedImg = new PDMarkedContent(COSName.IMAGE, new COSDictionary());
markedImg.addXObject(pdImage);
structureElement.appendKid(markedImg);
structureElement.setAlternateDescription("Alternate Description");
treeRoot.appendKid(structureElement);
documentCatalog.setStructureTreeRoot(treeRoot);
// ....
doc.save(fileName);
Can you provide some explanations and/or code examples about this subject?
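PAC 2's "Image object not tagged" message most likely concerns the page content stream rather than the structure tree. In the question's code, `drawImage` is called outside any marked-content sequence, and the `PDMarkedContent` object (created with an empty dictionary, so no MCID) is only added to the structure tree; no BDC/EMC pair is ever written around the image. The following is a rough, PDFBox-free sketch of that kind of check. It assumes an already-decoded content stream with one operator per line, `UntaggedImageCheck` is a hypothetical name, and it is not PAC's actual algorithm.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical checker: counts XObject draws ("Do") that are not inside a
// marked-content sequence carrying an /MCID or tagged as an /Artifact.
// Naive parsing: assumes a decoded content stream with one operator per line.
public class UntaggedImageCheck {

    static int untaggedDraws(String contentStream) {
        // Each open sequence records whether it "covers" the content inside it.
        Deque<Boolean> open = new ArrayDeque<>();
        int untagged = 0;
        for (String raw : contentStream.split("\n")) {
            String line = raw.trim();
            if (line.endsWith("BDC") || line.endsWith("BMC")) {
                boolean covered = line.contains("/MCID") || line.startsWith("/Artifact");
                open.push(covered);
            } else if (line.equals("EMC")) {
                if (!open.isEmpty()) {
                    open.pop();
                }
            } else if (line.endsWith(" Do") && !open.contains(true)) {
                untagged++;
            }
        }
        return untagged;
    }

    public static void main(String[] args) {
        // Illustrative stream: an image drawn with no marked content, as in the question.
        String untaggedStream = "q\n320 0 0 240 100 600 cm\n/Im1 Do\nQ\n";
        // The same drawing wrapped in a marked-content sequence with an MCID.
        String taggedStream = "/Image <</MCID 5>> BDC\n" + untaggedStream + "EMC\n";
        System.out.println(untaggedDraws(untaggedStream)); // 1
        System.out.println(untaggedDraws(taggedStream));   // 0
    }
}
```

The point of the sketch is that no amount of structure-tree setup changes the count for the first stream; only wrapping the drawing operators themselves does.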
I put up a working example which demonstrates creating an accessible PDF using PDFBox 2: https://github.com/martinlovell/accessible-pdfbox-example
There are a few things missing from the code in the question. The marked content needs alt text, and I believe you need MCIDs (marked-content identifiers) for that marked content.
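At the content-stream level, "marked content with an MCID" means the drawing operators are bracketed by BDC ... EMC, and the property list given to BDC carries the /MCID (and, following the answer, /Alt). The sketch below prints that operator sequence for one image using an inline property dictionary, a form the PDF specification allows for BDC. Assumptions: `MarkedImageOps`, the resource name `/Im1` and the sizes are hypothetical, and only ASCII alt text is handled. PDFBox 2's `beginMarkedContent(COSName, PDPropertyList)` appears to store the property list in the page's /Properties resources and refer to it by name, so real PDFBox output differs in that detail.

```java
// Hypothetical illustration: the operators for one image wrapped in a
// marked-content sequence, using an inline property dictionary on BDC.
public class MarkedImageOps {

    // Escapes a string as a PDF literal string (ASCII text assumed).
    static String literal(String s) {
        return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")";
    }

    static String markedImage(String tag, int mcid, String alt, String imageName,
                              int x, int y, int width, int height) {
        return "/" + tag + " <</MCID " + mcid + " /Alt " + literal(alt) + ">> BDC\n" // open sequence
                + "q\n"                                                        // save graphics state
                + width + " 0 0 " + height + " " + x + " " + y + " cm\n"       // scale and place the image
                + "/" + imageName + " Do\n"                                    // paint the image XObject
                + "Q\n"                                                        // restore graphics state
                + "EMC\n";                                                     // close the sequence
    }

    public static void main(String[] args) {
        System.out.print(markedImage("Image", 5, "Alternate Description", "Im1", 100, 600, 320, 240));
    }
}
```

Running `main` prints `/Image <</MCID 5 /Alt (Alternate Description)>> BDC` followed by the `q`, `cm`, `Do`, `Q` and `EMC` lines; the MCID in the first line is what the structure element's kid must match.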
The example project demonstrates in more detail what you need.
It would be something like this:
PDPageContentStream contents = new PDPageContentStream(doc, page);
// the content in the stream needs an id (MCID), unique within this page
int mcid = 5;
COSDictionary dictionary = new COSDictionary();
dictionary.setInt(COSName.MCID, mcid);
// wrap image drawing in marked content
contents.beginMarkedContent(COSName.IMAGE, PDPropertyList.create(dictionary));
contents.drawImage(pdImage, 100, 600, pdImage.getWidth() / 2, pdImage.getHeight() / 2);
contents.endMarkedContent();
contents.close();
PDStructureTreeRoot treeRoot = new PDStructureTreeRoot();
documentCatalog.setStructureTreeRoot(treeRoot);
PDStructureElement structureElement = new PDStructureElement(StandardStructureTypes.Figure, treeRoot);
structureElement.setPage(page);
structureElement.setAlternateDescription("Alternate Description");
// Set alt text on marked content for structure.
// This is the dictionary with the mcid used in beginMarkedContent.
dictionary.setString(COSName.ALT, "Alternate Description");
PDMarkedContent markedImg = new PDMarkedContent(COSName.IMAGE, dictionary);
markedImg.addXObject(pdImage);
structureElement.appendKid(markedImg);
// attach the Figure element to the structure tree root
treeRoot.appendKid(structureElement);
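The snippet above links structure to content (Figure element to MCID on the page). ISO 32000-1 also describes the reverse mapping: a page with structural content items gets a /StructParents integer, and the structure tree root's /ParentTree number tree maps that integer to an array whose index i holds the structure element owning MCID i on that page. The answer's snippet does not set these up; in PDFBox 2 they correspond to `PDPage.setStructParents`, `PDStructureTreeRoot.setParentTree` and `setParentTreeNextKey`. The sketch below models only the bookkeeping in plain Java; `ParentTreeModel` and its members are hypothetical and are not PDFBox API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, PDFBox-free model of the bookkeeping behind /StructParents
// and /ParentTree. Names are illustrative only.
public class ParentTreeModel {

    // Stand-in for a structure element such as /Figure.
    record StructElem(String type, String alt) {}

    // One page's marked content: list index == MCID.
    static final class PageContent {
        final int structParents; // value for the page's /StructParents entry
        final List<StructElem> ownerByMcid = new ArrayList<>();

        PageContent(int structParents) {
            this.structParents = structParents;
        }

        // Allocates the next MCID on this page and records which element owns it.
        int mark(StructElem owner) {
            ownerByMcid.add(owner);
            return ownerByMcid.size() - 1;
        }
    }

    // Builds the /ParentTree contents: StructParents key -> array indexed by MCID.
    static Map<Integer, List<StructElem>> parentTree(List<PageContent> pages) {
        Map<Integer, List<StructElem>> tree = new TreeMap<>();
        for (PageContent page : pages) {
            tree.put(page.structParents, List.copyOf(page.ownerByMcid));
        }
        return tree;
    }

    public static void main(String[] args) {
        PageContent page = new PageContent(0);
        StructElem figure = new StructElem("Figure", "Alternate Description");
        int mcid = page.mark(figure); // first marked-content sequence on the page
        Map<Integer, List<StructElem>> tree = parentTree(List.of(page));
        // Content -> structure lookup: page key, then MCID, gives the Figure element.
        System.out.println(tree.get(page.structParents).get(mcid).type());
    }
}
```

Allocating MCIDs sequentially from 0, as `mark` does, keeps each ParentTree array dense; with the answer's fixed `mcid = 5`, the Figure would have to sit at index 5 of that page's array.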