Tagged PDF with PDFBox

Question

Is it possible to create a tagged PDF (PDF/UA) with PDFBox? PDFBox appears to have an API for this (the package org.apache.pdfbox.pdmodel.documentinterchange.taggedpdf), but I can't find any tutorials or code examples.

Using the code below, I generated a PDF file containing an image. The screen reader NVDA (in my case) recognizes it and reads '... graphic Alternate Description'. However, the accessibility checker PAC 2 reports an error: 'Image object not tagged'.

        PDDocument doc = new PDDocument();
        PDPage page = new PDPage();
        doc.addPage(page);
        PDDocumentCatalog documentCatalog = doc.getDocumentCatalog();

        PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, doc);
        PDPageContentStream contents = new PDPageContentStream(doc, page);
        contents.drawImage(pdImage, 100, 600, pdImage.getWidth() / 2, pdImage.getHeight() / 2);
        contents.close();

        PDStructureTreeRoot treeRoot = new PDStructureTreeRoot();
        PDStructureElement structureElement = new PDStructureElement(StandardStructureTypes.Figure, treeRoot);
        structureElement.setPage(page);

        PDMarkedContent markedImg = new PDMarkedContent(COSName.IMAGE, new COSDictionary());
        markedImg.addXObject(pdImage);

        structureElement.appendKid(markedImg);
        structureElement.setAlternateDescription("Alternate Description");
        treeRoot.appendKid(structureElement);
        documentCatalog.setStructureTreeRoot(treeRoot);
        // ....
        doc.save(fileName);

Can you provide some explanation and/or code examples on this subject?

Recommended answer

I put up a working example that demonstrates creating an accessible PDF with PDFBox 2: https://github.com/martinlovell/accessible-pdfbox-example

The code in the question is missing a few things. The marked content needs alt text, and I believe the marked content also needs an MCID (marked-content identifier).

The example project demonstrates in more detail what you need.

It should look something like this:

PDPageContentStream contents = new PDPageContentStream(doc, page);

// the content in the stream needs an id
int mcid = 5;
COSDictionary dictionary = new COSDictionary();
dictionary.setInt(COSName.MCID, mcid);

// wrap image drawing in marked content
contents.beginMarkedContent(COSName.IMAGE, PDPropertyList.create(dictionary));
contents.drawImage(pdImage, 100, 600, pdImage.getWidth() / 2, pdImage.getHeight() / 2);
contents.endMarkedContent();

contents.close();

PDStructureTreeRoot treeRoot = new PDStructureTreeRoot();
documentCatalog.setStructureTreeRoot(treeRoot);
PDStructureElement structureElement = new PDStructureElement(StandardStructureTypes.Figure, treeRoot);
structureElement.setPage(page);
structureElement.setAlternateDescription("Alternate Description");

// Set alt text on marked content for structure.  
// This is the dictionary with the mcid used in beginMarkedContent.
dictionary.setString(COSName.ALT, "Alternate Description");
PDMarkedContent markedImg = new PDMarkedContent(COSName.IMAGE, dictionary);
markedImg.addXObject(pdImage);
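structureElement.appendKid(markedImg);

// attach the Figure element to the structure tree root, as in the question's code
treeRoot.appendKid(structureElement);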
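
The snippet above hard-codes `int mcid = 5;`. As far as I understand, the value itself is arbitrary, but each marked-content sequence on a page needs its own MCID. A minimal sketch of one way to hand out IDs is shown below. `McidAllocator` is a hypothetical helper, not part of PDFBox, and it assumes that numbering each page from 0 is acceptable for your document.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a PDFBox class): hands out sequential
 * marked-content IDs, counted separately for each page.
 * The type parameter P stands for whatever object identifies a page.
 */
final class McidAllocator<P> {
    // Next unused MCID for each page seen so far.
    private final Map<P, Integer> nextIdByPage = new HashMap<>();

    /** Returns the next unused MCID for the given page, starting at 0. */
    int next(P page) {
        int id = nextIdByPage.getOrDefault(page, 0);
        nextIdByPage.put(page, id + 1);
        return id;
    }
}
```

With PDFBox you would presumably create one `McidAllocator<PDPage>` per document and replace the hard-coded value with `int mcid = allocator.next(page);` before calling `dictionary.setInt(COSName.MCID, mcid);`.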
