Is there a minimalistic PDF.js sample that supports text selection?

Question

I'm trying PDF.js.

My problem is that the Hello World demo does not support text selection. It will draw everything in a canvas without the text layer. The official PDF.js demo does support text selection but the code is too complex. I was wondering if somebody has a minimalistic demo with the text layer.

Solution

I have committed the example to Mozilla's pdf.js repository and it is available under the examples directory.

The original example that I committed to pdf.js no longer exists, but I believe this example showcases text selection. pdf.js has since been cleaned up and reorganized, so the text-selection logic is now encapsulated inside the text layer, which can be created using a factory.

Specifically, PDFJS.DefaultTextLayerFactory takes care of setting up the basic text-selection stuff.
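
For orientation, here is a rough sketch of how that factory might be wired up. It assumes the 1.x-era components build of pdf.js (pdf_viewer.js), where PDFJS.PDFPageView accepts a textLayerFactory option. The helper name renderPageWithTextLayer and its parameters are made up for illustration, and the option names should be checked against the pdf.js version you actually load.

```javascript
// Sketch only: assumes the 1.x-era components build (pdf_viewer.js) of pdf.js.
// renderPageWithTextLayer is a hypothetical helper, not part of pdf.js.
// PDFJS is passed in explicitly so the wiring is easy to follow.
function renderPageWithTextLayer(PDFJS, pdfPage, container, scale) {
    var pageView = new PDFJS.PDFPageView({
        container: container,                          // DOM element that receives the page
        id: 1,                                         // page number shown by this view
        scale: scale,
        defaultViewport: pdfPage.getViewport(scale),
        // The factory creates the text layer that makes selection possible.
        textLayerFactory: new PDFJS.DefaultTextLayerFactory()
    });

    // Associate the loaded page with the view, then draw canvas and text layer.
    pageView.setPdfPage(pdfPage);
    return pageView.draw();
}
```

It would typically be called once a page has been loaded, for example from the callback of pdf.getPage(1).then(...), passing the global PDFJS object and the container div. The stylesheet that ships with the components build is still needed for the text layer to line up with the canvas.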


The following example is outdated; only leaving it here for historical reasons.

I have been struggling with this problem for 2-3 days now, but I finally figured it out. Here is a fiddle that shows you how to load a PDF with text-selection enabled.

The difficulty in figuring this out was that the text-selection logic was intertwined with the viewer code (viewer.js, viewer.html, viewer.css). I had to extricate relevant code and CSS out to get this to work (that JavaScript file is referenced in the file; you can also check it out here). The end result is a minimal demo that should prove helpful. To implement selection properly, the CSS that is in viewer.css is also extremely important as it sets up CSS styles for the divs that are eventually created and then used to get text selection working.
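
The extracted helper file is not reproduced in this post. To give a rough idea of what one of those helpers does, here is a sketch of the HiDPI calculation that the rendering code further down relies on through getOutputScale(). The names computeOutputScale and cssScaleFor are illustrative, and the ratios are passed in as parameters instead of being read from the browser, so this is not the actual viewer.js code.

```javascript
// Hypothetical stand-in for the getOutputScale helper used by the rendering code.
// The real helper reads these ratios from the browser; here they are parameters.
function computeOutputScale(devicePixelRatio, backingStoreRatio) {
    var pixelRatio = (devicePixelRatio || 1) / (backingStoreRatio || 1);
    return {
        sx: pixelRatio,            // horizontal scale factor for the canvas context
        sy: pixelRatio,            // vertical scale factor for the canvas context
        scaled: pixelRatio !== 1   // true only on HiDPI displays
    };
}

// Builds the CSS transform that shrinks an element by the inverse of the output scale.
function cssScaleFor(outputScale) {
    return "scale(" + (1 / outputScale.sx) + ", " + (1 / outputScale.sy) + ")";
}
```

On a display with a device pixel ratio of 2, the context is scaled up by 2 and the CSS transform scales the element by 0.5. The same transform is applied to the text layer div so that it stays aligned with the canvas.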

The heavy lifting is done by the TextLayerBuilder object, which actually handles the creation of the selection divs. You can see calls to this object from within viewer.js.
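
To make the idea of "selection divs" more concrete: each piece of text reported by the page becomes a transparent, absolutely positioned div placed over the glyphs drawn on the canvas. The following is a simplified illustration of that positioning math, not the actual TextLayerBuilder code. The helper names are hypothetical, transforms are assumed to be six-element affine arrays [a, b, c, d, e, f], and the full font height is used as a stand-in for the font ascent.

```javascript
// Multiplies two affine transforms given as [a, b, c, d, e, f] arrays.
function multiplyTransforms(m1, m2) {
    return [
        m1[0] * m2[0] + m1[2] * m2[1],
        m1[1] * m2[0] + m1[3] * m2[1],
        m1[0] * m2[2] + m1[2] * m2[3],
        m1[1] * m2[2] + m1[3] * m2[3],
        m1[0] * m2[4] + m1[2] * m2[5] + m1[4],
        m1[1] * m2[4] + m1[3] * m2[5] + m1[5]
    ];
}

// Hypothetical helper: maps a text item's transform (PDF space) through the
// viewport transform (PDF space -> canvas pixels) and derives the CSS for its div.
function textItemToDivStyle(viewportTransform, itemTransform) {
    var tx = multiplyTransforms(viewportTransform, itemTransform);
    // Length of the transformed vertical axis gives the rendered font height.
    var fontHeight = Math.sqrt(tx[2] * tx[2] + tx[3] * tx[3]);
    return {
        left: tx[4] + "px",
        // tx[5] is the baseline; move up by the font height (simplification).
        top: (tx[5] - fontHeight) + "px",
        fontSize: fontHeight + "px"
    };
}
```

For an unrotated 100-unit-tall page rendered at scale 2, the viewport transform is [2, 0, 0, -2, 0, 200], since PDF coordinates start at the bottom left and the y axis has to be flipped. A 10-unit text item at (20, 70) then ends up 40px from the left and 40px from the top, with a 20px font size.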

Anyway, here's the code including the CSS. Keep in mind that you will still need the pdf.js file. My fiddle has a link to a version that I built from Mozilla's GitHub repo for pdf.js. I didn't want to link to the repo's version directly since they are constantly developing it and it may be broken.

So without further ado:

HTML:

<html>
    <head>
        <title>Minimal pdf.js text-selection demo</title>
    </head>

    <body>
        <div id="pdfContainer" class = "pdf-content">
        </div>
    </body>
</html>

CSS:

.pdf-content {
    border: 1px solid #000000;
}

/* CSS classes used by TextLayerBuilder to style the text layer divs */

/* This stuff is important! Otherwise when you select the text, the text in the divs will show up! */
::selection { background:rgba(0,0,255,0.3); }
::-moz-selection { background:rgba(0,0,255,0.3); }

.textLayer {
    position: absolute;
    left: 0;
    top: 0;
    right: 0;
    bottom: 0;
    color: #000;
    font-family: sans-serif;
    overflow: hidden;
}

.textLayer > div {
    color: transparent;
    position: absolute;
    line-height: 1;
    white-space: pre;
    cursor: text;
}

.textLayer .highlight {
    margin: -1px;
    padding: 1px;

    background-color: rgba(180, 0, 170, 0.2);
    border-radius: 4px;
}

.textLayer .highlight.begin {
    border-radius: 4px 0px 0px 4px;
}

.textLayer .highlight.end {
    border-radius: 0px 4px 4px 0px;
}

.textLayer .highlight.middle {
    border-radius: 0px;
}

.textLayer .highlight.selected {
    background-color: rgba(0, 100, 0, 0.2);
}

JavaScript:

//Minimal PDF rendering and text-selection example using pdf.js by Vivin Suresh Paliath (http://vivin.net)
//This fiddle uses a built version of pdf.js that contains all modules that it requires.
//
//For demonstration purposes, the PDF data is not going to be obtained from an outside source. I will be
//storing it in a variable. Mozilla's viewer does support PDF uploads but I haven't really gone through
//that code. There are other ways to upload PDF data. For instance, I have a Spring app that accepts a
//PDF for upload and then communicates the binary data back to the page as base64. I then convert this
//into a Uint8Array manually. I will be demonstrating the same technique here. What matters most here is
//how we render the PDF with text-selection enabled. The source of the PDF is not important; just assume
//that we have the data as base64.
//
//The problem with understanding text selection was that the text-selection code was heavily intertwined
//with viewer.html and viewer.js. I have extracted the parts I need out of viewer.js into a separate file
//which contains the bare minimum required to implement text selection. The key component is TextLayerBuilder,
//which is the object that handles the creation of text-selection divs. I have added this code as an external
//resource.
//
//This demo uses a PDF that only has one page. You can render other pages if you wish, but the focus here is
//just to show you how you can render a PDF with text selection. Hence the code only loads up one page.
//
//The CSS used here is also very important since it styles the text layer div overlays that
//you actually end up selecting.
//
//For reference, the actual PDF document that is rendered is available at:
//http://vivin.net/pub/pdfjs/TestDocument.pdf

var pdfBase64 = "..."; //should contain base64 representing the PDF

var scale = 1; //Set this to whatever you want. This is basically the "zoom" factor for the PDF.

/**
 * Converts a base64 string into a Uint8Array
 */
function base64ToUint8Array(base64) {
    var raw = atob(base64); //This is a native function that decodes a base64-encoded string.
    var uint8Array = new Uint8Array(new ArrayBuffer(raw.length));
    for(var i = 0; i < raw.length; i++) {
        uint8Array[i] = raw.charCodeAt(i);
    }

    return uint8Array;
}

function loadPdf(pdfData) {
    PDFJS.disableWorker = true; //Not using web workers. Not disabling results in an error. This line is
                                //missing in the example code for rendering a pdf.

    var pdf = PDFJS.getDocument(pdfData);
    pdf.then(renderPdf);                               
}

function renderPdf(pdf) {
    pdf.getPage(1).then(renderPage);
}

function renderPage(page) {
    var viewport = page.getViewport(scale);
    var $canvas = jQuery("<canvas></canvas>");

    //Set the canvas height and width to the height and width of the viewport
    var canvas = $canvas.get(0);
    var context = canvas.getContext("2d");
    canvas.height = viewport.height;
    canvas.width = viewport.width;

    //Append the canvas to the pdf container div
    jQuery("#pdfContainer").append($canvas);

    //Create the text layer div first, because the HiDPI block below needs to reference it.
    //It is sized to the viewport and positioned exactly over the canvas.
    var canvasOffset = $canvas.offset();
    var $textLayerDiv = jQuery("<div />")
        .addClass("textLayer")
        .css("height", viewport.height + "px")
        .css("width", viewport.width + "px")
        .offset({
            top: canvasOffset.top,
            left: canvasOffset.left
        });

    jQuery("#pdfContainer").append($textLayerDiv);

    //The following few lines of code set up scaling on the context if we are on a HiDPI display.
    //getOutputScale and CustomStyle are not defined here; they are expected to come from the
    //external file extracted from viewer.js.
    var outputScale = getOutputScale();
    if (outputScale.scaled) {
        var cssScale = 'scale(' + (1 / outputScale.sx) + ', ' +
            (1 / outputScale.sy) + ')';
        CustomStyle.setProp('transform', canvas, cssScale);
        CustomStyle.setProp('transformOrigin', canvas, '0% 0%');

        //Apply the same transform to the text layer so it stays aligned with the canvas
        CustomStyle.setProp('transform', $textLayerDiv.get(0), cssScale);
        CustomStyle.setProp('transformOrigin', $textLayerDiv.get(0), '0% 0%');
    }

    context._scaleX = outputScale.sx;
    context._scaleY = outputScale.sy;
    if (outputScale.scaled) {
        context.scale(outputScale.sx, outputScale.sy);
    }

    page.getTextContent().then(function(textContent) {
        var textLayer = new TextLayerBuilder($textLayerDiv.get(0), 0); //The second argument is the zero-based
                                                                       //page index, i.e. page.number - 1.
        textLayer.setTextContent(textContent);

        var renderContext = {
            canvasContext: context,
            viewport: viewport,
            textLayer: textLayer
        };

        page.render(renderContext);
    });
}

var pdfData = base64ToUint8Array(pdfBase64);
loadPdf(pdfData);    
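
As a side note on the base64 step, here is a compact variant of the conversion together with a quick sanity check on the decoded bytes. This is a sketch, not part of the original fiddle: decodeBase64ToBytes and looksLikePdf are illustrative names, and the signature check only assumes that a well-formed PDF normally begins with the ASCII characters "%PDF-".

```javascript
// Compact variant of base64ToUint8Array (hypothetical name).
function decodeBase64ToBytes(base64) {
    // atob is available in browsers and recent Node.js; fall back to Buffer elsewhere.
    var raw = typeof atob === "function"
        ? atob(base64)
        : Buffer.from(base64, "base64").toString("binary");
    // Each character of the decoded string holds one byte value (0-255).
    return Uint8Array.from(raw, function (ch) { return ch.charCodeAt(0); });
}

// Checks whether the bytes start with the usual PDF signature "%PDF-".
function looksLikePdf(bytes) {
    var signature = [0x25, 0x50, 0x44, 0x46, 0x2d];
    return signature.every(function (value, i) { return bytes[i] === value; });
}
```

Running looksLikePdf(decodeBase64ToBytes(pdfBase64)) before calling loadPdf can help tell a corrupted or truncated base64 payload apart from a rendering problem.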
