How to implement a PDF viewer that loads pages asynchronously


Question

We need to allow users of our mobile app to browse a magazine with an experience that is fast, fluid and feels native to the platform (similar to iBooks/Google Books).

Some features we need are thumbnails of the whole magazine and searching for specific text.

The problem is that our magazines are over 140 pages long and we can't force our users to download the whole ebook/PDF beforehand. We need pages to be loaded asynchronously, that is, to let users start reading without having to fully download the content.

I looked at PDFKit for iOS, but I didn't find any mention in the documentation of downloading a PDF asynchronously.

Are there any solutions/libraries to implement this functionality on iOS and Android?

Recommended answer

What you're looking for is called linearization. According to this answer:


The first object immediately after the %PDF-1.x header line shall contain a dictionary key indicating the /Linearized property of the file.

This overall structure allows a conforming reader to learn the complete list of object addresses very quickly, without needing to download the complete file from beginning to end:


  • The viewer can display the first page(s) very fast, before the complete file is downloaded.

  • The user can click on a thumbnail page preview (or a link in the ToC of the file) in order to jump to, say, page 445, immediately after the first page(s) have been displayed, and the viewer can then request all the objects required for page 445 by asking the remote server via byte range requests to deliver these "out of order" so the viewer can display this page faster. (While the user reads pages out of order, the downloading of the complete document will still go on in the background...)
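
As a rough illustration of the two mechanisms in the quoted answer, the sketch below checks whether a file looks linearized and builds the HTTP Range header a viewer would send to fetch bytes out of order. It assumes the linearization dictionary sits within the first 1024 bytes of the file; the function names and the sample bytes are made up for this example and do not come from any library.

```python
def is_linearized(head: bytes) -> bool:
    """Guess whether a PDF is linearized from the first bytes of the file."""
    # Assumption: the linearization dictionary lies in the first 1024 bytes.
    return head.startswith(b"%PDF-") and b"/Linearized" in head[:1024]


def range_header(offset: int, length: int) -> dict:
    """Build the header for an HTTP byte range request (end is inclusive)."""
    return {"Range": "bytes=%d-%d" % (offset, offset + length - 1)}


# Made-up samples: the start of a linearized file, and of a plain one.
linearized = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 54321 >>\nendobj\n"
plain = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

print(is_linearized(linearized))  # True
print(is_linearized(plain))       # False
print(range_header(1000, 500))    # {'Range': 'bytes=1000-1499'}
```

A real viewer would first read the start of the file, then use the offsets it finds there to decide which byte ranges to request for a given page.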

You can use this native library to linearize a PDF.

However, I wouldn't recommend it, as rendering the PDFs won't be fast, fluid or feel native. For those reasons, as far as I know, there is no native mobile app that does linearization. Moreover, you would have to create your own rendering engine for the PDF, as most PDF viewing libraries do not support linearization. What you should do instead is convert each individual page of the PDF to HTML on the server and have the client load pages only when required and cache them. We will also save the PDF's plain text separately in order to enable search. This way everything will be smooth, as the resources will be lazy loaded. To achieve this, you can do the following.

First, on the server, whenever you publish a PDF, its pages should be split into HTML files as explained above. Page thumbnails should also be generated from those pages. Assuming that your server runs on Python with the Flask microframework, this is what you do.

from flask import Flask, abort, jsonify, request, send_file
from werkzeug.utils import secure_filename
import io
import os
import sqlite3
import subprocess

import imgkit
from PIL import Image
from pypdf import PdfReader, PdfWriter  # maintained successor of pyPdf
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage

app = Flask(__name__)


@app.route('/publish', methods=['POST'])
def upload_file():
    f = request.files['file']
    os.makedirs("pdfs", exist_ok=True)
    filePath = os.path.join("pdfs", secure_filename(f.filename))
    f.save(filePath)
    savePdfText(filePath)
    inputpdf = PdfReader(filePath)

    # Page files are numbered from 0; the text table numbers pages from 1.
    for i, page in enumerate(inputpdf.pages):
        # Write page i to its own single-page PDF.
        output = PdfWriter()
        output.add_page(page)
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            output.write(outputStream)
        # Convert the page to HTML, render that HTML to an image, then shrink it.
        subprocess.run(["pdf2htmlEX", "--zoom", "1.3", "document-page%s.pdf" % i], check=True)
        imgkit.from_file("document-page%s.html" % i, "document-page%s.jpg" % i)
        saveThum("document-page%s.jpg" % i)
    return "published"


def saveThum(infile):
    size = 124, 124
    outfile = os.path.splitext(infile)[0] + ".thumbnail"
    if infile != outfile:
        try:
            im = Image.open(infile)
            im.thumbnail(size, Image.LANCZOS)
            im.save(outfile, "JPEG")
        except IOError:
            print("cannot create thumbnail for '%s'" % infile)


def savePdfText(path):
    rsrcmgr = PDFResourceManager()
    retstr = io.StringIO()
    device = TextConverter(rsrcmgr, retstr, codec='utf-8', laparams=LAParams())
    # Create a PDF interpreter object.
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    db = sqlite3.connect("pdfText.db")
    cursor = db.cursor()
    cursor.execute('create table if not exists pagesTextTables(id INTEGER PRIMARY KEY,pageNum TEXT,pageText TEXT)')
    # Process each page contained in the document.
    with open(path, 'rb') as fp:
        for pageNum, page in enumerate(PDFPage.get_pages(fp), start=1):
            interpreter.process_page(page)
            cursor.execute('INSERT INTO pagesTextTables(pageNum,pageText) values(?,?)', (str(pageNum), retstr.getvalue()))
            # Reset the buffer so the next row holds only the next page's text.
            retstr.seek(0)
            retstr.truncate(0)
    db.commit()
    device.close()
    db.close()


def pageNumParam():
    # request.values covers both the query string and form fields;
    # type=int also rejects anything that is not a plain number.
    page_num = request.values.get('page_num', type=int)
    if page_num is None:
        abort(400)
    return page_num


@app.route('/page', methods=['GET', 'POST'])
def getPage():
    return send_file(os.path.abspath("document-page%s.html" % pageNumParam()), as_attachment=True)


@app.route('/thumb', methods=['GET', 'POST'])
def getThum():
    return send_file(os.path.abspath("document-page%s.thumbnail" % pageNumParam()), mimetype="image/jpeg", as_attachment=True)


@app.route('/search', methods=['GET', 'POST'])
def search():
    query = request.values.get('query', '')
    db = sqlite3.connect("pdfText.db")
    cursor = db.cursor()
    # A bound parameter keeps user input out of the SQL string.
    cursor.execute("SELECT pageNum FROM pagesTextTables WHERE pageText LIKE ?", ('%' + query + '%',))
    result = cursor.fetchall()
    db.close()
    return jsonify(queryResults=[row[0] for row in result])

Here is an explanation of what the Flask app is doing.


  1. The /publish route is responsible for publishing your magazine: turning every page into HTML, saving the PDF's text to an SQLite db and generating thumbnails for those pages. I've used pyPDF to split the PDF into individual pages, pdf2htmlEX to convert the pages to HTML, imgkit to render that HTML to images and PIL to generate thumbs from those images. Also, a simple SQLite db saves the pages' text.
  2. The /page, /thumb and /search routes are self-explanatory. They simply return the HTML, thumb or search query results.
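
To make the search step concrete, here is a minimal, self-contained sketch that runs against an in-memory SQLite database using the same pagesTextTables schema as the app. The helper name search_pages and the sample page text are invented for the example; it is one way the lookup could be written, not the only one.

```python
import sqlite3


def search_pages(db, query):
    """Return the numbers of the pages whose text contains the query."""
    rows = db.execute(
        "SELECT pageNum FROM pagesTextTables WHERE pageText LIKE ? ORDER BY id",
        ("%" + query + "%",),   # bound parameter, never string concatenation
    ).fetchall()
    return [row[0] for row in rows]


# In-memory database with made-up page text, same schema as the app.
db = sqlite3.connect(":memory:")
db.execute(
    "create table if not exists pagesTextTables"
    "(id INTEGER PRIMARY KEY,pageNum TEXT,pageText TEXT)"
)
db.executemany(
    "INSERT INTO pagesTextTables(pageNum,pageText) values(?,?)",
    [("1", "Editor's letter"), ("2", "Travel: Lisbon in spring"), ("3", "Spring recipes")],
)

print(search_pages(db, "spring"))  # ['2', '3']
```

Binding the query as a parameter means a search term containing a quote cannot break or alter the SQL statement. Note that % and _ inside the term still act as LIKE wildcards.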

Secondly, on the client you simply download each HTML page whenever the user scrolls to it. Let me give you an example for Android. First, you'd want to create some utils to handle the GET requests.

// Blocking helpers: call them from a background thread, not the UI thread.
public static byte[] GetPage(int mPageNum) throws IOException {
    return CallServer("page", "page_num", Integer.toString(mPageNum));
}

public static byte[] GetThum(int mPageNum) throws IOException {
    return CallServer("thumb", "page_num", Integer.toString(mPageNum));
}

private static byte[] CallServer(String route, String requestName, String requestValue) throws IOException {
    OkHttpClient client = new OkHttpClient.Builder()
            .connectTimeout(30, TimeUnit.SECONDS)
            .writeTimeout(30, TimeUnit.SECONDS)
            .readTimeout(30, TimeUnit.SECONDS)
            .build();
    // "yourUrl" is a placeholder for the server's base URL.
    // The parameter travels in the query string of a GET request.
    HttpUrl url = HttpUrl.parse("yourUrl/" + route).newBuilder()
            .addQueryParameter(requestName, requestValue)
            .build();
    Request request = new Request.Builder()
            .url(url)
            .build();
    // try-with-resources closes the response once the bytes are read.
    try (Response response = client.newCall(request).execute()) {
        return response.body().bytes();
    }
}

The helper utils above simply handle the queries to the server for you; they should be self-explanatory. Next, you simply create a RecyclerView with a WebView ViewHolder or, better yet, an advanced WebView, as it will give you more power with customization.

    public static class ViewHolder extends RecyclerView.ViewHolder {
        private AdvancedWebView mWebView;

        public ViewHolder(View itemView) {
            super(itemView);
            mWebView = (AdvancedWebView) itemView;
        }
    }

    private class ContentAdapter extends RecyclerView.Adapter<YourFragment.ViewHolder> {
        @Override
        public ViewHolder onCreateViewHolder(ViewGroup container, int viewType) {
            return new ViewHolder(new AdvancedWebView(container.getContext()));
        }

        @Override
        public int getItemViewType(int position) {
            return 0;
        }

        @Override
        public void onBindViewHolder(ViewHolder holder, int position) {
            // Called when a page scrolls into view: start loading it here.
            handlePageDownload(holder.mWebView);
        }

        private void handlePageDownload(AdvancedWebView mWebView){....}

        @Override
        public int getItemCount() {
            return numberOfPages;
        }
    }
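
The client-side rule described earlier (load a page only when it is required, then cache it) does not depend on the UI toolkit. The sketch below shows one possible shape for it in Python, using a small least-recently-used cache. PageCache, fake_fetch and the capacity of 2 are all invented for illustration; in the Android app the fetch function would wrap the page download helper.

```python
from collections import OrderedDict


class PageCache:
    """Fetch pages on first access and keep the most recently used ones."""

    def __init__(self, fetch, capacity=20):
        self._fetch = fetch          # callable: page number -> page content
        self._capacity = capacity    # maximum number of pages kept in memory
        self._pages = OrderedDict()

    def get(self, page_num):
        if page_num in self._pages:
            self._pages.move_to_end(page_num)   # mark as recently used
            return self._pages[page_num]
        content = self._fetch(page_num)         # download only when needed
        self._pages[page_num] = content
        if len(self._pages) > self._capacity:
            self._pages.popitem(last=False)     # evict least recently used
        return content


# Stand-in for a network call; records which pages were "downloaded".
downloaded = []


def fake_fetch(page_num):
    downloaded.append(page_num)
    return "<html>page %d</html>" % page_num


cache = PageCache(fake_fetch, capacity=2)
cache.get(1)
cache.get(1)       # served from the cache, no second download
cache.get(2)
cache.get(3)       # cache is full, so page 1 is evicted
print(downloaded)  # [1, 2, 3]
```

Bounding the cache keeps memory use flat on a 140-page magazine while still making back-and-forth swipes between neighbouring pages instant.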

That should be about it.
