How to download a Document from ipaper swf


Problem Description

Hi guys, I am trying to download a document from an iPaper swf link. Please guide me on how I can download the book. Here is the link to the book, which I want to convert to PDF or Word and save:
http://en-gage.kaplan.co.uk/LMS/content/live_content_v2/acca/exam_kits/2014-15/p6_fa2014/iPaper.swf
Your kind guidance in this regard would be appreciated.
Regards,
Muneeb

Solution

First, open the book in your browser with network capturing enabled (in the developer tools). Open many pages at different locations, with and without zoom, and then look at the captured data.

You will see that for each new page you open, the browser requests a new file (or several files).
This means there is a file for each page, and from that file the browser builds the page's image. (Usually there is one file per page and it is a picture in some format, but I have also encountered base64-encoded pictures and a picture cut into four pieces.)

So we want to download and save all the files that contain the book's pages.

Now, usually there is a consistent pattern to the files' addresses, with some incrementing number in them (visible in the captured data as the difference between consecutive files). Knowing the number of pages in the book, we can guess the remaining addresses ourselves up to the end of the book (and of course download all the files programmatically in a for loop), and we could stop here.
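The guess-and-loop approach above can be sketched as follows. The URL pattern, numbering scheme, and page count are hypothetical placeholders; the real values must be read off the captured traffic:

```python
import urllib.request

# Hypothetical address pattern inferred from captured traffic; substitute
# the real base URL and zero-padding observed in the developer tools.
BASE = "http://example.com/book/page_{:04d}.jpg"

def page_urls(first, last):
    """Build the candidate page addresses by incrementing the number."""
    return [BASE.format(n) for n in range(first, last + 1)]

def download_all(first, last):
    """Fetch every guessed address and save it to a local file."""
    for n, url in zip(range(first, last + 1), page_urls(first, last)):
        with urllib.request.urlopen(url) as resp:
            data = resp.read()
        with open("page_{:04d}.jpg".format(n), "wb") as f:
            f.write(data)

# download_all(1, 250)  # uncomment to fetch; the page count is an assumption
```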

But sometimes the addresses are a bit difficult to guess, or we want the process to be more automatic. Either way, we want to obtain the number of pages and all the page addresses programmatically.

So we have to check how the browser knows these things. Usually the browser downloads a few files at the start, and one of them contains the number of pages in the book (and potentially their addresses). We just have to look through the captured data, find that file, and parse it in our program.



Finally, there is the issue of security:

Some websites try to protect their data one way or another (usually with cookies or HTTP authentication). But if your browser can access the data, you just have to track how it does so and mimic it.

(If it is cookies, the server will respond at some point with a Set-Cookie: header. It could be that you have to log in to view the book, in which case you also have to track that process, which usually happens via POST messages and cookies. If it is HTTP authentication, you will see something like Authorization: Basic in the request headers.)
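Both protection schemes can be mimicked with the standard library; a minimal sketch, in which every endpoint, form field, and credential is a hypothetical placeholder to be replaced with what the captured traffic shows:

```python
import base64
import http.cookiejar
import urllib.parse
import urllib.request

def basic_auth_header(user, password):
    """Value of the Authorization header for HTTP Basic authentication."""
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    return "Basic " + token

def fetch_with_cookies(login_url, book_url, form):
    """Log in first (the server replies with Set-Cookie), then reuse the
    cookie jar for the protected request. The form field names are
    hypothetical; copy the real ones from the captured POST message."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.open(login_url, urllib.parse.urlencode(form).encode())  # Set-Cookie lands in jar
    return opener.open(book_url).read()
```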



In your case the answer is simple:
(All the file names are relative to the main file's directory: http://en-gage.kaplan.co.uk/LMS/content/live_content_v2/acca/exam_kits/2014-15/p6_fa2014/)
There is a manifest.zip file containing a pages.xml file, which lists the number of files and links to them. We can see that each page has a thumb, a small, and a large picture, so we only want the large ones.
You just need a program that loops over those addresses (from Paper/Pages/491287/Zoom.jpg to Paper/Pages/491968/Zoom.jpg).
Finally, you can merge all the JPGs into a PDF.
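The whole pipeline (fetch manifest.zip, read pages.xml, download each Zoom.jpg) can be sketched as below. The tag and attribute names used when parsing pages.xml are assumptions; inspect the real file and adjust the parsing to match:

```python
import io
import urllib.request
import xml.etree.ElementTree as ET
import zipfile

BASE = ("http://en-gage.kaplan.co.uk/LMS/content/live_content_v2/"
        "acca/exam_kits/2014-15/p6_fa2014/")

def zoom_url(page_id):
    """Address of the full-size picture for one page."""
    return BASE + "Paper/Pages/{}/Zoom.jpg".format(page_id)

def page_ids_from_manifest(xml_bytes):
    """Pull the page ids out of pages.xml. <page id="..."> is an assumed
    structure; adapt it to whatever the real pages.xml contains."""
    root = ET.fromstring(xml_bytes)
    return [page.get("id") for page in root.iter("page")]

def download_book():
    # manifest.zip sits next to iPaper.swf and contains pages.xml
    raw = urllib.request.urlopen(BASE + "manifest.zip").read()
    with zipfile.ZipFile(io.BytesIO(raw)) as z:
        xml_bytes = z.read("pages.xml")
    images = []
    for pid in page_ids_from_manifest(xml_bytes):
        with urllib.request.urlopen(zoom_url(pid)) as resp:
            images.append(resp.read())
    return images
```

For the final merge step, a third-party tool such as img2pdf, or Pillow's `Image.save(..., save_all=True, append_images=...)`, can combine the downloaded JPGs into a single PDF.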



