Convert PDF to HTML in PHP?


Question

I want to convert a PDF file to an HTML file via PHP, but I am running into some trouble.

I found a basic way to do this using Saaspose, which lets you convert PDFs to HTML files. There are some problems with its output, however, such as its use of SVGs and its handling of images, positioning, fonts, etc.

All I need is the ability to grab the text from the PDF file, along with any images associated with it, and then display it in a linear format rather than laid out with absolute positioning.

What I mean by this is that if the PDF looks like this:

[image]

I'd want to convert it to a single-column HTML file. If there were images, I'd want them returned as well.

Is this possible in PHP? I know I can simply grab the text from the PDF file, but what about grabbing the images as well?

Another problem is that I want everything to be inline, since it is served to the client as a single file. Currently, I can do this in my setup with some code:

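$converted_obj = array();
$original_obj = array();

// Collect each embedded SVG's markup along with the element it will replace.
for ($i = 0; $i < $object_number; $i++) {
    $object = $html->find("object")->find("embed")->eq($i);
    $embed = file_get_contents("Output/OutputHtml/" . $object->attr("src"));
    array_push($converted_obj, $embed);
    array_push($original_obj, $object);
}

// Swap each original element for the inline SVG markup.
for ($i = 0; $i < $object_number; $i++) {
    pq($original_obj[$i])->replaceWith($converted_obj[$i]);
}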

This grabs all the SVG files and displays them inline. Images would be easier to handle, as I could use base64.

Recommended answer

1) Download the .exe file and unpack it to a folder: http://sourceforge.net/projects/pdftohtml/

2) Create a .php file and put this code in it (assuming that pdftohtml.exe and the source sample.pdf are both inside that folder):

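<?php
$source_pdf = "sample.pdf";
$output_folder = "MyFolder";

// Create the output folder if it does not exist yet.
if (!file_exists($output_folder)) {
    mkdir($output_folder, 0777, true);
}

// passthru() returns no useful value; the exit status is written to $exit_code.
$command = "pdftohtml " . escapeshellarg($source_pdf) . " "
    . escapeshellarg("$output_folder/new_file_name");
passthru($command, $exit_code);
var_dump($exit_code); // 0 normally indicates success
?>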

3) Open MyFolder and you will see the converted files (how many there are depends on the number of pages).
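The steps above can be wrapped in a small helper. This is a minimal sketch, not a definitive implementation: the function names `build_pdftohtml_command` and `convert_pdf` are hypothetical, and it assumes the installed pdftohtml accepts the `-noframes` flag (one HTML file instead of a frameset) and the `-i` flag (ignore images). Check `pdftohtml -h` on your build before relying on them.

```php
<?php
// Hypothetical helper: builds the pdftohtml command line.
// Assumes the installed pdftohtml supports -noframes and -i.
function build_pdftohtml_command(string $pdf, string $outputBase, bool $withImages = true): string
{
    $parts = ['pdftohtml', '-noframes']; // -noframes: no frameset, a single HTML file
    if (!$withImages) {
        $parts[] = '-i';                 // -i: ignore images
    }
    // Quote the paths so spaces and shell metacharacters are safe.
    $parts[] = escapeshellarg($pdf);
    $parts[] = escapeshellarg($outputBase);
    return implode(' ', $parts);
}

// Hypothetical helper: runs the conversion and reports whether the
// process exited with status 0.
function convert_pdf(string $pdf, string $outputBase, bool $withImages = true): bool
{
    exec(build_pdftohtml_command($pdf, $outputBase, $withImages), $outputLines, $exitCode);
    return $exitCode === 0;
}
```

Calling `convert_pdf("sample.pdf", "MyFolder/new_file_name")` would then return `true` only if pdftohtml exited cleanly, which is easier to act on than dumping the return value of `passthru()`.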

P.S. I can't speak to their quality, but many commercial or trial APIs exist too.
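For the single-file requirement in the question, the images written next to the generated HTML can be embedded as base64 data URIs. The sketch below is one possible approach under stated assumptions: `to_data_uri` and `inline_images` are hypothetical helpers, the image files are assumed to sit directly in `$baseDir`, and the regex only handles double-quoted `src` attributes, so unusual markup would need a real HTML parser.

```php
<?php
// Hypothetical helper: turns raw bytes into a data URI.
function to_data_uri(string $bytes, string $mimeType): string
{
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

// Hypothetical helper: replaces each <img src="..."> whose file exists in
// $baseDir with an inline data URI. Tags whose file is missing are left as-is.
function inline_images(string $html, string $baseDir): string
{
    // Small extension-to-MIME map; extend as needed.
    $types = ['png' => 'image/png', 'jpg' => 'image/jpeg',
              'jpeg' => 'image/jpeg', 'gif' => 'image/gif'];

    $result = preg_replace_callback(
        '/(<img\b[^>]*\bsrc=")([^"]+)(")/i',
        function (array $m) use ($baseDir, $types): string {
            // Assumption: images live directly in $baseDir.
            $path = $baseDir . DIRECTORY_SEPARATOR . basename($m[2]);
            if (!is_file($path)) {
                return $m[0];
            }
            $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
            $mime = $types[$ext] ?? 'application/octet-stream';
            return $m[1] . to_data_uri(file_get_contents($path), $mime) . $m[3];
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}
```

Something like `inline_images(file_get_contents("MyFolder/new_file_name.html"), "MyFolder")` would then yield HTML that can be served as one file; the exact output file name depends on the pdftohtml build and flags used.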
