phantomjs pdf to stdout


Problem description

I am desperately trying to output a PDF generated by PhantomJS to stdout, like here.

What I get is an empty PDF file: although its size is not 0, it displays a blank page.

var page = require('webpage').create(),
    system = require('system'),
    address;

address = system.args[1];
page.paperSize = {format: 'A4'};

page.open(address, function (status) {
    if (status !== 'success') {
        console.log('Unable to load the address!');
        phantom.exit();
    } else {
        window.setTimeout(function () {
            page.render('/dev/stdout', { format: 'pdf' });
            phantom.exit();
        }, 1000);
    }
});

And I call it like so: phantomjs rasterize.js http://google.com>test.pdf

I tried changing /dev/stdout to system.stdout, but no luck. Writing the PDF straight to a file works without any problems.

I am looking for a cross-platform implementation, so I hope this is achievable on non-Linux systems.

Solution

When writing output to /dev/stdout or /dev/stderr on Windows, PhantomJS goes through the following steps (as seen in the render method in \phantomjs\src\webpage.cpp):

  1. Because /dev/stdout and /dev/stderr do not exist on Windows, a temporary file path is allocated instead.
  2. Call renderPdf with the temporary file path.
  3. Render the web page to this file path.
  4. Read the contents of this file into a QByteArray.
  5. Call QString::fromAscii on the byte array and write to stdout or stderr.
  6. Delete the temporary file.
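The six steps above can be modelled in standard C++17 as a minimal sketch. It assumes `std::filesystem` for the temporary path. `relay_via_temp_file` and its `render` callback are hypothetical stand-ins for PhantomJS's render/renderPdf, not its real API. The one deliberate difference from PhantomJS is step 5: here the bytes are copied to the output unchanged.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <ostream>
#include <random>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical model of steps 1-6. `render` stands in for renderPdf and is
// given the temporary path to render into. Returns the temp path so callers
// can check that it was cleaned up.
fs::path relay_via_temp_file(const std::function<void(const fs::path&)>& render,
                             std::ostream& out) {
    // Step 1: allocate a temporary file path (PhantomJS uses a UUID; a random number here).
    std::random_device rd;
    const fs::path tmp =
        fs::temp_directory_path() / ("phantomjstemp" + std::to_string(rd()));

    // Steps 2-3: render the page into that file.
    render(tmp);

    // Step 4: read the whole file into a byte buffer (a QByteArray in PhantomJS).
    std::string bytes;
    {
        std::ifstream in(tmp, std::ios::binary);
        bytes.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    }

    // Step 5: copy the bytes to the output unchanged. In PhantomJS this is the
    // step that goes wrong, because the bytes pass through a text stream.
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));

    // Step 6: delete the temporary file.
    fs::remove(tmp);
    return tmp;
}
```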

To begin with, I built the source for PhantomJS, but commented out the file deletion. On the next run, I was able to examine the temporary file it had rendered, which turned out to be completely fine. I also tried running phantomjs.exe rasterize.js http://google.com > test.png with the same results. This immediately ruled out a rendering issue, or anything specifically to do with PDFs, meaning that the problem had to be related to the way data is written to stdout.

By this stage I suspected some text-encoding shenanigans were going on. From previous runs, I had both a valid and an invalid version of the same file (a PNG in this case).

Using some C# code, I ran the following experiment:

using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;

//Read the contents of the known good file.
byte[] bytesFromGoodFile = File.ReadAllBytes("valid_file.png");
//Read the contents of the known bad file.
byte[] bytesFromBadFile = File.ReadAllBytes("invalid_file.png");

//Decode the bytes of the valid file into a string using the Latin-1 encoding.
string iso88591String = Encoding.GetEncoding("iso-8859-1").GetString(bytesFromGoodFile);
//Re-encode that string as UTF-8.
byte[] bytesFromIso88591String = Encoding.UTF8.GetBytes(iso88591String);

//If the re-encoded bytes are identical to the known bad file (same length and
//same content), we have an encoding problem. SequenceEqual also compares the
//lengths, unlike an index-by-index Select/All over one of the arrays.
Debug.Assert(bytesFromBadFile.SequenceEqual(bytesFromIso88591String));

Note that I used the ISO-8859-1 encoding because Qt uses it as the default encoding for C strings. As it turned out, all those bytes were the same. The point of the exercise was to see whether I could mimic the encoding steps that turned valid data into invalid data.
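The transformation the experiment tests for can be written out by hand. This is a C++ sketch under the same assumption as the C# code: PhantomJS decodes the bytes as Latin-1 (QString::fromAscii with Qt's default C-string codec) and then encodes them as UTF-8 (the UTF-8 stdout stream). `latin1_to_utf8` is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Model of the corruption: each byte 0x00-0xFF is treated as the Latin-1 code
// point with the same value, then encoded as UTF-8. Bytes below 0x80 pass
// through unchanged; bytes 0x80-0xFF each become two bytes.
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));    // lead byte: 0xC2 or 0xC3
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));  // continuation byte
        }
    }
    return out;
}
```

Applied to the 8-byte PNG signature `89 50 4E 47 0D 0A 1A 0A`, this yields the 9 bytes `C2 89 50 4E 47 0D 0A 1A 0A`. The first byte is already wrong and everything after it is shifted, so a viewer rejects the file even though its size is non-zero.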

For further evidence, I investigated \phantomjs\src\system.cpp and \phantomjs\src\filesystem.cpp.

  • In system.cpp, the System class holds references to, among other things, File objects for stdout, stdin and stderr, which are set up to use UTF-8 encoding.
  • When writing to stdout, the write function of the File object is called. This function supports writing to both text and binary files, but because of the way the System class initializes them, all writing will be treated as though it were going to a text file.

So the problem boils down to this: we need to be performing a binary write to stdout, yet our writes end up being treated as text and having an encoding applied to them that causes the resulting file to be invalid.


Given the problem described above, I can't see any way to get this working the way you want on Windows without making changes to the PhantomJS code. So here they are:

This first change will provide a function we can call on File objects to explicitly perform a binary write.

Add the following function prototype to the File class declaration in \phantomjs\src\filesystem.h (in its public section, so that webpage.cpp can call it):

bool binaryWrite(const QString &data);

And place its definition in \phantomjs\src\filesystem.cpp (the code for this method comes from the write method in this file):

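bool File::binaryWrite(const QString &data)
{
    if ( !m_file->isWritable() ) {
        qDebug() << "File::binaryWrite - " << "Couldn't write:" << m_file->fileName();
        return false;
    }

    // Narrow each QChar back to a single byte, bypassing the UTF-8 text
    // stream so that no encoding is applied to the data.
    QByteArray bytes(data.size(), Qt::Uninitialized);
    for(int i = 0; i < data.size(); ++i) {
        bytes[i] = data.at(i).toAscii();
    }
    // QIODevice::write returns the number of bytes written, or -1 on error.
    return m_file->write(bytes) == bytes.size();
}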
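The reason the narrowing loop in binaryWrite is safe is that every QChar produced by QString::fromAscii (with Qt's default Latin-1 codec) holds a value from 0 to 255. Narrowing that value back to one byte therefore reproduces the original data exactly. Here is a standard-C++ sketch of that round trip, using `std::u16string` as a stand-in for QString. `from_latin1` and `narrow_latin1` are hypothetical helpers, not Qt functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for QString::fromAscii with a Latin-1 codec: each byte becomes
// one UTF-16 code unit with the same value (0-255).
std::u16string from_latin1(const std::string& bytes) {
    std::u16string s;
    s.reserve(bytes.size());
    for (unsigned char b : bytes) {
        s.push_back(static_cast<char16_t>(b));
    }
    return s;
}

// Stand-in for the loop in File::binaryWrite: narrow each code unit back to
// exactly one byte. Assumes every unit is <= 0xFF, which holds for strings
// produced by from_latin1; no encoding is applied and the length is kept.
std::string narrow_latin1(const std::u16string& s) {
    std::string bytes(s.size(), '\0');
    for (std::size_t i = 0; i < s.size(); ++i) {
        bytes[i] = static_cast<char>(s[i] & 0xFF);
    }
    return bytes;
}
```

Feeding all 256 byte values through `from_latin1` and then `narrow_latin1` returns the identical 256 bytes. In contrast, the UTF-8 text path turns each of the 128 high bytes into two bytes.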

At around line 920 of \phantomjs\src\webpage.cpp you'll see a block of code that looks like this:

    if( fileName == STDOUT_FILENAME ){
#ifdef Q_OS_WIN32
        _setmode(_fileno(stdout), O_BINARY);            
#endif      

        ((File *)system->_stdout())->write(QString::fromAscii(ba.constData(), ba.size()));

#ifdef Q_OS_WIN32
        _setmode(_fileno(stdout), O_TEXT);
#endif          
    }

Change it to this:

    if( fileName == STDOUT_FILENAME ){
#ifdef Q_OS_WIN32
        _setmode(_fileno(stdout), O_BINARY);
        ((File *)system->_stdout())->binaryWrite(QString::fromAscii(ba.constData(), ba.size()));
        _setmode(_fileno(stdout), O_TEXT);
#else
        ((File *)system->_stdout())->write(QString::fromAscii(ba.constData(), ba.size()));
#endif
    }

So what that code replacement does is call our new binaryWrite function, guarded by an #ifdef Q_OS_WIN32 block. I did it this way to preserve the old functionality on non-Windows systems, which don't seem to exhibit this problem (or do they?). Note that this fix only applies to writing to stdout; you could apply the same change to stderr if you wanted to, though it probably matters less there.
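The same platform-guarded idea can be sketched with plain C stdio, outside Qt. This assumes the compiler's `_WIN32` macro in place of Qt's `Q_OS_WIN32` and a `FILE*` in place of PhantomJS's File object. `write_binary` is a hypothetical helper. One deliberate difference from the patch: it restores whatever mode `_setmode` reports as the previous one, instead of assuming `O_TEXT`. It also flushes around the mode switch so that no buffered data straddles the change.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <fcntl.h>  // _O_BINARY
#include <io.h>     // _setmode
#endif

// Writes raw bytes to `f`. On Windows the stream is switched to binary mode
// for the duration of the write, so "\n" is not expanded to "\r\n", and is then
// restored to its previous mode. Other platforms make no text/binary
// distinction, so the bytes are written as-is.
bool write_binary(std::FILE* f, const std::string& bytes) {
    std::fflush(f);  // flush anything buffered under the previous mode
#ifdef _WIN32
    const int previous = _setmode(_fileno(f), _O_BINARY);
    if (previous == -1) return false;
#endif
    const std::size_t written = std::fwrite(bytes.data(), 1, bytes.size(), f);
    std::fflush(f);
#ifdef _WIN32
    _setmode(_fileno(f), previous);
#endif
    return written == bytes.size();
}
```

For the case in the question, the call would be `write_binary(stdout, pdfBytes)`, where `pdfBytes` is a hypothetical `std::string` holding the rendered file.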

In case you just want a pre-built binary (who wouldn't?), you can find phantomjs.exe with these fixes on my SkyDrive. My version is around 19MB whereas the one I downloaded earlier was only about 6MB, though I followed the instructions here, so it should be fine.
