phantomjs pdf to stdout
I am desperately trying to output a PDF generated by PhantomJS to stdout, like here.
What I get is an empty PDF: the file is not 0 bytes in size, but it displays only a blank page.
var page = require('webpage').create(),
    system = require('system'),
    address;
address = system.args[1];
page.paperSize = {format: 'A4'};
page.open(address, function (status) {
    if (status !== 'success') {
        console.log('Unable to load the address!');
        phantom.exit();
    } else {
        window.setTimeout(function () {
            page.render('/dev/stdout', { format: 'pdf' });
            phantom.exit();
        }, 1000);
    }
});
And I call it like so: phantomjs rasterize.js http://google.com > test.pdf
I tried changing /dev/stdout to system.stdout, but no luck. Writing the PDF straight to a file works without any problems.
I am looking for a cross-platform implementation, so I hope this is achievable on non-Linux systems.
When writing output to /dev/stdout or /dev/stderr on Windows, PhantomJS goes through the following steps (as seen in the render method in \phantomjs\src\webpage.cpp):
- In the absence of /dev/stdout and /dev/stderr, a temporary file path is allocated.
- Call renderPdf with the temporary file path.
- Render the web page to this file path.
- Read the contents of this file into a QByteArray.
- Call QString::fromAscii on the byte array and write to stdout or stderr.
- Delete the temporary file.
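The steps above can be sketched outside PhantomJS to see where the bytes change. The following is a minimal illustration, assuming Node.js (its fs, os, path and Buffer APIs are not available inside PhantomJS scripts) and assuming the string step behaves like a Latin-1 decode followed by a UTF-8 text write. simulateStdoutRender is a hypothetical helper, not a PhantomJS function.

```javascript
// Node.js sketch; Buffer stands in for QByteArray (assumption).
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: replays the listed steps on bytes that are already rendered.
function simulateStdoutRender(renderedBytes) {
  // Allocate a temporary path and "render" the page to it.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'render-'));
  const tmpFile = path.join(dir, 'page.pdf');
  fs.writeFileSync(tmpFile, renderedBytes);

  // Read the file back as raw bytes.
  const raw = fs.readFileSync(tmpFile);

  // Turn the bytes into a string, one character per byte (Latin-1),
  // then push that string through a UTF-8 text encoder.
  const asText = raw.toString('latin1');
  const written = Buffer.from(asText, 'utf8');

  // Delete the temporary file and its directory.
  fs.unlinkSync(tmpFile);
  fs.rmdirSync(dir);
  return written;
}

// First four bytes of the PNG signature: 0x89 'P' 'N' 'G'.
const input = Buffer.from([0x89, 0x50, 0x4e, 0x47]);
const output = simulateStdoutRender(input);
console.log(input.toString('hex'));  // 89504e47
console.log(output.toString('hex')); // c289504e47
```

In this sketch, pure ASCII bytes pass through unchanged, while every byte of 0x80 or above becomes a two-byte sequence. That is enough to break a binary format such as PDF or PNG.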
To begin with, I built the source for PhantomJS, but commented out the file deletion. On the next run, I was able to examine the temporary file it had rendered, which turned out to be completely fine. I also tried running phantomjs.exe rasterize.js http://google.com > test.png with the same results. This immediately ruled out a rendering issue, or anything specifically to do with PDFs, meaning that the problem had to be related to the way data is written to stdout.
By this stage I suspected that some text encoding shenanigans were going on. From previous runs, I had both a valid and an invalid version of the same file (a PNG in this case).
Using some C# code, I ran the following experiment:
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;

//Read the contents of the known good file.
byte[] bytesFromGoodFile = File.ReadAllBytes("valid_file.png");
//Read the contents of the known bad file.
byte[] bytesFromBadFile = File.ReadAllBytes("invalid_file.png");
//Take the bytes from the valid file and convert to a string
//using the Latin-1 encoding.
string iso88591String = Encoding.GetEncoding("iso-8859-1").GetString(bytesFromGoodFile);
//Take the Latin-1 encoded string and retrieve its bytes using the UTF-8 encoding.
byte[] bytesFromIso88591String = Encoding.UTF8.GetBytes(iso88591String);
//If the bytes from the Latin-1 string are all the same as the ones from the
//known bad file (same length, same values), we have an encoding problem.
Debug.Assert(bytesFromBadFile.SequenceEqual(bytesFromIso88591String));
Note that I used the ISO-8859-1 encoding because Qt uses it as the default encoding for C strings. As it turned out, all those bytes were the same. The point of the exercise was to see whether I could mimic the encoding steps that caused valid data to become invalid.
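Because the bad file matched the good file after a Latin-1 decode and a UTF-8 encode, a file damaged this way can in principle be repaired by running the steps backwards. The sketch below assumes Node.js and assumes that nothing else (such as newline translation) touched the data. repairDoubleEncoded is a hypothetical helper name.

```javascript
// Hypothetical helper: undo a Latin-1 -> UTF-8 expansion (Node.js Buffer API).
function repairDoubleEncoded(corrupted) {
  // Decode the damaged bytes as UTF-8, then narrow each character to one byte.
  const text = corrupted.toString('utf8');
  const repaired = Buffer.from(text, 'latin1');

  // Sanity check: re-applying the damage must reproduce the input exactly,
  // otherwise the input was not produced by this particular encoding mix-up.
  const reDamaged = Buffer.from(repaired.toString('latin1'), 'utf8');
  if (!reDamaged.equals(corrupted)) {
    throw new Error('input is not a pure Latin-1 -> UTF-8 expansion');
  }
  return repaired;
}

// Damaged start of a PNG signature: 0x89 was expanded to 0xC2 0x89.
const bad = Buffer.from([0xc2, 0x89, 0x50, 0x4e, 0x47]);
console.log(repairDoubleEncoded(bad).toString('hex')); // 89504e47
```

This is only a way to salvage output that has already been written. It does not stop PhantomJS from producing the damaged stream in the first place.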
For further evidence, I investigated \phantomjs\src\system.cpp and \phantomjs\src\filesystem.cpp.
- In system.cpp, the System class holds references to, among other things, File objects for stdout, stdin and stderr, which are set up to use UTF-8 encoding.
- When writing to stdout, the write function of the File object is called. This function supports writing to both text and binary files, but because of the way the System class initializes them, all writing will be treated as though it were going to a text file.
So the problem boils down to this: we need to be performing a binary write to stdout, yet our writes end up being treated as text and having an encoding applied to them that causes the resulting file to be invalid.
Given the problem described above, I can't see any way to get this working the way you want on Windows without making changes to the PhantomJS code. So here they are:
This first change will provide a function we can call on File objects to explicitly perform a binary write.
Add the following function prototype in \phantomjs\src\filesystem.h:
bool binaryWrite(const QString &data);
And place its definition in \phantomjs\src\filesystem.cpp (the code for this method comes from the write method in this file):
bool File::binaryWrite(const QString &data)
{
    if ( !m_file->isWritable() ) {
        qDebug() << "File::write - " << "Couldn't write:" << m_file->fileName();
        return true;
    }
    QByteArray bytes(data.size(), Qt::Uninitialized);
    for(int i = 0; i < data.size(); ++i) {
        bytes[i] = data.at(i).toAscii();
    }
    return m_file->write(bytes);
}
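To see why narrowing each character back to a single byte fixes the output, here is a small round-trip sketch. It assumes Node.js, with Buffer standing in for QByteArray and a JavaScript string standing in for QString. It also assumes that QString::fromAscii and toAscii behave like Latin-1 conversions, as they do with Qt's default C-string codec. The function names are hypothetical.

```javascript
// Stand-in for QString::fromAscii under the Latin-1 assumption:
// one character per input byte.
function bytesToString(bytes) {
  return bytes.toString('latin1');
}

// Stand-in for the loop in binaryWrite: narrow each character to one byte.
function stringToBytes(text) {
  const bytes = Buffer.alloc(text.length);
  for (let i = 0; i < text.length; ++i) {
    bytes[i] = text.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Four ASCII bytes ("%PDF") followed by four bytes above 0x7f.
const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xe2, 0xe3, 0xcf, 0xd3]);
const text = bytesToString(original);

// Byte-for-byte narrowing restores the original data.
console.log(stringToBytes(text).equals(original)); // true

// A UTF-8 text write of the same string is longer: each high byte doubles.
console.log(Buffer.from(text, 'utf8').length); // 12
```

The narrowing loop is the exact inverse of the one-character-per-byte conversion, so the bytes that reach stdout are the same ones that were read from the temporary file.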
At around line 920 of \phantomjs\src\webpage.cpp you'll see a block of code that looks like this:
if( fileName == STDOUT_FILENAME ){
#ifdef Q_OS_WIN32
    _setmode(_fileno(stdout), O_BINARY);
#endif
    ((File *)system->_stderr())->write(QString::fromAscii(name.constData(), name.size()));
#ifdef Q_OS_WIN32
    _setmode(_fileno(stdout), O_TEXT);
#endif
}
Change it to this:
if( fileName == STDOUT_FILENAME ){
#ifdef Q_OS_WIN32
    _setmode(_fileno(stdout), O_BINARY);
    ((File *)system->_stdout())->binaryWrite(QString::fromAscii(ba.constData(), ba.size()));
#else
    ((File *)system->_stderr())->write(QString::fromAscii(name.constData(), name.size()));
#endif
#ifdef Q_OS_WIN32
    _setmode(_fileno(stdout), O_TEXT);
#endif
}
The replacement calls our new binaryWrite function, but only inside an #ifdef Q_OS_WIN32 block. I did it this way to preserve the old behavior on non-Windows systems, which don't seem to exhibit this problem (or do they?). Note that this fix only applies to writing to stdout; if you want to, you could apply it to stderr as well, but it may not matter quite so much in that case.
In case you just want a pre-built binary (who wouldn't?), you can find phantomjs.exe with these fixes on my SkyDrive. My version is around 19MB whereas the one I downloaded earlier was only about 6MB, though I followed the instructions here, so it should be fine.