Passing a string stored in memory to pdftotext, antiword, catdoc, etc.
Question
Is it possible to call CLI tools such as pdftotext, antiword, or catdoc (text extractor scripts) and pass them a string instead of a file?
Currently, I read PDF files by calling pdftotext with child_process.spawn. I spawn a new process and store the result in a new variable. Everything works fine.
I'd like to pass the binary returned by fs.readFile instead of the file itself:
fs.readFile('./my.pdf', (error, binary) => {
  // Call pdftotext with child_process.spawn passing the binary.
  let event = child_process.spawn('pdftotext', [
    // Args here!
  ]);
});
How can I do that?
Recommended answer
It's definitely possible, if the command can handle piped input.
spawn returns a ChildProcess object, and you can pass the string (or binary) held in memory to it by writing to its stdin. Convert the string to a readable stream first, then pipe that stream into the stdin of the CLI.
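As a minimal sketch of that idea for data that is already fully in memory (for example the Buffer handed to the fs.readFile callback): the child's stdin is a writable stream, so calling end() on it with the data writes everything and then signals end of input, with no separate stream conversion. The helper name runWithInput is hypothetical and not part of Node.js or of any of the tools mentioned; it only assumes the command reads from stdin and writes to stdout.

```javascript
const { spawn } = require('child_process')

// Hypothetical helper (not part of any library): run `command` with `args`,
// feed `input` (a string or Buffer) to its stdin, and resolve with its stdout.
function runWithInput(command, args, input) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args)
    const chunks = []
    child.on('error', reject) // e.g. the command is not installed
    child.stdin.on('error', reject) // e.g. EPIPE if the child exits early
    child.stdout.on('data', chunk => chunks.push(chunk))
    child.on('close', code => {
      if (code === 0) resolve(Buffer.concat(chunks))
      else reject(new Error(`${command} exited with code ${code}`))
    })
    // stdin is a writable stream: end() writes the data, then signals EOF.
    child.stdin.end(input)
  })
}

// Usage, assuming pdftotext is installed and ./my.pdf exists:
// fs.readFile('./my.pdf', (error, binary) => {
//   if (error) throw error
//   runWithInput('pdftotext', ['-', '-'], binary)
//     .then(text => console.log(text.toString().slice(0, 77)))
// })
```

The helper resolves with a Buffer rather than a string so the caller can choose the encoding. For large inputs that are not yet in memory, piping a stream into stdin is probably the better fit.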
fs.createReadStream creates a readable stream from a file. The following example downloads a PDF file, pipes its content to pdftotext, and then shows the first few bytes of the result.
const http = require('http')
const { spawn } = require('child_process')

const source = 'http://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf'

download(source)
  .then(pdftotext)
  .then(result => console.log(result.slice(0, 77)))
  .catch(console.error)

// Resolve with the HTTP response, which is a readable stream of the PDF bytes.
function download(url) {
  return new Promise((resolve, reject) =>
    http.get(url, resolve).on('error', reject))
}

function pdftotext(binaryStream) {
  // The '-' '-' arguments: read the PDF from stdin and write the text to stdout.
  const command = spawn('pdftotext', ['-', '-'])
  binaryStream.pipe(command.stdin)
  return new Promise((resolve, reject) => {
    const chunks = []
    command.on('error', reject)
    command.stdout.on('data', chunk => chunks.push(chunk))
    // Join the buffers before decoding so multi-byte characters split across chunks survive.
    command.stdout.on('end', () => resolve(Buffer.concat(chunks).toString()))
  })
}
For CLIs that have no option to read from stdin, you can use named pipes.
Edit: added another example, this time with named pipes.
Once the named pipes are created, you can use them like files. The following example creates temporary named pipes to send the input and receive the output, then shows the first few bytes of the result.
const fs = require('fs')
const { spawn } = require('child_process')

pipeCommand({
  name: 'wvText',
  input: fs.createReadStream('document.doc'),
}).then(result => console.log(result.slice(0, 77)))

// Create a named pipe (FIFO) with mkfifo; resolves once mkfifo exits.
function createPipe(name) {
  return new Promise((resolve, reject) =>
    spawn('mkfifo', [name]).on('error', reject).on('exit', () => resolve()))
}

function pipeCommand({name, input}) {
  const inpipe = 'input.pipe'
  const outpipe = 'output.pipe'
  return Promise.all([inpipe, outpipe].map(createPipe)).then(() => {
    const chunks = []
    // Start reading the output pipe before the command starts writing to it.
    fs.createReadStream(outpipe)
      .on('data', chunk => chunks.push(chunk))
      .on('error', console.log)
    const command = spawn(name, [inpipe, outpipe]).on('error', console.log)
    input.pipe(fs.createWriteStream(inpipe).on('error', console.log))
    return new Promise(resolve =>
      command.on('exit', () => {
        // fs.unlink needs a callback in current Node.js, so clean up synchronously.
        for (const pipe of [inpipe, outpipe]) fs.unlinkSync(pipe)
        resolve(Buffer.concat(chunks).toString())
      }))
  })
}