How do I replace a string in a PDF file using NodeJS?

Question

I have a template PDF file, and I want to replace some marker strings to generate new PDF files and save them. What's the best/simplest way to do this? I don't need to add graphics or anything fancy, just a simple text replacement, so I don't want anything too complicated.

Thanks!

I just found HummusJS. I'll see whether I can make progress with it and post the result here.

Recommended answer

I found this question by searching, so I think it deserves an answer. I found BrighTide's answer here: https://github.com/galkahana/HummusJS/issues/71#issuecomment-275956347

Basically, there is a very powerful package called Hummus, which wraps a library written in C++ (cross-platform, of course). I think the answer given in that GitHub comment can be wrapped in a function like this:

var hummus = require('hummus');

/**
 * Returns the bytes of a string as a plain array of numbers
 *
 * @param {string} str - input string
 */
function strToByteArray(str) {
  var myBuffer = [];
  // Buffer.from replaces the deprecated `new Buffer(...)` constructor
  var buffer = Buffer.from(str);
  for (var i = 0; i < buffer.length; i++) {
      myBuffer.push(buffer[i]);
  }
  return myBuffer;
}

function replaceText(sourceFile, targetFile, pageNumber, findText, replaceText) {  
    var writer = hummus.createWriterToModify(sourceFile, {
        modifiedFilePath: targetFile
    });
    var modifier = new hummus.PDFPageModifier(writer, pageNumber);
    var sourceParser = writer.createPDFCopyingContextForModifiedFile().getSourceDocumentParser();
    var pageObject = sourceParser.parsePage(pageNumber);
    var textObjectId = pageObject.getDictionary().toJSObject().Contents.getObjectID();
    var textStream = sourceParser.queryDictionaryObject(pageObject.getDictionary(), 'Contents');
    //read the original block of text data
    var data = [];
    var readStream = sourceParser.startReadingFromStream(textStream);
    while(readStream.notEnded()){
        Array.prototype.push.apply(data, readStream.read(10000));
    }
    // Buffer.from replaces the deprecated `new Buffer(...)` constructor
    var string = Buffer.from(data).toString().replace(findText, replaceText);

    //Create and write our new text object
    var objectsContext = writer.getObjectsContext();
    objectsContext.startModifiedIndirectObject(textObjectId);

    var stream = objectsContext.startUnfilteredPDFStream();
    stream.getWriteStream().write(strToByteArray(string));
    objectsContext.endPDFStream(stream);

    objectsContext.endIndirectObject();

    writer.end();
}

// replaceText('source.pdf', 'output.pdf', 0, /REPLACEME/g, 'My New Custom Text');
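
One thing to watch in the function above: the stream bytes are turned into a string with the default UTF-8 encoding and then turned back into bytes. The sketch below, which does not use Hummus at all, illustrates what can happen if a content stream contains raw bytes above 0x7F (an assumption; many simple templates contain only ASCII). The helper name roundTrip is made up for this illustration.

```javascript
// Decode bytes to a string and encode them back, using the given encoding.
function roundTrip(bytes, encoding) {
  const text = Buffer.from(bytes).toString(encoding);
  return Array.from(Buffer.from(text, encoding));
}

// "(", a raw 0xE9 byte, ")" as it might appear inside a PDF string
const bytes = [0x28, 0xe9, 0x29];

// latin1 maps every byte to exactly one character, so the bytes survive
console.log(roundTrip(bytes, 'latin1'));

// A lone 0xE9 is not valid UTF-8, so it is replaced and the bytes change
console.log(roundTrip(bytes, 'utf8'));
```

If this turns out to matter for your files, passing 'latin1' to both toString() and Buffer.from() in the code above may be worth trying. I have not tested that against Hummus.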

UPDATE:
The example was written against version 1.0.83, so things may have changed since then.

UPDATE 2: Recently I ran into a problem with another PDF file that used a different font. For some reason the text was split into small chunks: the string QWERTYUIOPASDFGHJKLZXCVBNM1234567890- was represented as -286(Q)9(WER)24(T)-8(YUIOP)116(ASDF)19(GHJKLZX)15(CVBNM1234567890-), where the parenthesized pieces are the text chunks and the numbers between them are position adjustments. I could not think of anything better than building a regex, so instead of this one line:

var string = Buffer.from(data).toString().replace(findText, replaceText);

I now have something like this:

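var string = Buffer.from(data).toString();

// REPLACE_ME and REPLACE_WITH_THIS are placeholders for the marker text and its replacement
var characters = REPLACE_ME;
var match = [];
for (var a = 0; a < characters.length; a++) {
    // Escape regex metacharacters so that a marker such as "A.B" is matched literally
    var ch = characters[a].replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // Each character may be preceded by a numeric offset and wrapped in parentheses
    match.push('(-?[0-9]+)?(\\()?' + ch + '(\\))?');
}

string = string.replace(new RegExp(match.join('')), function(m, m1) {
    // m1 is the numeric offset captured before the first character, or undefined if there is none
    return (m1 || '') + '( ' + REPLACE_WITH_THIS + ')';
});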
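
To see what that regex does without involving Hummus, here is a small standalone sketch run against the chunked string from the update. The function names (escapeRegExp, escapePdfString, replaceChunkedText) and the surrounding "[...] TJ" wrapper are my own assumptions for illustration. Unlike the snippet above, it does not add a leading space to the replacement, and it escapes parentheses and backslashes in the new text, since those are special inside a PDF literal string. It assumes the marker covers whole chunks; a marker sitting in the middle of a longer string would leave unbalanced parentheses.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Escape the characters that are special inside a PDF literal string: \ ( )
function escapePdfString(text) {
  return text.replace(/[\\()]/g, '\\$&');
}

// Replace the first occurrence of `marker` in `content`, even when the marker
// is split into chunks such as -286(Q)9(WER)24(T).
function replaceChunkedText(content, marker, replacement) {
  // One sub-pattern per character: optional offset, optional "(", the character, optional ")"
  const perChar = Array.from(marker, function (ch) {
    return '(-?[0-9]+)?(\\()?' + escapeRegExp(ch) + '(\\))?';
  });
  const pattern = new RegExp(perChar.join(''));
  return content.replace(pattern, function (whole, firstOffset) {
    // Keep the offset that preceded the first chunk, if there was one
    return (firstOffset || '') + '(' + escapePdfString(replacement) + ')';
  });
}

const sample = '[-286(Q)9(WER)24(T)-8(YUIOP)116(ASDF)19(GHJKLZX)15(CVBNM1234567890-)] TJ';
console.log(replaceChunkedText(sample, 'QWERTYUIOPASDFGHJKLZXCVBNM1234567890-', 'Hello (world)'));
```

The same function also handles the simple case where the marker is a single unsplit chunk, because every offset and parenthesis in the pattern is optional.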
