Google Apps Script: how to convert a PDF to a Google Doc in order to get OCR?

Question

I'm trying to write a script that searches Gmail for a PDF containing a serial number I already have, saves it to Drive, runs OCR on it, and reads the content.

The first step is no problem, and the second is handled by the following code, but the last two lines, which open the document with DocumentApp in order to call getText(), are not working:

  var serial = "123456789";
  var ret = DriveApp.searchFiles('fullText contains "' + serial + '"');
  if (ret.hasNext()) {
    var file = ret.next();
    var n_blob = Utilities.newBlob(file.getBlob().getDataAsString(), MimeType.PDF);
    n_blob.setName(serial);
    var n_file = DriveApp.createFile(n_blob);
    var rt = DocumentApp.openById(n_file.getId()); // not working
    var text = rt.getBody().getText(); // not working
  }

I tried many different approaches, including the solution based on Drive.Files.insert(), which no longer works.

I'm pretty stuck here. Does anyone have an idea or suggestion that could help me out?

Thanks.

Recommended answer

  • You want to convert a PDF file to a Google Document.
  • The file returned by var file = ret.next(); is always a PDF file.

If my understanding is correct, how about this answer? Please think of it as just one of several possible answers.

  • Unfortunately, var n_blob = Utilities.newBlob(file.getBlob().getDataAsString(), MimeType.PDF) and var n_file = DriveApp.createFile(n_blob) cannot create a Google Document. They only create another PDF file, and DocumentApp.openById() cannot open a PDF, so an error occurs.

Pattern 1:

In this pattern, Drive.Files.copy is used to convert the PDF to a Google Document, because your question says that Drive.Files.insert() no longer works.

Please modify your script as follows. Before you run the script, please enable the Drive API in Advanced Google services.

From:

  if (ret.hasNext()) {
    var file = ret.next();
    var n_blob = Utilities.newBlob(file.getBlob().getDataAsString(), MimeType.PDF);
    n_blob.setName(serial);
    var n_file = DriveApp.createFile(n_blob);
    var rt = DocumentApp.openById(n_file.getId()); // not working
    var text = rt.getBody().getText(); // not working
  }
          

To:

  if (ret.hasNext()) {
    var file = ret.next();
    // Only PDF files are converted.
    if (file.getMimeType() === MimeType.PDF) {
      // Copying with a Google Docs MIME type converts the PDF to a Google Document.
      var fileId = Drive.Files.copy({mimeType: MimeType.GOOGLE_DOCS}, file.getId()).id;
      var rt = DocumentApp.openById(fileId);
      var text = rt.getBody().getText();
      Logger.log(text);
    }
  }
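Pattern 1 can also be wrapped in a small helper, sketched below under a few assumptions. The function name pdfToText and the services parameter are hypothetical and not part of the original answer. The services are passed in as an object so the function can be exercised with stand-ins outside Apps Script. ocrLanguage is an optional language hint accepted by the Drive files.copy method, and trashing the intermediate Google Document afterwards is a design choice of this sketch.

```javascript
/**
 * Sketch only: converts a PDF stored in Drive to a temporary Google Document,
 * reads its text, then moves the temporary document to the trash.
 *
 * Assumptions (not from the original answer):
 *  - `services` is { Drive, DocumentApp, DriveApp }, i.e. the Apps Script
 *    globals passed in explicitly (hypothetical parameter).
 *  - The advanced Drive service is enabled.
 */
function pdfToText(services, pdfFileId, ocrLanguage) {
  // Target MIME type of a Google Document (the value of MimeType.GOOGLE_DOCS).
  var resource = { mimeType: 'application/vnd.google-apps.document' };
  // Optional language hint for the OCR step; defaults to English here.
  var options = { ocrLanguage: ocrLanguage || 'en' };

  // Copying a PDF with a Google Docs target type asks Drive to convert it.
  var docId = services.Drive.Files.copy(resource, pdfFileId, options).id;
  try {
    return services.DocumentApp.openById(docId).getBody().getText();
  } finally {
    // The converted document is only an intermediate artifact in this sketch.
    services.DriveApp.getFileById(docId).setTrashed(true);
  }
}
```

Inside Apps Script it could be called as pdfToText({Drive: Drive, DocumentApp: DocumentApp, DriveApp: DriveApp}, file.getId()). If the converted document should be kept, the finally block can simply be dropped.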
          

Pattern 2:

I thought that Drive.Files.insert might still be usable, so in this pattern I propose a modified script that uses Drive.Files.insert. Could you please test it?

Please modify your script as follows. Before you run the script, please enable the Drive API in Advanced Google services.

From:

  if (ret.hasNext()) {
    var file = ret.next();
    var n_blob = Utilities.newBlob(file.getBlob().getDataAsString(), MimeType.PDF);
    n_blob.setName(serial);
    var n_file = DriveApp.createFile(n_blob);
    var rt = DocumentApp.openById(n_file.getId()); // not working
    var text = rt.getBody().getText(); // not working
  }
          

To:

  if (ret.hasNext()) {
    var file = ret.next();
    // Only PDF files are converted.
    if (file.getMimeType() === MimeType.PDF) {
      // Uploading the PDF blob with a Google Docs MIME type converts it to a Google Document.
      var fileId = Drive.Files.insert({title: serial, mimeType: MimeType.GOOGLE_DOCS}, file.getBlob()).id;
      var rt = DocumentApp.openById(fileId);
      var text = rt.getBody().getText();
      Logger.log(text);
    }
  }
          

Note:

  • Unfortunately, I don't understand why Drive.Files.insert() no longer works for you. If the modified script above doesn't work, please tell me and I will think of other methods.
  • If the log doesn't show the text of the Google Document converted from the PDF, it means that the file returned by var file = ret.next(); is not a PDF. Please be careful about this.
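The second note can be guarded against in code. The sketch below, which is not part of the original answer, walks the whole search result instead of looking only at the first hit and returns the first file whose MIME type is PDF. The function name findFirstPdf is hypothetical. The iterator is assumed to behave like the FileIterator returned by DriveApp.searchFiles(), with hasNext() and next().

```javascript
/**
 * Sketch only: returns the first PDF from a file iterator, or null if the
 * search results contain no PDF at all.
 *
 * `files` is assumed to expose hasNext() and next(), and each file is
 * assumed to expose getMimeType(), as Drive files do in Apps Script.
 */
function findFirstPdf(files) {
  while (files.hasNext()) {
    var file = files.next();
    // 'application/pdf' is the value of MimeType.PDF in Apps Script.
    if (file.getMimeType() === 'application/pdf') {
      return file;
    }
  }
  return null;
}
```

In the script above it could be used as var file = findFirstPdf(ret); followed by a null check before the conversion, so that a non-PDF first hit no longer ends with an empty log.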
If I misunderstood your question and this was not the direction you wanted, I apologize.
