How to call pypdfocr functions to use them in a Python script?


Question

I recently downloaded pypdfocr, but the documentation has no examples of how to call pypdfocr as a library. Could anybody help me call it just to convert a single file? So far I have only found a terminal command:

$ pypdfocr filename.pdf

Recommended answer

If you're looking for the source code, it normally lives under the site-packages directory of your Python installation. An IDE such as PyCharm can also help you find the directory and file, locate a class, and see how to instantiate it. For example, https://github.com/virantha/pypdfocr/blob/master/pypdfocr/pypdfocr.py defines the main pypdfocr class, which you can reuse to do, possibly, what the command line does.
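Without an IDE, the interpreter itself can report where an installed package lives. The following is a minimal sketch, not part of pypdfocr: `find_package_dir` is a hypothetical helper name, and the standard-library `json` package is used as a stand-in so the snippet runs even when pypdfocr is not installed.

```python
import importlib
import os


def find_package_dir(name):
    """Import the named module and return the directory containing its source.

    Returns None for modules that have no file on disk (e.g. built-ins).
    """
    module = importlib.import_module(name)
    path = getattr(module, "__file__", None)
    if not path:
        return None
    return os.path.dirname(os.path.abspath(path))


if __name__ == "__main__":
    # Replace "json" with "pypdfocr" once the package is installed.
    print(find_package_dir("json"))
```

The printed directory is where you can open the package's .py files and read the class definitions directly.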

In that class, the developer has defined a lot of arguments to be parsed:

def get_options(self, argv):
    """
        Parse the command-line options and set the following object properties:
        :param argv: usually just sys.argv[1:]
        :returns: Nothing
        :ivar debug: Enable logging debug statements
        :ivar verbose: Enable verbose logging
        :ivar enable_filing: Whether to enable post-OCR filing of PDFs
        :ivar pdf_filename: Filename for single conversion mode
        :ivar watch_dir: Directory to watch for files to convert
        :ivar config: Dict of the config file
        :ivar watch: Whether folder watching mode is turned on
        :ivar enable_evernote: Enable filing to evernote
    """
    p = argparse.ArgumentParser(description = "Convert scanned PDFs into their OCR equivalent.  Depends on GhostScript and Tesseract-OCR being installed.",
            epilog = "PyPDFOCR version %s (Copyright 2013 Virantha Ekanayake)" % __version__,
            )

    p.add_argument('-d', '--debug', action='store_true',
        default=False, dest='debug', help='Turn on debugging')

    p.add_argument('-v', '--verbose', action='store_true',
        default=False, dest='verbose', help='Turn on verbose mode')

    p.add_argument('-m', '--mail', action='store_true',
        default=False, dest='mail', help='Send email after conversion')

    p.add_argument('-l', '--lang',
        default='eng', dest='lang', help='Language(default eng)')


    p.add_argument('--preprocess', action='store_true',
            default=False, dest='preprocess', help='Enable preprocessing.  Not really useful now with improved Tesseract 3.04+')

    p.add_argument('--skip-preprocess', action='store_true',
            default=False, dest='skip_preprocess', help='DEPRECATED: always skips now.')

    #---------
    # Single or watch mode
    #--------
    single_or_watch_group = p.add_mutually_exclusive_group(required=True)
    # Positional argument for single file conversion
    single_or_watch_group.add_argument("pdf_filename", nargs="?", help="Scanned pdf file to OCR")
    # Watch directory for watch mode
    single_or_watch_group.add_argument('-w', '--watch', 
         dest='watch_dir', help='Watch given directory and run ocr automatically until terminated')

    #-----------
    # Filing options
    #----------
    filing_group = p.add_argument_group(title="Filing optinos")
    filing_group.add_argument('-f', '--file', action='store_true',
        default=False, dest='enable_filing', help='Enable filing of converted PDFs')
    #filing_group.add_argument('-c', '--config', type = argparse.FileType('r'),
    filing_group.add_argument('-c', '--config', type = lambda x: open_file_with_timeout(p,x),
         dest='configfile', help='Configuration file for defaults and PDF filing')
    filing_group.add_argument('-e', '--evernote', action='store_true',
        default=False, dest='enable_evernote', help='Enable filing to Evernote')
    filing_group.add_argument('-n', action='store_true',
        default=False, dest='match_using_filename', help='Use filename to match if contents did not match anything, before filing to default folder')


    # Add flow option to single mode extract_images,preprocess,ocr,write

    args = p.parse_args(argv)

You can pass any of those arguments to its parser, like this:

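# Import the submodule explicitly; `import pypdfocr` alone may not load it.
from pypdfocr import pypdfocr

# Assumption: module path and class name (PyPDFOCR) follow the linked file; check your installed version.
obj = pypdfocr.PyPDFOCR()
# The parser requires a PDF filename or -w/--watch, so an empty list makes argparse exit.
# Options are strings, e.g. ['-v', 'filename.pdf'] or ['-d', '-v', 'filename.pdf'].
obj.get_options(['filename.pdf'])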
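To see how a list of strings stands in for the command line, here is a small stand-alone sketch that rebuilds only a few of the options from the parser above using plain argparse. It is a demo parser, not pypdfocr itself; `build_parser` and `parse` are hypothetical names chosen for illustration.

```python
import argparse


def build_parser():
    """Build a small parser mirroring a few options from get_options()."""
    p = argparse.ArgumentParser(description="Demo parser (not pypdfocr itself)")
    p.add_argument('-d', '--debug', action='store_true', default=False, dest='debug')
    p.add_argument('-v', '--verbose', action='store_true', default=False, dest='verbose')
    p.add_argument('-l', '--lang', default='eng', dest='lang')
    # Exactly one of: a positional PDF filename, or -w/--watch DIR.
    group = p.add_mutually_exclusive_group(required=True)
    group.add_argument('pdf_filename', nargs='?')
    group.add_argument('-w', '--watch', dest='watch_dir')
    return p


def parse(argv):
    """Parse a list of strings as if it were sys.argv[1:]."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse(['-v', '-l', 'deu', 'filename.pdf'])
    print(args.verbose, args.lang, args.pdf_filename)

    try:
        parse([])  # neither a filename nor --watch was given
    except SystemExit:
        print("argparse rejected the empty argument list")
```

Each flag and its value is a separate string in the list, and because the single-or-watch group is required, the list must contain either a filename or the watch option.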

I hope this helps you understand in the meantime :)
