How to convert PDF to PNG efficiently?

Question

I have the following function to convert a PDF into a series of images (one image per page):

import Quartz

func convertPDF(at sourceURL: URL, to destinationURL: URL, fileType: NSBitmapImageFileType, dpi: CGFloat = 200) throws -> [URL] {
    let fileExtension: String
    switch fileType {
    case .BMP:              fileExtension = "bmp"
    case .GIF:              fileExtension = "gif"
    case .JPEG, .JPEG2000:  fileExtension = "jpeg"
    case .PNG:              fileExtension = "png"
    case .TIFF:             fileExtension = "tiff"
    }

    let data = try Data(contentsOf: sourceURL)
    let pdfImageRep = NSPDFImageRep(data: data)!
    var imageURLs = [URL]()

    for i in 0..<pdfImageRep.pageCount {
        pdfImageRep.currentPage = i

        let width = pdfImageRep.size.width / 72 * dpi
        let height = pdfImageRep.size.height / 72 * dpi
        let image = NSImage(size: CGSize(width: width, height: height), flipped: false) { dstRect in
            pdfImageRep.draw(in: dstRect)
        }

        let bitmapImageRep = NSBitmapImageRep(data: image.tiffRepresentation!)!
        let bitmapData = bitmapImageRep.representation(using: fileType, properties: [:])!

        let imageURL = destinationURL.appendingPathComponent("\(sourceURL.deletingPathExtension().lastPathComponent)-Page\(i+1).\(fileExtension)")
        try bitmapData.write(to: imageURL, options: [.atomic])
        imageURLs.append(imageURL)
    }

    return imageURLs
}

This works fine. Performance is not blisteringly fast, but that's not critical. My problem is memory consumption. Let's say I'm converting a long PDF (Apple's 10-Q, 51 pages long):

let sourceURL = URL(string: "http://files.shareholder.com/downloads/AAPL/4907179320x0x952191/4B5199AE-34E7-47D7-8502-CF30488B3B05/10-Q_Q3_2017_As-Filed_.pdf")!
let destinationURL = URL(fileURLWithPath: "/Users/mike/PDF")
let _ = try convertPDF(at: sourceURL, to: destinationURL, fileType: .PNG, dpi: 200)

Memory usage keeps increasing and reaches ~11 GB by the end of the last page!

A few other things I noticed:

  • When I ran this through Instruments, it surprisingly showed no leaks. The two big memory hogs are bitmapImageRep and bitmapData. They don't appear to be released between iterations.
  • Profiling it worsens performance, even when compared to the Debug build.
  • Reducing the DPI obviously reduces the memory footprint, but the behaviour remains the same: memory increases linearly with the number of pages.
  • It's the same whether I convert a single 51-page PDF or 51 single-page ones.

So how can I reduce the memory footprint? Is there a better way to convert a PDF to images?
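
One mitigation often suggested for temporaries that pile up across loop iterations is to wrap each iteration in its own autorelease pool. The sketch below is not from the original post and has not been measured against this PDF; `pngData(forPage:of:dpi:)` and `writePages(of:to:dpi:)` are hypothetical names, the output file naming is simplified, and the code uses the current AppKit spelling `.png` rather than the older `.PNG`.

```swift
import AppKit

// Hypothetical helper: renders one page of an NSPDFImageRep to PNG data,
// following the same steps as the loop body in the question.
func pngData(forPage index: Int, of rep: NSPDFImageRep, dpi: CGFloat) -> Data? {
    rep.currentPage = index
    let size = CGSize(width: rep.size.width / 72 * dpi,
                      height: rep.size.height / 72 * dpi)
    let image = NSImage(size: size, flipped: false) { dstRect in
        rep.draw(in: dstRect)
    }
    guard let tiff = image.tiffRepresentation,
          let bitmap = NSBitmapImageRep(data: tiff) else { return nil }
    return bitmap.representation(using: .png, properties: [:])
}

// Hypothetical driver: each iteration runs inside its own autorelease pool,
// so autoreleased temporaries created for one page can be freed before the
// next page starts instead of accumulating until the function returns.
func writePages(of rep: NSPDFImageRep, to directory: URL, dpi: CGFloat = 200) throws {
    for i in 0..<rep.pageCount {
        try autoreleasepool {
            guard let data = pngData(forPage: i, of: rep, dpi: dpi) else { return }
            let url = directory.appendingPathComponent("Page\(i + 1).png")
            try data.write(to: url, options: [.atomic])
        }
    }
}
```

Whether this alone brings memory use down to an acceptable level depends on how much of the growth comes from autoreleased objects, which the question does not establish.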

Recommended answer

After struggling with this for a whole day, I ended up answering my own question.

The solution is to drop lower, into the Core Graphics and Image I/O frameworks, and render each PDF page into a bitmap context. This problem lends itself very well to parallelization, since each page can be converted into a bitmap on its own thread.

import Foundation
import CoreGraphics
import CoreServices  // kUTType* constants
import ImageIO

struct ImageFileType {
    var uti: CFString
    var fileExtension: String

    // This list can include anything returned by CGImageDestinationCopyTypeIdentifiers()
    // I'm including only the popular formats here
    static let bmp = ImageFileType(uti: kUTTypeBMP, fileExtension: "bmp")
    static let gif = ImageFileType(uti: kUTTypeGIF, fileExtension: "gif")
    static let jpg = ImageFileType(uti: kUTTypeJPEG, fileExtension: "jpg")
    static let png = ImageFileType(uti: kUTTypePNG, fileExtension: "png")
    static let tiff = ImageFileType(uti: kUTTypeTIFF, fileExtension: "tiff")
}

func convertPDF(at sourceURL: URL, to destinationURL: URL, fileType: ImageFileType, dpi: CGFloat = 200) throws -> [URL] {
    let pdfDocument = CGPDFDocument(sourceURL as CFURL)!
    let colorSpace = CGColorSpaceCreateDeviceRGB()
    let bitmapInfo = CGImageAlphaInfo.noneSkipLast.rawValue

    // The output URLs depend only on the page index, so build them up front
    // instead of mutating a shared array from concurrent iterations.
    let imageName = sourceURL.deletingPathExtension().lastPathComponent
    let urls = (0..<pdfDocument.numberOfPages).map { i in
        destinationURL.appendingPathComponent("\(imageName)-Page\(i + 1).\(fileType.fileExtension)")
    }

    DispatchQueue.concurrentPerform(iterations: urls.count) { i in
        // Page number starts at 1, not 0
        let pdfPage = pdfDocument.page(at: i + 1)!

        // Page sizes are in points (72 per inch); scale them to the requested DPI
        let mediaBoxRect = pdfPage.getBoxRect(.mediaBox)
        let scale = dpi / 72.0
        let width = Int(mediaBoxRect.width * scale)
        let height = Int(mediaBoxRect.height * scale)

        // Render the page onto a white, opaque RGB bitmap
        let context = CGContext(data: nil, width: width, height: height, bitsPerComponent: 8, bytesPerRow: 0, space: colorSpace, bitmapInfo: bitmapInfo)!
        context.interpolationQuality = .high
        context.setFillColor(.white)
        context.fill(CGRect(x: 0, y: 0, width: width, height: height))
        context.scaleBy(x: scale, y: scale)
        context.drawPDFPage(pdfPage)

        // Encode the bitmap and write it straight to disk with Image I/O
        let image = context.makeImage()!
        let imageDestination = CGImageDestinationCreateWithURL(urls[i] as CFURL, fileType.uti, 1, nil)!
        CGImageDestinationAddImage(imageDestination, image, nil)
        CGImageDestinationFinalize(imageDestination)
    }
    return urls
}

Usage:

let sourceURL = URL(string: "http://files.shareholder.com/downloads/AAPL/4907179320x0x952191/4B5199AE-34E7-47D7-8502-CF30488B3B05/10-Q_Q3_2017_As-Filed_.pdf")!
let destinationURL = URL(fileURLWithPath: "/Users/mike/PDF")
let urls = try convertPDF(at: sourceURL, to: destinationURL, fileType: .png, dpi: 200)

Conversion is now blisteringly fast, and memory usage is a lot lower. Obviously, the higher the DPI, the more CPU and memory it needs. I'm not sure about GPU acceleration, as I only have a weak Intel integrated GPU.
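
To put rough numbers on the DPI point, here is a back-of-the-envelope sketch. It is not part of the original answer: `estimatedBitmapBytes` is a hypothetical helper, the page is assumed to be US Letter (612 x 792 points), and any row padding that Core Graphics may add is ignored, so treat the result as a lower bound.

```swift
import Foundation

// Hypothetical helper: estimates the bytes needed for one page's uncompressed
// bitmap at 8 bits per component and 4 components per pixel (R, G, B, skipped alpha).
func estimatedBitmapBytes(widthPoints: Double, heightPoints: Double, dpi: Double) -> Int {
    // PDF page sizes are in points, 72 per inch, so pixels = points * dpi / 72.
    let width = Int(widthPoints * dpi / 72)
    let height = Int(heightPoints * dpi / 72)
    let bytesPerPixel = 4
    return width * height * bytesPerPixel
}

// Assumed page size: US Letter, 8.5 x 11 inches = 612 x 792 points.
for dpi in [72.0, 200.0, 400.0] {
    let bytes = estimatedBitmapBytes(widthPoints: 612, heightPoints: 792, dpi: dpi)
    print("\(Int(dpi)) DPI: \(bytes) bytes per page")
}
```

Under these assumptions a page needs roughly 15 MB at 200 DPI and four times that at 400 DPI, because the pixel count grows with the square of the DPI. With concurrentPerform, several pages are in flight at once, so peak usage is approximately that figure multiplied by the number of pages being rendered concurrently.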
