快速CGPDF文档解析 [英] swift CGPDFDocument parsing

查看:107
本文介绍了快速CGPDF文档解析的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在按照Apple的编程指南(其中所有示例均为ObjC ...)来尝试使用Swift解析PDF文档的内容。

I'm trying to use Swift to parse the contents of PDF documents, following Apple's programming guide (in which all the examples are ObjC...)

let filepath = "/Users/ben/Desktop/Test.pdf"
let localUrl  = filepath as CFString
if let pdfURL = CFURLCreateWithFileSystemPath(nil, localUrl, CFURLPathStyle.cfurlposixPathStyle, false) {
    if let pdf = CGPDFDocument(pdfURL) {
        if let inf = pdf.info {
            CGPDFDictionaryApplyFunction(inf, { (key, object, info) -> Void in
                print("\(key), \(object), \(info)")
            }, nil)
        }
        if let cat = pdf.catalog {

            CGPDFDictionaryApplyFunction(cat, { (key, object, info) -> Void in
                print("\(key), \(object), \(info)")
            }, nil)

        }
}
}

虽然这似乎会产生一些结果,但这只是十六进制数字的字符串。

While this seems to produce some results, it's just strings of hex digits.

0x00007ff29f43ce00, 0x00007ff29f492bd0, nil
0x00007ff29f443b60, 0x00007ff29f492cd0, nil
0x00007ff29f482590, 0x00007ff29f492dd0, nil
0x00007ff29f482a40, 0x00007ff29f492ed0, nil
0x00007ff29f482e30, 0x00007ff29f492fe0, nil
0x00007ff29f47da20, 0x00007ff29f4930e0, nil
0x00007ff29f474ac0, 0x00007ff29f842b50, nil
0x00007ff29f43f5d0, 0x00007ff29f842bf0, nil
0x00007ff29f485eb0, 0x00007ff29f842a60, nil
0x00007ff29f482f70, 0x00007ff29f842ab0, nil
0x00007ff29f48b1c0, 0x00007ff29f48f6d0, nil

那怎么办我如何获得实际数据?理想情况下,我试图获取文档元数据和诸如字体之类的东西。

So how do I get the actual data? Ideally, I'm trying to get at the document metadata and things like fonts contained.

推荐答案

您的解析将检索高级词典并info数据是正确的,但是您需要在CGPDFDictionaryApplyFunction中扩展解码以显示PDF数据的类型(整数,字符串,数组,字典等)的值。您正在调用的CGPDFDictionaryApplierFunction的语法是:

Your parsing retrieving high level dictionary and info data is correct, but you need to expand the decoding in CGPDFDictionaryApplyFunction to display the values of PDF data according their types (integer, string, array, dictionary, and so on). The syntax of the CGPDFDictionaryApplierFunction you are calling is:

typealias CGPDFDictionaryApplierFunction =(UnsafePointer< Int8>,COpaquePointer,UnsafeMutablePointer<()>)- >无效

您的程序正在显示指向数据的指针,您可以根据以下类型访问数据值(Swift 2):

Your program is displaying the pointers to the data, you could access the data values according their types as below (Swift 2):

    let filepath = "/Users/ben/Desktop/Test.pdf"
    let urlDocument = NSURL(fileURLWithPath: filepath)
    let myDocument = CGPDFDocumentCreateWithURL(urlDocument)
    if myDocument != nil {
        let numPages = CGPDFDocumentGetNumberOfPages(myDocument)
        print("Number of pages: \(numPages)")
        // Get complete catalog
        let myCatalog = CGPDFDocumentGetCatalog(myDocument)
        CGPDFDictionaryApplyFunction(myCatalog, printPDFKeys, nil)
        let myInfo = CGPDFDocumentGetInfo(myDocument)
        CGPDFDictionaryApplyFunction(myInfo, printPDFKeys, nil)
    } else {
        print("Cannot open PDF document")
    }

为了从CGPDFDictionaryApplyFunction调用,printPDFK eys被称为全局函数(在您的主类之外),或者您可以像上面的示例一样,将代码插入CGPDFDictionaryApplyFunction的闭包中。以下代码被缩短,并且不包括针对错误和空值的完整保护。

In order to be called from the CGPDFDictionaryApplyFunction, the printPDFKeys is to be called as a global function (outside your main class), alternately you could insert the code in a closure of CGPDFDictionaryApplyFunction as in your example above. The below code is shortened and is not including complete protection against errors and null values.

func printPDFKeys( key: UnsafePointer<Int8>, object: COpaquePointer, info: UnsafeMutablePointer<()>) {
    let contentDict: CGPDFDictionaryRef = CGPDFDictionaryRef(info)
    let keyString = String(CString: UnsafePointer<CChar>(key), encoding: NSISOLatin1StringEncoding)
    let objectType = CGPDFObjectGetType(object)
    if keyString == nil {
        return
    }
    print("key \(keyString!) is present in dictionary, type \(objectType.rawValue)")
    var ptrObjectValue = UnsafePointer<Int8>()
    switch objectType {
    // ObjectType is enum of:
    //   Null
    //   Boolean
    //   Integer
    //   Real
    //   Name
    //   String
    //   Array
    //   Dictionary
    //   Stream
    case .Boolean:
        // Boolean
        var objectBoolean = CGPDFBoolean()
        if CGPDFObjectGetValue(object, objectType, &objectBoolean) {
            let testbool = NSNumber(unsignedChar: objectBoolean)
            print("Boolean value \(testbool)")
        }
    case .Integer:
        // Integer
        var objectInteger = CGPDFInteger()
        if CGPDFObjectGetValue(object, objectType, &objectInteger) {
            print("Integer value \(objectInteger)")
        }
    case .Real:
        // Real
        var objectReal = CGPDFReal()
        if CGPDFObjectGetValue(object, objectType, &objectReal) {
            print("Real value \(objectReal)")
        }
    case .Name:
        // Name
        if (CGPDFObjectGetValue(object, objectType, &ptrObjectValue)) {
            let stringName = String(CString: UnsafePointer<CChar>(ptrObjectValue), encoding: NSISOLatin1StringEncoding)
            print("Name value: \(stringName!)")
        }
    case .String:
        // String
        let valueFound = CGPDFObjectGetValue(object, objectType, &ptrObjectValue)
        let stringValue = CGPDFStringCopyTextString(COpaquePointer(ptrObjectValue))
        print("String value: \(stringValue!)")
    case .Array:
        // Array
        print("Array")
        var objectArray = CGPDFArrayRef()
        if (CGPDFObjectGetValue(object, objectType, &objectArray))
        {
            print("array: \(arrayFromPDFArray(objectArray))")
        }
    case .Dictionary:
        // Dictionary
        var objectDictionary = CGPDFDictionaryRef()
        if (CGPDFObjectGetValue(object, objectType, &objectDictionary)) {
            let count = CGPDFDictionaryGetCount(objectDictionary)
            print("Found dictionary with \(count) entries")
            if !(keyString == "Parent") && !(keyString == "P") {
                //catalogLevel = catalogLevel + 1
                CGPDFDictionaryApplyFunction(objectDictionary, printPDFKeys, nil)
                //catalogLevel = catalogLevel - 1
            }
        }
case .Stream:
    // Stream
    print("Stream")
    var objectStream = CGPDFStreamRef()
    if (CGPDFObjectGetValue(object, objectType, &objectStream)) {
        let dict: CGPDFDictionaryRef = CGPDFStreamGetDictionary( objectStream )
        var fmt: CGPDFDataFormat = .Raw
        let streamData: CFDataRef = CGPDFStreamCopyData(objectStream, &fmt)!;
        let data = NSData(data: streamData)
        let dataString = NSString(data: data, encoding: NSUTF8StringEncoding)
        let dataLength: Int = CFDataGetLength(streamData)
        print("data stream (length=\(dataLength)):")
        if dataLength < 400 {
            print(dataString)
        }
    }
default:
    print("Null")
}
}

// convert a PDF array into an objC one
func arrayFromPDFArray(pdfArray: CGPDFArrayRef ) -> NSMutableArray {
var i:Int = 0
var tmpArray: NSMutableArray = NSMutableArray()

let count = CGPDFArrayGetCount(pdfArray)
for i in 0..<count {
    var value = CGPDFObjectRef()
    if (CGPDFArrayGetObject(pdfArray, i, &value)) {
        if let object = objectForPDFObject(value) {
            tmpArray.addObject(object)
        }
    }
}

return tmpArray
}

func objectForPDFObject( object: CGPDFObjectRef) -> AnyObject? {
let objectType: CGPDFObjectType = CGPDFObjectGetType(object)
var ptrObjectValue = UnsafePointer<Int8>()
switch (objectType) {
case .Boolean:
    // Boolean
    var objectBoolean = CGPDFBoolean()
    if CGPDFObjectGetValue(object, objectType, &objectBoolean) {
        let testbool = NSNumber(unsignedChar: objectBoolean)
        return testbool
    }
case .Integer:
    // Integer
    var objectInteger = CGPDFInteger()
    if CGPDFObjectGetValue(object, objectType, &objectInteger) {
        return objectInteger
    }
case .Real:
    // Real
    var objectReal = CGPDFReal()
    if CGPDFObjectGetValue(object, objectType, &objectReal) {
        return objectReal
    }
case .String:
    let valueFound = CGPDFObjectGetValue(object, objectType, &ptrObjectValue)
    let stringValue = CGPDFStringCopyTextString(COpaquePointer(ptrObjectValue))
    return stringValue
case .Dictionary:
    // Dictionary
    var objectDictionary = CGPDFDictionaryRef()
    if (CGPDFObjectGetValue(object, objectType, &objectDictionary)) {
        let count = CGPDFDictionaryGetCount(objectDictionary)
        print("In array, found dictionary with \(count) entries")
        CGPDFDictionaryApplyFunction(objectDictionary, printPDFKeys, nil)
    }
case .Stream:
    // Stream
    var objectStream = CGPDFStreamRef()
    if (CGPDFObjectGetValue(object, objectType, &objectStream)) {
        let dict: CGPDFDictionaryRef = CGPDFStreamGetDictionary( objectStream )
        var fmt: CGPDFDataFormat = .Raw
        let streamData: CFDataRef = CGPDFStreamCopyData(objectStream, &fmt)!;
        let data = NSData(data: streamData)
        let dataString = NSString(data: data, encoding: NSUTF8StringEncoding)
        print("data stream (length=\(CFDataGetLength(streamData))):")
        return dataString
    }
default:
    return nil
}
return nil
}

这篇关于快速CGPDF文档解析的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆