PDF Parsing with Swift
Question
I want to parse a PDF that contains only text, no images, and find specific pieces of text in it. For example, I want to search for the string "Name:" and read the characters that follow the colon.
I can already open a PDF, get the number of pages, and loop over them. The problem comes when I try to use functions like CGPDFDictionaryGetStream or CGPDFStreamCopyData, because they take pointers. I have not found much information online aimed at Swift programmers.
Maybe the easiest way would be to extract all the content into an NSString. Then I could do the rest myself.
Here is my code:
// Get a reference to the existing PDF
let pdf = CGPDFDocumentCreateWithURL(NSURL(fileURLWithPath: path))
let pageCount = CGPDFDocumentGetNumberOfPages(pdf)
for index in 1...pageCount {
    let myPage = CGPDFDocumentGetPage(pdf, index)
    // Somehow search for the string "Name:" to get what is written next
}
Recommended answer
You can use PDFKit for this. On macOS it is part of the Quartz framework, and it is also available on iOS (11 and later), where you import PDFKit directly. It is also pretty fast: I was able to search through a PDF with over 15000 characters in just 0.07s.
Here is an example:
import Quartz

let pdf = PDFDocument(url: URL(fileURLWithPath: "/Users/...some path.../test.pdf"))
guard let contents = pdf?.string else {
    print("could not get string from pdf: \(String(describing: pdf))")
    exit(1)
}
// Get all the text after the first foot note (this crashes if the marker is absent)
let footNote = contents.components(separatedBy: "FOOT NOTE: ")[1]
// Print the first line of that text
print(footNote.components(separatedBy: "\n")[0])
// Output: "The operating system being written in C resulted in a more portable software."
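Indexing with [1] traps when the marker is not in the document. For the "Name:" case from the question, a safer variant could look like the sketch below. The helper name valueAfter and the sample text are made up for illustration; in practice the input would be the string returned by pdf?.string.

```swift
import Foundation

/// Returns the rest of the line that follows the first occurrence of `label`,
/// or nil when the label does not occur in `text`.
/// `valueAfter` is a hypothetical helper, not part of PDFKit.
func valueAfter(_ label: String, in text: String) -> String? {
    guard let labelRange = text.range(of: label) else { return nil }
    let rest = text[labelRange.upperBound...]
    // Stop at the first line break; take everything if there is none.
    let line = rest.prefix(while: { !$0.isNewline })
    return line.trimmingCharacters(in: .whitespaces)
}

// Made-up sample text standing in for the extracted PDF contents.
let sample = "Name: Jane Doe\nAge: 42"
print(valueAfter("Name:", in: sample) ?? "not found")
```

Returning an optional lets the caller decide what a missing label means instead of crashing on an out-of-range index.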
You can also still access most (if not all) of the properties you had before, such as pdf.pageCount for the number of pages and pdf.page(at: <Int>) to get a specific page.
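If you need to know which page a string is on, one possible approach is to collect each page's text and search the resulting array. The sketch below assumes that PDFPage.string gives the text of a single page. The helper names pageTexts and pageIndices are hypothetical, and the PDFKit part is guarded so the search logic can also be tried without PDFKit.

```swift
import Foundation
#if canImport(PDFKit)
import PDFKit

/// Collects the text of every page; pages without extractable text become "".
/// `pageTexts` is a hypothetical helper, not a PDFKit API.
func pageTexts(of document: PDFDocument) -> [String] {
    return (0..<document.pageCount).map { document.page(at: $0)?.string ?? "" }
}
#endif

/// Returns the 0-based indices of the pages whose text contains `needle`.
/// `pageIndices` is a hypothetical helper.
func pageIndices(containing needle: String, in pages: [String]) -> [Int] {
    return pages.indices.filter { pages[$0].contains(needle) }
}

// Made-up page texts standing in for pageTexts(of: pdf).
let pages = ["Introduction", "Name: Jane Doe", "Summary"]
print(pageIndices(containing: "Name:", in: pages))
```

Note that pdf.page(at:) counts pages from 0, whereas CGPDFDocumentGetPage in the question's code counts from 1.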