How to extract plain text from PDF in golang

Views: 1759 Published: 2018/5/2 18:49:18 Tags: pdf go text extract


Question

I want to extract text from a PDF file using Go. I tried the ledongthuc/pdf Go package, which implements the method GetPlainText() to get plain text content without formatting. But I don't get plain text. The result I get is:

W S D V Y R O R Q W D L U H P H Q W ......

Go code

package main

import (
	"bytes"
	"fmt"

	"github.com/ledongthuc/pdf"
)

func main() {
	content, err := readPdf("test.pdf")
	if err != nil {
		panic(err)
	}
	fmt.Println(content)
	return
}

func readPdf(path string) (string, error) {
	r, err := pdf.Open(path)
	if err != nil {
		return "", err
	}
	totalPage := r.NumPage()

	var textBuilder bytes.Buffer
	for pageIndex := 1; pageIndex <= totalPage; pageIndex++ {
		p := r.Page(pageIndex)
		if p.V.IsNull() {
			continue
		}
		textBuilder.WriteString(p.GetPlainText("\n"))
	}
	return textBuilder.String(), nil
}

Recommended answer

You can get a message such as "Example of a pdf document." instead of

Ex a m pl e of a pd f doc u m e nt .

What you need to do is change textBuilder.WriteString(p.GetPlainText("\n")) to

textBuilder.WriteString(p.GetPlainText())

I hope this helps.
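To see why dropping the separator argument can matter, here is a minimal standard-library sketch. It does not call ledongthuc/pdf. It assumes the extractor writes the separator between every text fragment it finds on a page. The helper joinFragments and the fragment list are hypothetical, made up for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// joinFragments is a hypothetical helper, not part of ledongthuc/pdf.
// It mimics an extractor that concatenates the text fragments of a page,
// writing sep between consecutive fragments.
func joinFragments(fragments []string, sep string) string {
	return strings.Join(fragments, sep)
}

func main() {
	// Made-up fragments: a PDF may store one sentence as many short strings.
	fragments := []string{
		"Ex", "a", "m", "pl", "e", " of", " a", " pd", "f",
		" doc", "u", "m", "e", "nt", ".",
	}

	// With a separator, every fragment boundary becomes visible in the output.
	fmt.Printf("%q\n", joinFragments(fragments, "\n"))

	// With no separator, the fragments run together into readable text.
	fmt.Printf("%q\n", joinFragments(fragments, ""))
}
```

If the real library behaves as assumed here, the broken-up words come from the separator and not from the PDF text itself. Whether that holds depends on the library version in use.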
