PDFBox IOException: End of File, expected line
Question
I am currently trying to grab text from a PDF that is already uploaded and accessed through a link by using PDFBox and Selenium. I used this as a source: http://www.seleniumeasy.com/selenium-tutorials/how-to-extract-pdf-text-and-verify-using-selenium-webdriver-java
public String function(String pdf_url) {
    PDFTextStripper pdfStripper = null;
    PDDocument pDoc;
    COSDocument cDoc;
    String parsedText = "";
    try {
        URL url = new URL(pdf_url);
        BufferedInputStream file = new BufferedInputStream(url.openStream());
        PDFParser parser = new PDFParser(file);
        parser.parse();
        cDoc = parser.getDocument();
        pdfStripper = new PDFTextStripper();
        pdfStripper.setStartPage(1);
        pdfStripper.setEndPage(1);
        pDoc = new PDDocument(cDoc);
        parsedText = pdfStripper.getText(pDoc);
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return parsedText;
}
Error: End-of-File expected line
at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1519)
at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:372)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:186)
at scripts.Script.grabPDF_Text(Script.java:94)
at scripts.Script.main(Script.java:817)
Why is this error occurring?
Recommended answer
Here is the example you asked for, using a PDF URL:
String PDFURL = "https://www.adobe.com/support/products/enterprise/knowledgecenter/media/c4611_sample_explain.pdf";
function(PDFURL);

public String function(String pdf_url)
{
    // Exact same code as yours
}
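The stack trace in the question shows the exception being raised inside parseHeader, when readLine reached end-of-file. That suggests the stream ended before a header line could be read, so one thing worth checking is whether the original link actually returns PDF bytes. The sketch below is only a diagnostic under that assumption: the class PdfHeaderCheck and the method looksLikePdf are hypothetical names, not part of PDFBox, and the check relies only on the fact that a PDF file normally begins with the marker "%PDF-".

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not part of PDFBox): checks whether a stream starts like a PDF.
public class PdfHeaderCheck {

    // A PDF file normally begins with the ASCII marker "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Reads up to five bytes from the stream and compares them with the marker.
    // An empty or too-short stream returns false.
    public static boolean looksLikePdf(InputStream in) throws IOException {
        byte[] head = new byte[PDF_MAGIC.length];
        int total = 0;
        while (total < head.length) {
            int n = in.read(head, total, head.length - total);
            if (n < 0) {
                break; // end of stream reached before five bytes were read
            }
            total += n;
        }
        return total == head.length && Arrays.equals(head, PDF_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfHeaderCheck <pdf-url>");
            return;
        }
        // Open the URL the same way the question does and inspect the first bytes.
        try (InputStream in = new BufferedInputStream(new URL(args[0]).openStream())) {
            if (looksLikePdf(in)) {
                System.out.println("Response starts with %PDF-");
            } else {
                System.out.println("Response is empty or does not start with %PDF-");
            }
        }
    }
}
```

If the check fails for your link, the server may be returning an empty body, a redirect, or an HTML page (for example a login page) instead of the PDF. These are possible causes to investigate, not a confirmed diagnosis. The check consumes the first bytes of the stream, so open a fresh stream before handing it to PDFParser.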
To read the PDF from a local file instead, replace the URL and BufferedInputStream with:
File file = new File(pdf_url);
PDFParser parser = new PDFParser(new FileInputStream(file));
Hope this helps.