Why can a downloaded file get corrupted?
Question
I have been trying to download a PDF file from the following URL: http://pdfobject.com/markup/examples/full-browser-window.html
Josh M suggested the following solution, which works on his computer. However, I cannot get it to work: the code saves a file to the destination, but the downloaded file is only 984 bytes (it should be about 18 KB), so the file is corrupted. I cannot think of any reason why this would happen.
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

public final class FileDownloader {

    private FileDownloader(){}

    public static void main(String args[]) throws IOException{
        download("http://pdfobject.com/markup/examples/full-browser-window.html", new File("C:\\Users\\Owner\\Desktop\\temporary\\myFile.pdf"));
        download2("http://pdfobject.com/markup/examples/full-browser-window.html", new File("C:\\Users\\Owner\\Desktop\\temporary\\myFile2.pdf"));
    }

    public static void download(final String url, final File destination) throws IOException {
        final URLConnection connection = new URL(url).openConnection();
        connection.setConnectTimeout(60000);
        connection.setReadTimeout(60000);
        connection.addRequestProperty("User-Agent", "Mozilla/5.0");
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        final byte[] buffer = new byte[2048];
        int read;
        final InputStream input = connection.getInputStream();
        while((read = input.read(buffer)) > -1)
            baos.write(buffer, 0, read);
        baos.flush();
        Files.write(destination.toPath(), baos.toByteArray(), StandardOpenOption.WRITE);
        input.close();
    }

    public static void download2(final String url, final File destination) throws IOException {
        final URLConnection connection = new URL(url).openConnection();
        connection.setConnectTimeout(60000);
        connection.setReadTimeout(60000);
        connection.addRequestProperty("User-Agent", "Mozilla/5.0");
        final FileOutputStream output = new FileOutputStream(destination, false);
        final byte[] buffer = new byte[2048];
        int read;
        final InputStream input = connection.getInputStream();
        while((read = input.read(buffer)) > -1)
            output.write(buffer, 0, read);
        output.flush();
        output.close();
        input.close();
    }
}
Recommended answer
You are downloading an .html URL. That page references a PDF as an embedded object. Unlike a browser, Java does not process the page and fetch the embedded object, so you are saving the HTML, not the PDF. Have a look inside the file you saved. For reference, here it is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Embedding a PDF using static HTML markup: Full-browser window (100% width/height)</title>
<!-- This example created for PDFObject.com by Philip Hutchison (www.pipwerks.com) -->
<style type="text/css">
<!--
html {
height: 100%;
}
body {
margin: 0;
padding: 0;
height: 100%;
}
p {
padding: 1em;
}
object {
display: block;
}
-->
</style>
</head>
<body>
<object data="/pdf/sample.pdf#toolbar=1&navpanes=0&scrollbar=1&page=1&view=FitH"
type="application/pdf"
width="100%"
height="100%">
<p>It appears you don't have a PDF plugin for this browser. No biggie... you can <a href="/pdf/sample.pdf">click here to download the PDF file.</a></p>
</object>
</body>
</html>
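Given the markup above, one way to get the PDF itself is to request the address in the object's data attribute rather than the page address. The following is only a minimal sketch of that idea, not a tested fix for the asker's setup: the class name PdfLinkDownloader and its helper methods are hypothetical, the data attribute value is copied by hand from the HTML instead of being parsed out of it, and it assumes the server still serves the PDF at that path over plain HTTP.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, written for illustration only.
public final class PdfLinkDownloader {

    private PdfLinkDownloader() {}

    /** Resolves the object's data attribute against the page URL and drops the #fragment. */
    public static URI resolvePdfUri(String pageUrl, String dataAttribute) {
        String resolved = URI.create(pageUrl).resolve(dataAttribute).toString();
        // The fragment (#toolbar=1&...) is a hint for the browser's PDF viewer,
        // not part of the resource address, so it is removed before requesting.
        int hash = resolved.indexOf('#');
        return URI.create(hash < 0 ? resolved : resolved.substring(0, hash));
    }

    /** Returns true if the data begins with the "%PDF-" marker that PDF files start with. */
    public static boolean startsWithPdfHeader(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Streams the resource to disk, creating or overwriting the destination file. */
    public static void download(URI source, Path destination) throws IOException {
        URLConnection connection = source.toURL().openConnection();
        connection.setConnectTimeout(60000);
        connection.setReadTimeout(60000);
        connection.addRequestProperty("User-Agent", "Mozilla/5.0");
        try (InputStream input = connection.getInputStream()) {
            Files.copy(input, destination, StandardCopyOption.REPLACE_EXISTING);
        }
        // Sanity check; reading the whole file back is acceptable for a small sample.
        if (!startsWithPdfHeader(Files.readAllBytes(destination))) {
            throw new IOException("Saved content does not look like a PDF: " + source);
        }
    }

    public static void main(String[] args) throws IOException {
        URI pdf = resolvePdfUri(
                "http://pdfobject.com/markup/examples/full-browser-window.html",
                "/pdf/sample.pdf#toolbar=1&navpanes=0&scrollbar=1&page=1&view=FitH");
        System.out.println("PDF address: " + pdf);
        // The network request only runs when a destination path is passed as an argument.
        if (args.length == 1) {
            download(pdf, Paths.get(args[0]));
        }
    }
}
```

The header check is there to catch the situation from the question: if the server returns an HTML page again (for example an error page), the saved bytes will not start with %PDF- and the method fails instead of leaving a file that merely has a .pdf extension.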