Cannot fetch PDF file as binary data

Question

I'm trying to fetch a PDF file from:

URL: https://DOMAIN_NAME/XYZ/_id/download/

The URL does not point directly to a PDF file; each unique file is downloaded according to its particular <_id> field.

When I put this link in the browser's address bar, the PDF file downloads instantly. When I try to fetch it with HttpsURLConnection, however, its Content-Type is 'text/html', while it should be 'application/pdf'.

I also tried calling setRequestProperty with 'application/pdf' before connecting, but the file is always downloaded as 'text/html'.

The method I am using is GET.

1) Do I need to use HttpClient instead of HttpsURLConnection?

2) Are these types of links used to increase security?

3) Please point out my mistakes.

4) How can I find out the name of the file on the server?

Below is the main code that I've implemented:

    URL url = new URL(sb.toString());

    //created new connection
    HttpsURLConnection urlConnection = (HttpsURLConnection) url.openConnection();

    //have set the request method and property
    urlConnection.setRequestMethod("GET");
    urlConnection.setDoOutput(true);
    urlConnection.setRequestProperty("Content-Type", "application/pdf");

    Log.e("Content Type--->", urlConnection.getContentType()+"   "+ urlConnection.getResponseCode()+"  "+ urlConnection.getResponseMessage()+"              "+urlConnection.getHeaderField("Content-Type"));

    //and connecting!
    urlConnection.connect();

    //setting the path where we want to save the file
    //in this case, going to save it on the root directory of the
    //sd card.
    File SDCardRoot = Environment.getExternalStorageDirectory();

    //created a new file, specifying the path, and the filename

    File file = new File(SDCardRoot,"example.pdf");

    if((Environment.getExternalStorageState()).equals(Environment.MEDIA_MOUNTED_READ_ONLY))

    //writing the downloaded data into the file we created
    FileOutputStream fileOutput = new FileOutputStream(file);

    //this will be used in reading the data from the internet
    InputStream inputStream = urlConnection.getInputStream();

    //this is the total size of the file
    int totalSize = urlConnection.getContentLength();

    //variable to store total downloaded bytes
    Log.e("Total File Size ---->", ""+totalSize);
    int downloadedSize = 0;

    //create a buffer...
    byte[] buffer = new byte[1024];
    int bufferLength = 0; //used to store a temporary size of the buffer

    //Reading through the input buffer and write the contents to the file
    while ( (bufferLength = inputStream.read(buffer)) > 0 ) {

        //add the data in the buffer to the file in the file output stream (the file on the sd card
        fileOutput.write(buffer, 0, bufferLength);


        //adding up the size
        downloadedSize += bufferLength;

        //reporting the progress:
        Log.e("This much downloaded---->",""+ downloadedSize);

    }
    //closed the output stream
    fileOutput.close();

I have searched a lot without finding a solution. If possible, please explain my mistakes in detail, as this is my first time implementing this.

*I tried fetching direct PDF links such as http://labs.google.com/papers/bigtable-osdi06.pdf and they downloaded without trouble; moreover, their 'Content-Type' was 'application/pdf'.*

Thanks.

Recommended answer

Theory 1: The server is responding with an incorrect Content-Type. If you wrote and deployed the server code yourself, check that.

Theory 2: The URL returns an HTML page containing some JavaScript that redirects the browser to the URL of the actual PDF file.
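
One way to tell the two theories apart on the client is to inspect what the server actually sends. The following is a minimal Java sketch under stated assumptions, not a definitive implementation: it uses only plain java.net and java.io, and the class name PdfFetcher, its helper methods, and the fallback name example.pdf are hypothetical. It sends an Accept header (a request's Content-Type describes the request body, not the desired response), leaves out setDoOutput(true) (which is meant for requests that carry a body and may cause the request to be sent as a POST), checks the body for the %PDF- signature instead of trusting the Content-Type header, and takes the file name from the Content-Disposition header when the server provides one.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; all names here are illustrative, not from the original post.
final class PdfFetcher {

    // A well-formed PDF starts with the ASCII signature "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Handles only the simple forms filename="a.pdf" and filename=a.pdf.
    private static final Pattern FILENAME =
            Pattern.compile("filename\\s*=\\s*\"?([^\";]+)\"?", Pattern.CASE_INSENSITIVE);

    /** Returns true if the first {@code length} bytes of {@code head} begin with the PDF signature. */
    static boolean looksLikePdf(byte[] head, int length) {
        if (length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (head[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Extracts a bare file name from a Content-Disposition value, or returns the fallback. */
    static String fileNameFrom(String contentDisposition, String fallback) {
        if (contentDisposition == null) {
            return fallback;
        }
        Matcher m = FILENAME.matcher(contentDisposition);
        if (!m.find()) {
            return fallback;
        }
        // Drop any directory components the server may have included.
        String name = new File(m.group(1).trim()).getName();
        return name.isEmpty() ? fallback : name;
    }

    /** Downloads {@code address} into {@code directory}; fails if the body is not a PDF. */
    static File download(String address, File directory) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/pdf");
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            File target = new File(directory,
                    fileNameFrom(conn.getHeaderField("Content-Disposition"), "example.pdf"));
            try (InputStream in = conn.getInputStream()) {
                // Read just enough bytes to inspect the signature.
                byte[] head = new byte[PDF_MAGIC.length];
                int filled = 0;
                while (filled < head.length) {
                    int n = in.read(head, filled, head.length - filled);
                    if (n == -1) {
                        break;
                    }
                    filled += n;
                }
                if (!looksLikePdf(head, filled)) {
                    throw new IOException(
                            "Body is not a PDF; Content-Type was " + conn.getContentType());
                }
                try (OutputStream out = new FileOutputStream(target)) {
                    out.write(head, 0, filled);
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = in.read(buffer)) != -1) { // -1 marks end of stream
                        out.write(buffer, 0, n);
                    }
                }
            }
            return target;
        } finally {
            conn.disconnect();
        }
    }
}
```

If download fails because the body is not a PDF, logging the returned HTML may show whether it is a redirect script or some other page, which in turn suggests which URL the browser ends up requesting.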
