Java HttpURLConnection cutting off HTML


Question

Hey, I'm trying to get the HTML from a Twitter profile page, but HttpURLConnection only returns a small snippet of the HTML. My code:

for (int i = 0; i < urls.size(); i++) {
    URL url = new URL(urls.get(i));
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6");
    System.out.println(connection.getResponseCode());
    String line;
    StringBuilder builder = new StringBuilder();
    BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    while ((line = reader.readLine()) != null) {
        builder.append(line);
    }
    String html = builder.toString();
}

I always get 200 as the response code for each call. However, the entire HTML document is returned only about 1/3 of the time; the rest of the time I get only the first few hundred lines. The amount returned when the HTML is cut off is not always the same.

Any ideas? Thanks for any help!

Additional info: After viewing the headers, it seems I'm getting duplicate Content-Length headers. The first is the full length; the other is much shorter (and probably representative of the length I'm getting some of the time). How can I handle duplicate headers?
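To confirm the duplicate-header observation, one option is to list every value the connection recorded for a given header name. The sketch below is only an illustration: the class HeaderDump and the method valuesFor are hypothetical names, not part of the original code. It expects a map shaped like the one returned by connection.getHeaderFields(), and whether both duplicate values actually show up there may depend on the JDK version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, introduced only for illustration.
class HeaderDump {

    // Collects every value recorded for a header name, ignoring case.
    // The map is expected to have the shape returned by URLConnection.getHeaderFields();
    // the status line is stored under a null key, so null keys are skipped.
    static List<String> valuesFor(Map<String, List<String>> headers, String name) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            if (entry.getKey() != null && entry.getKey().equalsIgnoreCase(name)) {
                values.addAll(entry.getValue());
            }
        }
        return values;
    }
}
```

With the loop from the question, this could be called as HeaderDump.valuesFor(connection.getHeaderFields(), "Content-Length") right after getResponseCode(); more than one entry in the returned list would match the duplicate headers described above.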

Recommended answer

This worked fine for me. I added a newline after builder.append(line); to make the output more readable in the console, but other than that it returned all the HTML for this page:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class RetrieveHTML {

    public static void main(String[] args) throws IOException {
        List<String> urls = new ArrayList<String>();
        urls.add("http://stackoverflow.com/questions/3285077/java-httpurlconnection-cutting-off-html");

        for (int i = 0; i < urls.size(); i++) {
            URL url = new URL(urls.get(i));
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6");
            System.out.println(connection.getResponseCode());
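            StringBuilder builder = new StringBuilder();
            // try-with-resources closes the reader (and the underlying stream) when done
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // readLine() strips the line terminator, so add one back for readability
                    builder.append(line).append("\n");
                }
            }
            String html = builder.toString();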
            System.out.println("HTML " + html);
        }

    }
}
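
If the truncation still shows up intermittently, it may help to measure it rather than eyeball the console output. The following is a rough sketch, not a fix: it reads the raw bytes and compares the count with whatever length the connection reports through getContentLengthLong(). The class TruncationCheck, its methods, and the example.com default URL are all hypothetical; it also assumes the page is UTF-8 and that the body is not compressed, since a compressed body would not match the decoded size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, introduced only for illustration.
class TruncationCheck {

    // Reads the stream until end-of-stream and returns every byte received.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // True when the server declared a length and fewer bytes arrived.
    // A declared length of -1 means "unknown" (e.g. chunked transfer), so no verdict is possible.
    static boolean isTruncated(long declaredLength, long receivedLength) {
        return declaredLength >= 0 && receivedLength < declaredLength;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder default URL (an assumption); pass a real one as the first argument.
        String address = args.length > 0 ? args[0] : "http://example.com/";
        HttpURLConnection connection = (HttpURLConnection) new URL(address).openConnection();
        long declared = connection.getContentLengthLong();
        byte[] body;
        try (InputStream in = connection.getInputStream()) {
            body = readFully(in);
        }
        System.out.println("declared=" + declared + " received=" + body.length
                + " truncated=" + isTruncated(declared, body.length));
        String html = new String(body, StandardCharsets.UTF_8); // assumes a UTF-8 page
        System.out.println(html.length() + " characters decoded");
    }
}
```

Running this several times against the same URL and comparing the declared and received counts should show whether the short responses line up with the shorter Content-Length value mentioned in the question.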
