Jsoup, http error 416, parsing HTML


Problem description

I do not know much about jsoup or HTML parsing. I am trying to pull information from whitepages.com:

        try {
            Document doc = Jsoup.connect("http://www.whitepages.com/phone/1-###-###-####").get();
            numberinfo = doc.select(".phone-list-data");
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

I am getting org.jsoup.HttpStatusException: HTTP error fetching URL. with status 416.

I've done some research, and it mentions something about ranges. Does it have to do with the input at the end of the URL for the specific phone number?

Is there a way to have jsoup parse info like this?

Solution

Okay, so HTTP 416 (Range Not Satisfiable) occurs when a request carries a Range header that the server cannot satisfy. Such a request is syntactically valid, but none of the requested range overlaps the content. For example, if you request bytes starting at offset 1000 of a file that is only 500 bytes long, the server will issue a 416 error. If the requested range does overlap the actual content on the server, you will instead receive a response with HTTP status 206 (Partial Content).
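
To make the 416 versus 206 distinction concrete, here is a minimal sketch of how a range-aware server is expected to pick the status code for a single byte range. The class name `RangeCheck` and the method `expectedStatus` are hypothetical names made up for this illustration; they are not part of Jsoup or of any server. The sketch assumes the usual HTTP rule that a byte range is satisfiable when its first byte position lies inside the content, and it deliberately handles only the `bytes=first-last` form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Jsoup.
class RangeCheck {
    // Matches a single "bytes=first-last" range; "last" is optional ("bytes=500-").
    private static final Pattern SINGLE_RANGE = Pattern.compile("bytes=(\\d+)-(\\d*)");

    /**
     * Returns the status a range-aware server would be expected to send:
     * 200 when there is no usable Range header, 206 when the range overlaps
     * the content, 416 when it starts at or past the end of the content.
     */
    static int expectedStatus(String rangeHeader, long contentLength) {
        if (rangeHeader == null) {
            return 200; // no Range header: ordinary full response
        }
        Matcher m = SINGLE_RANGE.matcher(rangeHeader.trim());
        if (!m.matches()) {
            return 200; // other forms are simply ignored in this sketch
        }
        long first = Long.parseLong(m.group(1));
        String last = m.group(2);
        if (!last.isEmpty() && Long.parseLong(last) < first) {
            return 200; // invalid range (last < first): header is ignored
        }
        // Satisfiable only if the first requested byte exists in the content.
        return first < contentLength ? 206 : 416;
    }

    public static void main(String[] args) {
        System.out.println(expectedStatus("bytes=0-999", 500));     // range overlaps content
        System.out.println(expectedStatus("bytes=1000-1999", 500)); // range starts past the end
        System.out.println(expectedStatus(null, 500));              // no Range header
    }
}
```

Note that a range which merely runs past the end of the content, such as `bytes=0-999` on a 500-byte file, still overlaps it and yields 206; only a range that starts at or beyond the end yields 416. Real servers may also ignore the Range header entirely and answer 200.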

Why is it occurring in your case? This is only my guess and I am not sure, but Jsoup may be adding a Range header to your request. See Jsoup.connect(url).maxBodySize(), which sets the maximum number of bytes to read and defaults to 1MB. In your case, even if you change this to 200 bytes, the same error will occur.

Solution: chain ignoreHttpErrors(true) after your Jsoup.connect(url) call so that Jsoup does not throw an exception on an HTTP error status, e.g.:

        try {
            Document doc = Jsoup.connect("http://www.whitepages.com/phone/1-###-###-####").ignoreHttpErrors(true).get();
            Elements elements = doc.select(".phone-list-data");
            System.out.println(doc.html());
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
