Jsoup, http error 416, parsing HTML
Problem description
I do not know much about jsoup or HTML parsing. I am trying to pull information from whitepages.com:
try {
    Document doc = Jsoup.connect("http://www.whitepages.com/phone/1-###-###-####").get();
    numberinfo = doc.select(".phone-list-data");
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
I am getting org.jsoup.HttpStatusException: HTTP error fetching URL, with status 416.
I've done some research and it mentions something about ranges. Does it have to do with the input at the end of the URL for the specific phone number?
Is there a way to have jsoup parse info like this?
Okay, so an HTTP 416 (Range Not Satisfiable) error occurs when you ask for a byte range the server cannot serve; such a request is syntactically valid but not satisfiable. For example, if you request a range starting at byte 1000 of a file that is only 500 bytes long, the server will issue a 416 error. If the requested range overlaps the actual content on the server, you will instead receive a response with HTTP status 206 (Partial Content).
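To make the 416 versus 206 distinction concrete, here is a minimal sketch of how a server might choose a status for a single `Range: bytes=first-last` request. It follows the HTTP range rules as I understand them: a range is unsatisfiable when it starts at or beyond the end of the content, while a range that merely extends past the end is still served as 206 with the bytes that exist. The class `RangeStatusDemo` and the method `statusFor` are hypothetical names made up for illustration; they are not part of jsoup or any server API, and real servers handle more cases (suffix ranges, multiple ranges, conditional headers).

```java
/** Illustrative only: simplified status selection for one byte-range request. */
final class RangeStatusDemo {

    /**
     * Returns the status a server would plausibly send for
     * "Range: bytes=first-last" against a resource of resourceLength bytes.
     * Hypothetical helper, not a jsoup or servlet API.
     */
    static int statusFor(long first, long last, long resourceLength) {
        if (first < 0 || last < first) {
            // Malformed range: simplified here as "ignore the header, send everything".
            return 200;
        }
        if (first >= resourceLength) {
            // The range starts past the end of the content: nothing to serve.
            return 416;
        }
        // The range overlaps the content; the server sends the overlapping bytes.
        return 206;
    }

    public static void main(String[] args) {
        System.out.println(statusFor(0, 1023, 500));    // prints 206
        System.out.println(statusFor(1000, 1999, 500)); // prints 416
    }
}
```

So a 416 points at a range that starts beyond the content, not simply at asking for "too many" bytes.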
Why is it occurring in your case?
This is only my guess and I am not sure, but Jsoup may be adding a Range header to your request; see Jsoup.connect(url).maxBodySize(), which sets the maximum number of bytes to read and defaults to 1MB. In your case, even if you change this to 200 bytes, the same error will occur.
Solution: after your Jsoup.connect(url) call, add ignoreHttpErrors(true) so that jsoup does not throw on such errors and instead parses whatever body the server sent, e.g.:
try {
    Document doc = Jsoup.connect("http://www.whitepages.com/phone/1-###-###-####").ignoreHttpErrors(true).get();
    Elements elements = doc.select(".phone-list-data");
    System.out.println(doc.html());
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}