JSoup.connect throws 403 error while apache.httpclient is able to fetch the content


Question

I am trying to parse the HTML dump of any given page. I used HTML Parser and also tried JSoup for parsing.

I found useful functions in Jsoup, but I am getting a 403 error when calling Document doc = Jsoup.connect(url).get();

I tried HTTPClient to get the HTML dump, and it was successful for the same URL.
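For comparison, a minimal Apache HttpClient fetch of that kind might look like the sketch below (assuming HttpClient 4.x; the original question does not include its exact code). The default client sends its own User-Agent header, which is likely why the same URL succeeded there:

import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

// The default client already sends a User-Agent header such as "Apache-HttpClient/4.x"
CloseableHttpClient client = HttpClients.createDefault();
String html = EntityUtils.toString(client.execute(new HttpGet(url)).getEntity());
System.out.println(html);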

Why is JSoup giving a 403 for the same URL that serves content to Commons HttpClient? Am I doing something wrong? Any thoughts?

Answer

The working solution is as follows (thanks to Angelo Neuschitzer for reminding me to post it as a solution):

import javax.swing.text.html.HTML;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect(url).userAgent("Mozilla").get(); // the User-Agent header is what avoids the 403
Elements links = doc.getElementsByTag(HTML.Tag.CITE.toString());
for (Element link : links) {
    String linkText = link.text();
    System.out.println(linkText);
}

So, userAgent does the trick :)
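If a bare "Mozilla" token is ever still rejected, a fuller browser User-Agent string can be passed the same way; this is only a sketch with an example string, not part of the original answer:

Document doc = Jsoup.connect(url)
        .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36") // example UA string only
        .get();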
