Passing Jsoup a URL that redirects to a URL containing spaces leads to an error. How can this be resolved?
Problem description
Hello, I have to parse pages whose URI is resolved by a server redirect.
Example:
I have http://www.juventus.com/wps/poc?uri=wcm:oid:91da6dbb-4089-49c0-a1df-3a56671b7020, which redirects to http://www.juventus.com/wps/wcm/connect/JUVECOM-IT/news/primavera%20convocati%20villar%20news%2010agosto2013?pragma=no-cache
That second address is the URI of the page I have to parse. The problem is that the redirect URI contains spaces. Here is the code:
String url = "http://www.juventus.com/wps/poc?uri=wcm:oid:91da6dbb-4089-49c0-a1df-3a56671b7020";
Document doc = Jsoup.connect(url).get();
Element img = doc.select(".juveShareImage").first();
String imgurl = img.absUrl("src");
System.out.println(imgurl);
I get this error at the second line:
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=404, URL=http://www.juventus.com/wps/wcm/connect/JUVECOM-IT/news/primavera convocati villar news 10agosto2013?pragma=no-cache
The message contains the redirected URL, so Jsoup does receive the correct redirect URI. Is there a way to replace each ' ' with %20 so that I can parse the page without problems?
Thanks!
Recommended answer
You are right, that is the problem. The only solution I see is to handle the redirects manually. I wrote this small recursive method that does this for you. See:
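import java.io.IOException;
import java.net.HttpURLConnection;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The class name is arbitrary.
public class Main
{
    public static void main(String[] args) throws IOException
    {
        String url = "http://www.juventus.com/wps/poc?uri=wcm:oid:91da6dbb-4089-49c0-a1df-3a56671b7020";
        Document document = manualRedirectHandler(url);
        Elements elements = document.getElementsByClass("juveShareImage");
        for (Element element : elements)
        {
            System.out.println(element.attr("src"));
        }
    }

    // Follows redirects one hop at a time so that each target URL can be fixed before it is requested.
    private static Document manualRedirectHandler(String url) throws IOException
    {
        // Encode literal spaces, then fetch without letting Jsoup follow the redirect itself.
        Response response = Jsoup.connect(url.replaceAll(" ", "%20")).followRedirects(false).execute();
        int status = response.statusCode();
        if (status == HttpURLConnection.HTTP_MOVED_TEMP || status == HttpURLConnection.HTTP_MOVED_PERM || status == HttpURLConnection.HTTP_SEE_OTHER)
        {
            String redirectUrl = response.header("location");
            System.out.println("Redirect to: " + redirectUrl);
            return manualRedirectHandler(redirectUrl);
        }
        return Jsoup.parse(response.body());
    }
}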
This will print:
Redirect to: http://www.juventus.com:80/wps/portal/!ut/p/b0/DcdJDoAgEATAF00GXFC8-QqVWwMuJLLEGP2-1q3Y8Mwm4Qk77pATzv_L6-KQgx-09FDeWmpEr6nRThCk36hGq1QnbScqwRMbNuXCHsFLyuTgjpVLjOMHyfCBUg!!/
Redirect to: http://www.juventus.com/wps/wcm/connect/JUVECOM-IT/news/primavera convocati villar news 10agosto2013?pragma=no-cache
/resources/images/news/inlined/42d386ef-1443-488d-8f3e-583b1e5eef61.jpg
I also submitted a patch to Jsoup for this:
- https://github.com/jhy/jsoup/pull/354
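One caveat with the manual approach: a server may send a relative path in its Location header rather than a full URL, and in that case passing the header value straight back into Jsoup.connect would fail. Below is a minimal sketch of one way to handle both cases with java.net.URI only. The class RedirectLocation and the method resolveLocation are hypothetical names, not part of Jsoup, and the sketch assumes that spaces are the only illegal characters in the header value. The relative image path in main is made up for illustration.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of Jsoup.
class RedirectLocation {

    // Escapes literal spaces in both URLs, then resolves the (possibly
    // relative) Location header value against the URL that was requested.
    // Assumption: spaces are the only characters that need escaping.
    static String resolveLocation(String currentUrl, String location) {
        String base = currentUrl.trim().replace(" ", "%20");
        String target = location.trim().replace(" ", "%20");
        return URI.create(base).resolve(target).toString();
    }

    public static void main(String[] args) {
        String current = "http://www.juventus.com/wps/poc?uri=wcm:oid:91da6dbb-4089-49c0-a1df-3a56671b7020";

        // Absolute Location value containing spaces, as in the question.
        String absolute = "http://www.juventus.com/wps/wcm/connect/JUVECOM-IT/news/primavera convocati villar news 10agosto2013?pragma=no-cache";
        System.out.println(resolveLocation(current, absolute));

        // Relative Location value (made-up path) resolved against the current URL.
        System.out.println(resolveLocation(current, "/resources/images/news/example image.jpg"));
    }
}
```

To use it with the method above, the recursive call could pass resolveLocation(url, redirectUrl) instead of redirectUrl, which would also make the replaceAll on the way in redundant for redirected URLs.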