Using jsoup.connect(url).get() to follow JavaScript redirects?

Question

I originally had this question:

Having trouble fetching the proper site in Java (second word for website search query gets cut off)

Basically, when I searched a website for an item with two words, for example "summer clothes", I was redirected to a search for just "summer". Based on that answer, I suspect this happens because Sears uses JavaScript to redirect and Jsoup does not execute JavaScript redirects, so I was wondering if there is any way to fetch that website while still using Jsoup.

Solution

The code below checks for both a meta "refresh" tag (http-equiv="refresh") and a JavaScript redirect (a script that assigns window.location.href). If either is present, the RedirectedUrl variable is set, so you know where the page is trying to send you.

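    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    // Returns the URL the page redirects to via meta refresh or window.location.href, or null.
    static String findRedirectUrl(Document page) {
        String RedirectedUrl = null;
        // <meta http-equiv="refresh" content="0; url=...">; jsoup matches [attr=value] case-insensitively
        Element meta = page.select("meta[http-equiv=refresh]").first();
        if (meta != null) {
            String content = meta.attr("content");
            int i = content.toLowerCase().indexOf("url=");
            if (i >= 0) {
                // keep everything after "url=" so an '=' in the query string is not cut off
                RedirectedUrl = content.substring(i + 4).replace("'", "").replace("\"", "").trim();
            }
        }
        if (RedirectedUrl == null && page.toString().contains("window.location.href")) {
            // <script>window.location.href = '...';</script>
            for (Element script : page.select("script")) {
                String s = script.data().trim();
                if (s.startsWith("window.location.href")) {
                    int start = s.indexOf('=');
                    int end = s.indexOf(';', start);
                    if (start > 0 && end > start) {
                        s = s.substring(start + 1, end);
                        s = s.replace("'", "").replace("\"", "");
                        RedirectedUrl = s.trim();
                        break;
                    }
                }
            }
        }
        return RedirectedUrl;
    }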

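... now retrieve the redirected page with another Jsoup.connect(RedirectedUrl).get() call (if the target is a relative path, resolve it against the original URL first).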
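The snippet only reports where the page wants to go. A page can redirect more than once, and the target is often a relative path, so the "retrieve it again" step is usually a small loop. Below is a minimal sketch using only the JDK, under some assumptions: the class ClientRedirects, the method followRedirects and the maxHops limit are hypothetical names, and the lookup function stands in for "fetch the URL with Jsoup.connect(url).get(), run the detection snippet above on the Document, and return RedirectedUrl (or null)".

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

// Sketch: manually follow client-side (meta refresh / JavaScript) redirects.
// The lookup function is a hypothetical hook: given a URL it returns the
// redirect target that page declares, or null if it declares none.
final class ClientRedirects {

    private ClientRedirects() {}

    // Follows declared redirects from startUrl and returns the last URL reached.
    // Relative targets are resolved against the URL of the page that declared them.
    // Stops after maxHops redirects, or earlier if a URL repeats (a redirect loop).
    static String followRedirects(String startUrl, Function<String, String> lookup, int maxHops) {
        String current = startUrl;
        Set<String> seen = new HashSet<>();
        seen.add(current);
        for (int hop = 0; hop < maxHops; hop++) {
            String target = lookup.apply(current);
            if (target == null || target.isEmpty()) {
                break; // this page declares no further redirect
            }
            // URI.resolve accepts absolute or relative targets; URI.create and
            // URI.resolve(String) throw IllegalArgumentException on unencoded
            // characters such as spaces.
            String next = URI.create(current).resolve(target).toString();
            if (!seen.add(next)) {
                break; // already visited: stop instead of looping forever
            }
            current = next;
        }
        return current;
    }
}
```

Because java.util.function.Function cannot throw IOException, a real lookup built on Jsoup.connect(url).get() would need to catch it (for example, rethrowing it as java.io.UncheckedIOException). The visited-set check is there because a page that redirects back to an earlier URL would otherwise be fetched until maxHops runs out. Once followRedirects returns, fetching that URL with Jsoup gives the page you were after.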
