Can jsoup handle meta refresh redirect?


Question

I have a problem using jsoup. I am trying to fetch a document from a URL that redirects to another URL through a meta refresh, and this is not working. To explain clearly: when I enter the website URL http://www.amerisourcebergendrug.com, it automatically redirects to http://www.amerisourcebergendrug.com/abcdrug/ according to the meta refresh URL, but jsoup stays on http://www.amerisourcebergendrug.com instead of following the redirect and fetching from http://www.amerisourcebergendrug.com/abcdrug/.

Document doc = Jsoup.connect("http://www.amerisourcebergendrug.com").get();

I have also tried using:

Document doc = Jsoup.connect("http://www.amerisourcebergendrug.com").followRedirects(true).get();

But neither of them works.

Is there any workaround?

Update: The page may use a meta refresh redirect.

Recommended answer

Update (case insensitive and pretty fault tolerant)

  • The content is parsed (almost) according to spec
  • The first successfully parsed content meta data should be used
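import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MetaRefreshExample {

    public static void main(String[] args) throws Exception {
        URI uri = URI.create("http://www.amerisourcebergendrug.com");
        Document d = Jsoup.connect(uri.toString()).get();

        // jsoup does not follow <meta http-equiv="refresh"> by itself, so look for one
        for (Element refresh : d.select("html head meta[http-equiv=refresh]")) {
            // content is "<seconds>; url=<target>" or just "<seconds>"
            Matcher m = Pattern.compile("(?si)\\d+;\\s*url=(.+)|\\d+")
                               .matcher(refresh.attr("content"));

            // find the first one that is valid
            if (m.matches()) {
                if (m.group(1) != null)
                    d = Jsoup.connect(uri.resolve(m.group(1)).toString()).get();
                break;
            }
        }

        // print the URL the final document was fetched from
        System.out.println(d.baseUri());
    }
}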
      

Outputs correctly:

http://www.amerisourcebergendrug.com/abcdrug/
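
To see what the regular expression in the answer accepts, the content-parsing step can be pulled out into a small helper that does not need jsoup or a network connection. This is only a sketch: the class name `MetaRefreshParser`, the method `parseRefreshTarget`, and the sample `content` values are hypothetical and are not taken from the real page. It reuses the same pattern and resolves the target against the base URI.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaRefreshParser {

    // Same pattern as in the answer: "<seconds>; url=<target>" or just "<seconds>".
    private static final Pattern REFRESH =
            Pattern.compile("(?si)\\d+;\\s*url=(.+)|\\d+");

    /**
     * Hypothetical helper: returns the absolute redirect target, or empty when the
     * content value is malformed, names no URL (plain reload), or is not a valid URI.
     */
    public static Optional<URI> parseRefreshTarget(URI base, String content) {
        Matcher m = REFRESH.matcher(content.trim());
        if (!m.matches() || m.group(1) == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(base.resolve(m.group(1).trim()));
        } catch (IllegalArgumentException e) {
            // URI.resolve(String) rejects strings that are not valid URIs
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        URI base = URI.create("http://www.amerisourcebergendrug.com");
        // Sample content values below are assumptions for illustration only.
        System.out.println(parseRefreshTarget(base, "0; URL=/abcdrug/"));
        System.out.println(parseRefreshTarget(base, "5"));
        System.out.println(parseRefreshTarget(base, "not a refresh"));
    }
}
```

Only the first sample yields a target. A bare delay such as `5` matches the pattern but carries no URL, which is why the loop in the answer stops at it without fetching anything.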
      






Old answer:

Are you sure that it isn't working? For me:

      System.out.println(Jsoup.connect("http://www.ibm.com").get().baseUri());
      

..outputs http://www.ibm.com/us/en/ correctly..
