Jsoup connect(): bypass Google captcha


Problem description


I'm writing a small application that has to retrieve URLs based on keywords. This is the code:

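import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// request is the Google search URL built from the keywords
static List<String> getDataFromGoogle(String request) throws IOException {
    List<String> res = new ArrayList<>();
    Elements links = Jsoup
            .connect(request)
            .userAgent("Mozilla 5.0 (Windows NT 6.1)")
            .timeout(5000).get().select("li.g>h3>a");

    for (Element link : links) {
        String url = link.absUrl("href");
        int start = url.indexOf('=');
        int end = url.indexOf('&');
        // Assumes links of the form .../url?q=<target>&...; keep the decoded <target>.
        // Guarding the indexes avoids a StringIndexOutOfBoundsException on other links.
        if (start >= 0 && end > start) {
            try {
                url = URLDecoder.decode(url.substring(start + 1, end), "UTF-8");
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace(); // UTF-8 is always supported, so this should not happen
            }
        }

        if (!url.startsWith("http"))
            continue; // Ads/news/etc.
        else if (url.contains("/pdf/"))
            continue;
        else if (url.contains("//github.com/"))
            continue;

        res.add(url);
    }
    return res;
}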
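The snippet does not show how `request` is built from the keywords. A minimal sketch, assuming the keywords go in the standard `q` query parameter and keeping the `lr=lang_en` parameter that appears in the stack trace; the class name `SearchRequest` and method `build` are hypothetical:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SearchRequest {

    // Hypothetical helper: builds a Google search URL from free-text keywords.
    // Assumption: keywords go in "q"; lr=lang_en mirrors the stack trace.
    static String build(String keywords) {
        // URLEncoder does form encoding: spaces become '+', reserved chars are %-escaped
        String q = URLEncoder.encode(keywords, StandardCharsets.UTF_8);
        return "http://www.google.com/search?lr=lang_en&q=" + q;
    }

    public static void main(String[] args) {
        // prints http://www.google.com/search?lr=lang_en&q=jsoup+bypass+captcha
        System.out.println(build("jsoup bypass captcha"));
    }
}
```

Encoding the keywords matters for queries containing characters such as `&` or `+`, which would otherwise be read as parameter separators or spaces.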

I get the following error:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=503, URL=http://ipv4.google.com/sorry/IndexRedirect?continue=http://www.google.com/search%3Flr%3Dlang_en....
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:435)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:446)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:410)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:164)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:153)
at sperimentazioni.Main.getDataFromGoogle(Main.java:327)
at sperimentazioni.Main.getURLs(Main.java:164)
at sperimentazioni.Main.main(Main.java:485)


Apparently it is Google's captcha. How can I bypass it?
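The 503 in the trace comes from a redirect to Google's `/sorry/` page (`http://ipv4.google.com/sorry/IndexRedirect?...`). Jsoup reports it as an `HttpStatusException`, whose `getUrl()` returns the failing URL. A minimal sketch for recognizing that page so the caller can stop instead of retrying; the rule "any google.com host with a path under `/sorry/`" is an assumption based only on the URL in the trace, and the class name is hypothetical:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class CaptchaCheck {

    // Hypothetical helper: true if the URL looks like Google's /sorry/ block page.
    // Assumption: google.com (or a subdomain) plus a path starting with /sorry/.
    static boolean isSorryPage(String url) {
        try {
            URI uri = new URI(url);
            String host = uri.getHost();
            String path = uri.getPath();
            return host != null && path != null
                    && (host.equals("google.com") || host.endsWith(".google.com"))
                    && path.startsWith("/sorry/");
        } catch (URISyntaxException e) {
            return false; // not a parseable URL, so not treated as the block page
        }
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(isSorryPage("http://ipv4.google.com/sorry/IndexRedirect?continue=http://www.google.com/search%3Flr%3Dlang_en"));
        // prints false
        System.out.println(isSorryPage("http://www.google.com/search?q=jsoup"));
    }
}
```

In the `catch (HttpStatusException e)` block around `get()`, passing `e.getUrl()` to `isSorryPage` distinguishes the captcha block from other HTTP errors.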

Recommended answer

The following logic works for me:

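import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

static List<String> getResultLinks(String request) throws IOException {
    // Identify as Googlebot instead of a desktop browser
    Document doc = Jsoup.connect(request)
            .userAgent("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
            .timeout(5000)
            .get();

    List<String> result = new ArrayList<>();
    // Keep only hrefs of the form /url?q=..., i.e. the result links
    for (Element link : doc.select("a[href]")) {
        String temp = link.attr("href");
        if (temp.startsWith("/url?q="))
            result.add(temp);
    }
    return result;
}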
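The answer's loop collects the raw `/url?q=...` hrefs rather than the result URLs themselves. A minimal sketch that extracts and decodes the target, assuming the target runs from after `q=` up to the next `&` (the same assumption the question's code makes); the class and method names are hypothetical:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class ResultLink {

    // Hypothetical helper: returns the decoded target of an href like
    // "/url?q=<target>&...", or null if the href does not have that shape.
    static String targetOf(String href) {
        final String prefix = "/url?q=";
        if (!href.startsWith(prefix)) {
            return null;
        }
        // The target ends at the next '&', or at the end of the string if there is none
        int end = href.indexOf('&', prefix.length());
        String encoded = end < 0 ? href.substring(prefix.length())
                                 : href.substring(prefix.length(), end);
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints https://example.com/a?b=1
        System.out.println(targetOf("/url?q=https://example.com/a%3Fb%3D1&sa=U"));
    }
}
```

With this helper, `result.add(temp)` in the answer's loop can become `result.add(ResultLink.targetOf(temp))`, so the list holds the destination URLs directly.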
