Invalid Cookie Header and then it asks for Authorization

Views: 137 Posted: 2020/11/25 1:40:13 Tags: java httpclient web-crawler

Problem description

I am trying to crawl a page that requires Siteminder authentication, so I pass my username and password in the code itself to access that page and then keep crawling all the links on it. This is my Controller.java code; the MyCrawler class is started from it.

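import edu.uci.ics.crawler4j.crawler.CrawlController;

public class Controller {
    public static void main(String[] args) throws Exception {

        // Folder where crawler4j stores its intermediate crawl data.
        CrawlController controller = new CrawlController("/data/crawl/root");

        controller.addSeed("http://ho.somehost.com/");

        // Configure the crawl before starting it: start() runs the crawl,
        // so settings applied afterwards would come too late to matter.
        controller.setPolitenessDelay(200);
        controller.setMaximumCrawlDepth(3);

        // Run the crawl with 10 concurrent MyCrawler instances.
        controller.start(MyCrawler.class, 10);
    }
}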

And this is my MyCrawler.java code. In it I pass my credentials (username and password) for Siteminder authentication. Should the authentication be done in this MyCrawler code or in the Controller code above? The crawler code is taken from crawler4j (http://code.google.com/p/crawler4j/).

public class MyCrawler extends WebCrawler {

    Pattern filters = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4" + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

    public MyCrawler() {


    }

    public boolean shouldVisit(WebURL url) {

        System.out.println("RJ:- " +url);

        DefaultHttpClient client = null;

        try
        {
            // Set url
            //URI uri = new URI(url.toString());

            client = new DefaultHttpClient();

            client.getCredentialsProvider().setCredentials(
                    new AuthScope(AuthScope.ANY_HOST, AuthScope.ANY_PORT, null),
                    new UsernamePasswordCredentials("test", "test"));

            // Set timeout
            //client.getParams().setParameter(CoreConnectionPNames.SO_TIMEOUT, 5000);
            HttpGet request = new HttpGet(url.toString());

            HttpResponse response = client.execute(request);
            if(response.getStatusLine().getStatusCode() == 200)
            {
                InputStream responseIS = response.getEntity().getContent();
                BufferedReader reader = new BufferedReader(new InputStreamReader(responseIS));
                String line = reader.readLine();
                while (line != null)
                {
                    System.out.println(line);
                    line = reader.readLine();
                }
            }
            else
            {
                System.out.println("Resource not available");
            }
        }
        catch (ClientProtocolException e)
        {
            System.out.println(e.getMessage());
        }
        catch (ConnectTimeoutException e)
        {
            System.out.println(e.getMessage());
        }
        catch (IOException e)
        {
            System.out.println(e.getMessage());
        }
        catch (Exception e)
        {
            System.out.println(e.getMessage());
        }
        finally
        {
            if ( client != null )
            {
                client.getConnectionManager().shutdown();
            }
        }


        String href = url.getURL().toLowerCase();
        if (filters.matcher(href).matches()) {
            return false;
        }
        if (href.startsWith("http://")) {
            return true;
        }
        return false;
    }

    public void visit(Page page) {
        int docid = page.getWebURL().getDocid();
        String url = page.getWebURL().getURL();         
        String text = page.getText();
        List<WebURL> links = page.getURLs();
        int parentDocid = page.getWebURL().getParentDocid();

        System.out.println("Docid: " + docid);
        System.out.println("URL: " + url);
        System.out.println("Text length: " + text.length());
        System.out.println("Number of links: " + links.size());
        System.out.println("Docid of parent page: " + parentDocid);
        System.out.println("=============");
    }   
}
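The URL filtering at the end of shouldVisit can be looked at in isolation from the HTTP call. The following is a minimal stdlib-only sketch, not part of the original crawler: the class name UrlFilterDemo, the method name accepts, and the logo.png URL are hypothetical, while the regular expression and the "http://" prefix check are copied from MyCrawler. It suggests that, as written, the filter would reject any https:// URL.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; only the filter logic is taken from MyCrawler.
final class UrlFilterDemo {

    // Same file-extension filter as in MyCrawler.
    static final Pattern FILTERS = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4" + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

    // Mirrors the last lines of shouldVisit: reject filtered extensions,
    // then accept only URLs that start with "http://".
    static boolean accepts(String url) {
        String href = url.toLowerCase();
        if (FILTERS.matcher(href).matches()) {
            return false;
        }
        return href.startsWith("http://");
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://ho.somehost.com/net/pa/ho.xhtml",
            "https://lo.somehost.com/site/no/176/sm.exhtml",
            "http://ho.somehost.com/logo.png" // hypothetical URL, for the extension filter
        };
        for (String url : urls) {
            System.out.println(url + " -> " + accepts(url));
        }
    }
}
```

If the login page is served over https, a prefix check such as href.startsWith("http") may be closer to what is intended, although that is a guess about the goal rather than something stated in the post.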

I print each URL so that I can see which URLs are being visited. It prints two: first the actual URL that requires authentication, and then a Siteminder URL. When I run the project I get the following output:

RJ:- http://ho.somehost.com/net/pa/ho.xhtml
 WARN [Crawler 1] Invalid cookie header: "Set-Cookie: SMCHALLENGE=; expires=Sat, 15 Jan 2011 02:52:54 GMT; path=/; domain=.somehost.com". Unable to parse expires attribute: Sat, 15 Jan 2011 02:52:54 GMT
 WARN [Crawler 1] Invalid cookie header: "Set-Cookie: SMIDENTITY=nzFSq2U3g/C3C6/jkj/Ocghyh/njK; expires=Sat, 13 Jul 2013 02:52:54 GMT; path=/; domain=.somehost.com". Unable to parse expires attribute: Sat, 13 Jul 2013 02:52:54 GMT
null
 INFO [Crawler 1] Number of pages fetched per second: 0
RJ:- https://lo.somehost.com/site/no/176/sm.exhtml
 WARN [Crawler 1] Invalid cookie header: "Set-Cookie: SMCHALLENGE=; expires=Sat, 15 Jan 2011 02:52:56 GMT; path=/; domain=.somehost.com". Unable to parse expires attribute: Sat, 15 Jan 2011 02:52:56 GMT
 WARN [Crawler 1] Invalid cookie header: "Set-Cookie: SMIDENTITY=IqsIPo; expires=Sat, 13 Jul 2013 02:52:56 GMT; path=/; domain=.somehost.com". Unable to parse expires attribute: Sat, 13 Jul 2013 02:52:56 GMT
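
The "Unable to parse expires attribute" warnings point at a date-format mismatch rather than at the credentials. As a rough stdlib-only illustration, the sketch below parses the exact value from the log with two patterns. Which patterns the cookie handling in this HttpClient version actually tries is not shown in the post, so the Netscape-style pattern here is an assumption, and the class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical demo class; it does not use HttpClient at all.
final class ExpiresParseDemo {

    // RFC 1123 style, which is the shape of the value in the log.
    static final String RFC_1123 = "EEE, dd MMM yyyy HH:mm:ss zzz";

    // Netscape-draft style with dashes and a two-digit year.
    // Assumption: a strict cookie parser might accept only this shape.
    static final String NETSCAPE = "EEE, dd-MMM-yy HH:mm:ss z";

    // Returns true if the value can be parsed with the given pattern.
    static boolean parses(String pattern, String value) {
        try {
            new SimpleDateFormat(pattern, Locale.US).parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String expires = "Sat, 15 Jan 2011 02:52:54 GMT"; // copied from the log
        System.out.println("RFC 1123 pattern parses: " + parses(RFC_1123, expires));
        System.out.println("Netscape pattern parses: " + parses(NETSCAPE, expires));
    }
}
```

If this is indeed the cause, selecting a more lenient cookie policy on the client (HttpClient 4.x has a browser-compatibility policy) is a commonly suggested remedy. The warnings by themselves do not show that authentication failed.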

Any suggestions would be appreciated. If I paste that login URL into a browser, it asks for a username and password, and once I enter them I get the actual page.
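
Credentials set through getCredentialsProvider() are only used when the server issues an HTTP-level challenge such as Basic or Digest. The post does not show which scheme this Siteminder setup uses, so the following is only a sketch of what a Basic Authorization header would contain for the "test"/"test" credentials from MyCrawler; the class and method names are hypothetical. If the login is an HTML form instead, a header like this would not be enough.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; it builds the header value without sending a request.
final class BasicAuthHeaderDemo {

    // Basic auth is "Basic " followed by base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Same placeholder credentials as in MyCrawler.
        System.out.println("Authorization: " + basicAuthHeader("test", "test"));
    }
}
```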
