Invalid Cookie Header and then it asks for Authorization


Question

I am trying to crawl a page that requires SiteMinder authentication, so I pass my username and password in the code itself to access that page and then keep crawling all the links on it. This is my Controller.java code, from which the MyCrawler class is called.

import edu.uci.ics.crawler4j.crawler.CrawlController;

public class Controller {
    public static void main(String[] args) throws Exception {
        CrawlController controller = new CrawlController("/data/crawl/root");
        controller.addSeed("http://ho.somehost.com/");

        // Configure the crawl before calling start(), which blocks until the crawl finishes.
        controller.setPolitenessDelay(200);
        controller.setMaximumCrawlDepth(3);

        controller.start(MyCrawler.class, 10);
    }
}

And this is my MyCrawler.java code, in which I pass my credentials (username and password) for SiteMinder authentication. I want to know whether the authentication should be done in this MyCrawler code or in the Controller code above. The crawler code is taken from crawler4j (http://code.google.com/p/crawler4j/).

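import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.http.HttpResponse;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.conn.ConnectTimeoutException;
import org.apache.http.impl.client.DefaultHttpClient;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {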

    Pattern filters = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4" + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

    public MyCrawler() {


    }

    public boolean shouldVisit(WebURL url) {

        System.out.println("RJ:- " +url);

        DefaultHttpClient client = null;

        try
        {
            // Set url
            //URI uri = new URI(url.toString());

            client = new DefaultHttpClient();

            client.getCredentialsProvider().setCredentials(
                    new AuthScope(AuthScope.ANY_HOST, AuthScope.ANY_PORT, null),
                    new UsernamePasswordCredentials("test", "test"));

            // Set timeout
            //client.getParams().setParameter(CoreConnectionPNames.SO_TIMEOUT, 5000);
            HttpGet request = new HttpGet(url.toString());

            HttpResponse response = client.execute(request);
            if(response.getStatusLine().getStatusCode() == 200)
            {
                InputStream responseIS = response.getEntity().getContent();
                BufferedReader reader = new BufferedReader(new InputStreamReader(responseIS));
                String line = reader.readLine();
                while (line != null)
                {
                    System.out.println(line);
                    line = reader.readLine();
                }
            }
            else
            {
                System.out.println("Resource not available");
            }
        }
        catch (ClientProtocolException e)
        {
            System.out.println(e.getMessage());
        }
        catch (ConnectTimeoutException e)
        {
            System.out.println(e.getMessage());
        }
        catch (IOException e)
        {
            System.out.println(e.getMessage());
        }
        catch (Exception e)
        {
            System.out.println(e.getMessage());
        }
        finally
        {
            if ( client != null )
            {
                client.getConnectionManager().shutdown();
            }
        }


        String href = url.getURL().toLowerCase();
        if (filters.matcher(href).matches()) {
            return false;
        }
        if (href.startsWith("http://")) {
            return true;
        }
        return false;
    }

    public void visit(Page page) {
        int docid = page.getWebURL().getDocid();
        String url = page.getWebURL().getURL();         
        String text = page.getText();
        List<WebURL> links = page.getURLs();
        int parentDocid = page.getWebURL().getParentDocid();

        System.out.println("Docid: " + docid);
        System.out.println("URL: " + url);
        System.out.println("Text length: " + text.length());
        System.out.println("Number of links: " + links.size());
        System.out.println("Docid of parent page: " + parentDocid);
        System.out.println("=============");
    }   
}

I print each URL so that I can see which URLs are being visited. It prints two URLs: first the actual URL that requires authentication, and then a SiteMinder URL. When I run the project I get the following output:

RJ:- http://ho.somehost.com/net/pa/ho.xhtml
 WARN [Crawler 1] Invalid cookie header: "Set-Cookie: SMCHALLENGE=; expires=Sat, 15 Jan 2011 02:52:54 GMT; path=/; domain=.somehost.com". Unable to parse expires attribute: Sat, 15 Jan 2011 02:52:54 GMT
 WARN [Crawler 1] Invalid cookie header: "Set-Cookie: SMIDENTITY=nzFSq2U3g/C3C6/jkj/Ocghyh/njK; expires=Sat, 13 Jul 2013 02:52:54 GMT; path=/; domain=.somehost.com". Unable to parse expires attribute: Sat, 13 Jul 2013 02:52:54 GMT
null
 INFO [Crawler 1] Number of pages fetched per second: 0
RJ:- https://lo.somehost.com/site/no/176/sm.exhtml
 WARN [Crawler 1] Invalid cookie header: "Set-Cookie: SMCHALLENGE=; expires=Sat, 15 Jan 2011 02:52:56 GMT; path=/; domain=.somehost.com". Unable to parse expires attribute: Sat, 15 Jan 2011 02:52:56 GMT
 WARN [Crawler 1] Invalid cookie header: "Set-Cookie: SMIDENTITY=IqsIPo; expires=Sat, 13 Jul 2013 02:52:56 GMT; path=/; domain=.somehost.com". Unable to parse expires attribute: Sat, 13 Jul 2013 02:52:56 GMT

Any suggestions will be appreciated. If I copy and paste that login URL into the browser, it asks for a username and password, and once I enter them I get the actual screen.

Recommended answer

Extracting the salient contents of the chat discussion for posterity, in case anyone experiences the same issue.

The warning message indicates that HttpClient was unable to parse the Set-Cookie header issued by SiteMinder. Analysis of the network traffic using Wireshark revealed the following:

  • No expires attribute was set for the SMSESSION cookie issued by SiteMinder. This is not the cause of the problem; it only shows that the HTTP response responsible for the warning still needs to be looked at.
  • The warnings were issued for the cookies SMCHALLENGE and SMIDENTITY. Therefore, the responses containing the Set-Cookie headers for these two cookies need to be examined.
  • The problem could be in:
    • the cookie values themselves, or
    • the format of the dates in the expires attribute of the cookies.
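To examine the two candidates separately, it can help to split a reported Set-Cookie header into the cookie value and its attributes. The following is only a minimal sketch using plain string handling, not HttpClient's own parser; the class name SetCookieInspector and the method split are hypothetical names introduced for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits a Set-Cookie header line into its parts so the
// cookie value and the expires attribute can be inspected separately.
final class SetCookieInspector {

    /**
     * Returns the parts of the header in order of appearance. The first entry
     * is the cookie name and value; the remaining entries are attributes such
     * as expires, path and domain.
     */
    static Map<String, String> split(String headerLine) {
        String value = headerLine.trim();
        String prefix = "Set-Cookie:";
        // Drop the header name if the full header line was passed in.
        if (value.regionMatches(true, 0, prefix, 0, prefix.length())) {
            value = value.substring(prefix.length());
        }
        Map<String, String> parts = new LinkedHashMap<>();
        for (String piece : value.split(";")) {
            String p = piece.trim();
            if (p.isEmpty()) {
                continue;
            }
            int eq = p.indexOf('=');
            String key = eq < 0 ? p : p.substring(0, eq).trim();
            String val = eq < 0 ? "" : p.substring(eq + 1).trim();
            parts.put(key, val);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Header copied from the warning in the question's console output.
        String header = "Set-Cookie: SMCHALLENGE=; expires=Sat, 15 Jan 2011 02:52:54 GMT; "
                + "path=/; domain=.somehost.com";
        for (Map.Entry<String, String> e : split(header).entrySet()) {
            System.out.println(e.getKey() + " -> [" + e.getValue() + "]");
        }
    }
}
```

For the SMCHALLENGE header the cookie value is empty and the expires attribute carries a date with a 4-digit year, which is the part the warning says could not be parsed.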

If the above (use of 4-digit years in the cookie expires value) turns out not to be the root cause, then one must specify the date format used to parse the cookie's expires value. This can be done by giving HttpClient the list of accepted date formats in the following manner:

      HttpGet request = new HttpGet(url.toString());
      request.getParams().setParameter(CookieSpecPNames.DATE_PATTERNS, Arrays.asList("EEE, d MMM yyyy HH:mm:ss z"));
      HttpResponse response = client.execute(request);
      

in place of the existing call:

      HttpGet request = new HttpGet(url.toString());
      
      HttpResponse response = client.execute(request);
      

The specified pattern, EEE, d MMM yyyy HH:mm:ss z, is a valid pattern for the dates that appear to be parsed incorrectly (going by the messages in the console). You will need to add additional patterns if HttpClient fails to handle other date formats. For details on the pattern syntax, see the SimpleDateFormat class documentation.
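Before adding a pattern to the HttpClient configuration, it can be checked in isolation against the dates from the warnings. The sketch below uses only java.text.SimpleDateFormat; the class name CookieDatePatternCheck and the method parseExpires are hypothetical, and Locale.US is an assumption made so that the English day and month names are recognised regardless of the machine's default locale.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper: checks whether an expires value matches the pattern
// quoted in the answer, independently of HttpClient.
final class CookieDatePatternCheck {

    // Pattern taken from the answer above.
    static final String PATTERN = "EEE, d MMM yyyy HH:mm:ss z";

    /** Returns the parsed date, or null if the value does not match PATTERN. */
    static Date parseExpires(String value) {
        // Assumption: Locale.US, so that "Sat" and "Jan" parse on any machine.
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.US);
        try {
            return format.parse(value);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Dates copied from the warnings in the question's console output.
        String[] samples = {
            "Sat, 15 Jan 2011 02:52:54 GMT",
            "Sat, 13 Jul 2013 02:52:54 GMT"
        };
        for (String sample : samples) {
            Date parsed = parseExpires(sample);
            System.out.println(sample + " -> " + (parsed != null ? "matches" : "does not match"));
        }
    }
}
```

A value that this check rejects would need its own entry in the list passed as CookieSpecPNames.DATE_PATTERNS.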
