C#: crawler project


Problem description

Could I get some easy-to-follow code examples for the following:


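  1. Use a browser control to send a request to the target website.

  2. Capture the response from the target website.

  3. Convert the response into a DOM object.

  4. Iterate through the DOM object and capture things like first name, last name, etc., if they are part of the response.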

Thanks

Recommended answer


Here is code that uses a WebRequest object to retrieve data and captures the response as a stream.

    using System;
    using System.IO;
    using System.Net;
    using System.Net.Security;
    using System.Security.Cryptography.X509Certificates;
    using System.Text;

    public static Stream GetExternalData( string url, string postData, int timeout )
    {
        // This callback is process-wide and "+=" adds another copy on every call,
        // so in production code register it once at startup instead.
        ServicePointManager.ServerCertificateValidationCallback += delegate( object sender,
                                                                                X509Certificate certificate,
                                                                                X509Chain chain,
                                                                                SslPolicyErrors sslPolicyErrors )
        {
            // If we trust the callee implicitly, return true; otherwise perform validation logic.
            // Accepting only certificates without policy errors mirrors the default check.
            return sslPolicyErrors == SslPolicyErrors.None;
        };

        HttpWebResponse response = null;

        try
        {
            WebRequest request = WebRequest.Create( url );
            request.Timeout = timeout; // milliseconds; a small value forces a quick timeout

            if( postData != null )
            {
                byte[] body = Encoding.ASCII.GetBytes( postData );
                request.Method = "POST";
                request.ContentType = "application/x-www-form-urlencoded";
                request.ContentLength = body.Length; // byte count of the encoded body

                using( Stream requestStream = request.GetRequestStream() )
                {
                    requestStream.Write( body, 0, body.Length );
                }
            }

            response = (HttpWebResponse)request.GetResponse();
        }
        catch( WebException ex )
        {
            ex.Response?.Close();          // release the connection held by an HTTP error response
            Console.Error.WriteLine( ex ); // replace with your application's logging
        }

        if( response == null || response.StatusCode != HttpStatusCode.OK )
        {
            response?.Close();
            return null;
        }

        // The caller owns the returned stream; disposing it also closes the response.
        return response.GetResponseStream();
    }
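`GetExternalData` hands back a raw `Stream` (or `null` on failure), while most parsers want the page as text. A minimal helper for that second step might look like the sketch below; `ResponseReader` is a hypothetical name, and it assumes the caller already knows the encoding (real code would read the `charset` from the `Content-Type` header and fall back to UTF-8).

```csharp
using System;
using System.IO;
using System.Text;

public static class ResponseReader
{
    // Reads an entire stream into a string and disposes it. For a stream obtained from
    // HttpWebResponse.GetResponseStream(), disposing it also closes the response.
    // Assumption: the encoding is supplied by the caller.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        if (stream == null)
        {
            return null; // GetExternalData returns null when the request failed
        }

        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Typical use would be `string html = ResponseReader.ReadAll(GetExternalData(url, null, 5000), Encoding.UTF8);`, which keeps the null check and the stream disposal in one place.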

To process the response, I use a custom XHTML parser, but it is thousands of lines of code. There are several publicly available parsers (see Darin's comment).
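The answer doesn't name a specific parser. If, and only if, the target page is well-formed XHTML, the framework's own LINQ to XML (`System.Xml.Linq`) can already serve as the DOM for steps 3 and 4 of the question. The sketch below assumes that; the class name `XhtmlFieldExtractor` and the field names `firstName`/`lastName` are hypothetical. Most real-world HTML is not valid XML, which is why a dedicated HTML parser is usually needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlFieldExtractor
{
    // Parses well-formed XHTML and returns the "value" attribute of every <input>
    // whose "name" attribute is one of fieldNames (matched case-insensitively).
    // XDocument.Parse throws an XmlException if the markup is not valid XML.
    public static Dictionary<string, string> ExtractInputValues(string xhtml, params string[] fieldNames)
    {
        XDocument doc = XDocument.Parse(xhtml);
        var wanted = new HashSet<string>(fieldNames, StringComparer.OrdinalIgnoreCase);
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // Descendants() walks the whole tree; LocalName ignores the XHTML namespace.
        foreach (XElement input in doc.Descendants().Where(e => e.Name.LocalName == "input"))
        {
            string name = (string)input.Attribute("name");
            if (name != null && wanted.Contains(name))
            {
                result[name] = (string)input.Attribute("value") ?? string.Empty;
            }
        }

        return result;
    }
}
```

Walking `Descendants()` keeps the lookup independent of where the fields sit in the page; if the structure is known, a narrower query (for example, a specific form) is cheaper and less likely to pick up unrelated inputs.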

EDIT: Per the OP's question, headers can be added to the request to emulate a browser's user agent. For example:

    // Declare the request as HttpWebRequest (not WebRequest) to reach Accept and UserAgent.
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create( url );
    request.Accept = "application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, */*";
    request.Timeout = timeout;
    request.Headers.Add( "Cookie", cookies ); // cookies: a string such as "name1=value1; name2=value2"

    //
    // manifest as a standard user agent
    request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US)";
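An alternative to writing the `Cookie` header by hand is to give the request a `CookieContainer`, which also captures any `Set-Cookie` headers the server returns, so a crawler that reuses the same container keeps its session across requests. The sketch below is one way to do that, not the answer's method; `BrowserLikeRequest`, the simplified `Accept` value, and the cookie name/value parameters are illustrative assumptions. It only builds the request; nothing is sent until `GetResponse()` is called.

```csharp
using System;
using System.Net;

public static class BrowserLikeRequest
{
    // Builds (but does not send) a request that presents itself as a browser and
    // carries one cookie through a CookieContainer instead of a raw "Cookie" header.
    public static HttpWebRequest Create(string url, string cookieName, string cookieValue, int timeout)
    {
        var uri = new Uri(url);
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Timeout = timeout; // milliseconds
        request.Accept = "text/html,application/xhtml+xml,*/*";
        request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US)";

        // Add(Uri, Cookie) scopes the cookie to the URI's host. Reusing this container
        // on later requests also sends back cookies the server set in its responses.
        var cookies = new CookieContainer();
        cookies.Add(uri, new Cookie(cookieName, cookieValue));
        request.CookieContainer = cookies;
        return request;
    }
}
```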
