C#: crawler project
Question
Could I get some easy-to-follow code examples for the following:
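- Use a browser control to send a request to a target website.
- Capture the response from the target website.
- Convert the response into a DOM object.
- Iterate through the DOM object and capture things like first name, last name, etc., if they are part of the response.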
Thanks
Recommended answer
Here is code that uses a WebRequest object to retrieve data and returns the response as a stream.
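using System.IO;
using System.Net;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

public static Stream GetExternalData( string url, string postData, int timeout )
{
    // Note: this callback is process-wide, and += adds another copy on every call;
    // in real code, register it once at startup instead.
    ServicePointManager.ServerCertificateValidationCallback += delegate( object sender,
        X509Certificate certificate,
        X509Chain chain,
        SslPolicyErrors sslPolicyErrors )
    {
        // if we trust the callee implicitly, return true...otherwise, perform validation logic
        // (the original left the return value as a placeholder; accepting only
        // certificates without policy errors is one conservative choice)
        return sslPolicyErrors == SslPolicyErrors.None;
    };

    HttpWebResponse response = null;
    try
    {
        WebRequest request = WebRequest.Create( url );
        request.Timeout = timeout; // in milliseconds; force a quick timeout
        if( postData != null )
        {
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";
            // encode first so that ContentLength is a byte count, not a character count
            byte[] body = System.Text.Encoding.ASCII.GetBytes( postData );
            request.ContentLength = body.Length;
            using( Stream requestStream = request.GetRequestStream() )
            {
                requestStream.Write( body, 0, body.Length );
            }
        }
        response = (HttpWebResponse)request.GetResponse();
    }
    catch( WebException ex )
    {
        // Log is the answer author's own logging helper (not shown here)
        Log.LogException( ex );
    }

    if( response == null || response.StatusCode != HttpStatusCode.OK )
    {
        if( response != null )
        {
            response.Close();
        }
        return null;
    }

    // the caller is responsible for disposing the returned stream
    return response.GetResponseStream();
}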
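To work with the returned stream, it usually has to be read into text first. The following is a minimal sketch rather than part of the original answer: the `ResponseReader` class and `ReadAllText` method are hypothetical names, and UTF-8 is assumed as the default encoding when the caller does not pass one.

```csharp
using System;
using System.IO;
using System.Text;

public static class ResponseReader
{
    // Reads the whole stream as text and disposes it afterwards.
    // Returns null for a null stream, mirroring the value GetExternalData
    // returns when the request fails.
    public static string ReadAllText( Stream stream, Encoding encoding = null )
    {
        if( stream == null )
        {
            return null;
        }
        // Assumption: UTF-8 unless the caller knows the real response encoding.
        using( StreamReader reader = new StreamReader( stream, encoding ?? Encoding.UTF8 ) )
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call such as `string html = ResponseReader.ReadAllText( GetExternalData( url, null, 5000 ) );` would then yield the page markup, or null if the request failed.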
For managing the response, I use a custom XHTML parser, but it is thousands of lines of code. There are several publicly available parsers (see Darin's comment).
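As a rough illustration of the last two steps in the question (turning the response into a DOM and iterating over it), the sketch below uses the framework's own `System.Xml.Linq` instead of a dedicated HTML parser. It rests on several assumptions: the response is well-formed XHTML without a DOCTYPE or named HTML entities, and the values of interest live in `<input>` elements whose `name` attribute is something like `FirstName`. `FormFieldExtractor` and `ExtractInputs` are hypothetical names. Real-world HTML is rarely well-formed, so a tolerant parser such as Html Agility Pack is normally the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FormFieldExtractor
{
    // Returns name -> value for every <input> element whose name attribute
    // matches one of wantedNames (case-insensitive).
    public static Dictionary<string, string> ExtractInputs( string xhtml, params string[] wantedNames )
    {
        // Throws XmlException if the markup is not well-formed XML.
        XDocument doc = XDocument.Parse( xhtml );
        HashSet<string> wanted = new HashSet<string>( wantedNames, StringComparer.OrdinalIgnoreCase );
        Dictionary<string, string> found = new Dictionary<string, string>( StringComparer.OrdinalIgnoreCase );

        // Compare on LocalName so the XHTML namespace, if present, does not matter.
        foreach( XElement input in doc.Descendants().Where( e => e.Name.LocalName == "input" ) )
        {
            string name = (string)input.Attribute( "name" );
            if( name != null && wanted.Contains( name ) )
            {
                found[name] = (string)input.Attribute( "value" ) ?? string.Empty;
            }
        }
        return found;
    }
}
```

Calling `FormFieldExtractor.ExtractInputs( xhtml, "FirstName", "LastName" )` returns only the fields that are actually present in the response, which matches the "if they are part of the response" condition in the question.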
Edit: per the OP's question, headers can be added to the request to emulate a user agent. For example:
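// request must be declared as HttpWebRequest here (not WebRequest),
// because Accept and UserAgent are HttpWebRequest properties
request = (HttpWebRequest)WebRequest.Create( url );
request.Accept = "application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, */*";
request.Timeout = timeout;
// cookies is a ready-made Cookie header string, e.g. "name1=value1; name2=value2"
request.Headers.Add( "Cookie", cookies );
//
// manifest as a standard user agent
request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US)";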