How to get plaintext from the response of a WebRequest class in C#


Question


I want to get plain text using the WebRequest class, just like what we get from webbrowser1.Document.Body.InnerText. I have tried the following code:

public string request_Resource()
{
   HttpWebRequest request = (HttpWebRequest)WebRequest.Create(myurl);
   Stream stream = request.GetResponse().GetResponseStream();
   StreamReader sr = new StreamReader(stream);
   WebBrowser wb = new WebBrowser();
   wb.DocumentText = sr.ReadToEnd();
   return wb.Document.Body.InnerText;
}

When I execute this, I get a NullReferenceException.

Is there a better way to get plain text?

Note: I cannot use the WebBrowser control directly to load the web page, because I don't want to deal with all the events that fire multiple times whenever a page is loaded.

UPDATE: Following a suggestion, I have changed my code to use the WebClient class instead of WebRequest. My code now looks like this:

public string request_Resource()
{
   WebClient wc = new WebClient();
   wc.Proxy = null;
   //The user agent header is added to avoid any possible errors
   wc.Headers.Add("user-agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10 ( .NET CLR 3.5.30729; .NET4.0C)");
   return wc.DownloadString(myurl);
}

I am considering using HTML Utility Pack. Can anyone suggest a better alternative?

Solution

You're looking for the HTML Agility Pack, which can parse the HTML without IE.
It has an InnerText property.
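
As a rough illustration of that suggestion, here is a minimal sketch that turns a downloaded HTML string into plain text. It assumes the third-party HtmlAgilityPack NuGet package is referenced. The class name PlainTextExtractor and the method name ToPlainText are hypothetical, chosen only for this example, and stripping script and style elements is an extra step that the answer does not mention.

```csharp
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, not part of the .NET class library

public static class PlainTextExtractor
{
    // Hypothetical helper: returns the text of <body>, or of the whole
    // document when the markup has no <body> element.
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: script and style content is not wanted in the output.
        // SelectNodes returns null when nothing matches, hence the null check.
        var unwanted = doc.DocumentNode.SelectNodes("//script|//style");
        if (unwanted != null)
        {
            foreach (var node in unwanted.ToList())
            {
                node.Remove();
            }
        }

        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body") ?? doc.DocumentNode;

        // InnerText keeps entities such as &amp; so decode them before returning.
        return HtmlEntity.DeEntitize(body.InnerText).Trim();
    }
}
```

With a helper like this, the last line of the WebClient version in the question could become something like `return PlainTextExtractor.ToPlainText(wc.DownloadString(myurl));`. The result will not necessarily match what the WebBrowser control reports, since no scripts are run and whitespace is not laid out as a browser would.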


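To answer your question: setting DocumentText does not load the HTML synchronously, so wb.Document (or its Body) is still null on the next line, which is what triggers the NullReferenceException. You need to wait for the browser to parse the text, for example by handling the DocumentCompleted event.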


By the way, you should use the WebClient class instead of WebRequest.
