How to get plaintext from the response of a WebRequest class in C#
I want to get plain text using the WebRequest class, just like what we get from webbrowser1.Document.Body.InnerText. I have tried the following code:
public string request_Resource()
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(myurl);
    Stream stream = request.GetResponse().GetResponseStream();
    StreamReader sr = new StreamReader(stream);
    WebBrowser wb = new WebBrowser();
    wb.DocumentText = sr.ReadToEnd();
    return wb.Document.Body.InnerText;
}
When I execute this, I get a NullReferenceException.
Is there a better way to get the plain text?
Note: I cannot use the WebBrowser control directly to load the web page, because I don't want to deal with all the events that fire multiple times whenever a page is loaded.
UPDATE: Following a suggestion, I have changed my code to use the WebClient class instead of WebRequest. My code now looks something like this:
public string request_Resource()
{
    WebClient wc = new WebClient();
    wc.Proxy = null;
    // The user agent header is added to avoid any possible errors
    wc.Headers.Add("user-agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10 ( .NET CLR 3.5.30729; .NET4.0C)");
    return wc.DownloadString(myurl);
}
I am considering using HTML Utility Pack. Can anyone suggest a better alternative?
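You're looking for the HTML Agility Pack, which can parse the HTML without IE. Its nodes have an InnerText property.
To answer your question: the NullReferenceException most likely occurs because the WebBrowser control parses DocumentText asynchronously, so Document.Body is not yet populated on the very next line. You need to wait for the browser to finish parsing the text (for example, by handling its DocumentCompleted event) before reading InnerText.
By the way, you should use the WebClient class instead of WebRequest.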
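The HTML Agility Pack suggestion can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the HtmlAgilityPack NuGet package is referenced, and the names PlainTextFetcher, ExtractText, and GetPlainText are hypothetical, chosen only for illustration. It strips script, style, and comment nodes before reading InnerText, because those are not visible text; the result may still differ in whitespace from what a browser's InnerText returns.

```csharp
using System.Linq;
using System.Net;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

// Hypothetical helper class, for illustration only.
public static class PlainTextFetcher
{
    // Parses an HTML string and returns its text content without markup.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Collect non-visible nodes first (ToList), then remove them,
        // so the tree is not modified while it is being enumerated.
        var hidden = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script"
                     || n.Name == "style"
                     || n.NodeType == HtmlNodeType.Comment)
            .ToList();
        foreach (var node in hidden)
        {
            node.Remove();
        }

        // InnerText keeps entities such as &amp; so decode them afterwards.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }

    // Downloads a page with WebClient, as in the question, then extracts its text.
    public static string GetPlainText(string url)
    {
        using (var wc = new WebClient())
        {
            wc.Proxy = null;
            string html = wc.DownloadString(url);
            return ExtractText(html);
        }
    }
}
```

With this in place, the body of request_Resource() could reduce to a single call such as PlainTextFetcher.GetPlainText(myurl), with no WebBrowser control or page-load events involved.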