Get webpage data + C#
Question
Hello all!
I'm trying to get data from a web page, but unfortunately I can't manage it. I've been trying for 2 hours without success.
I don't want the HTML source; every example I have seen only shows how to get that.
Do you have any idea how to get only the plain text from a web page such as http://www.onet.pl? For instance, I would like to receive "wiadomości, biznes, sport" and the rest of the visible text. I'm not interested in the HTML!
I would like to do something like Ctrl+A (select the whole page), copy it into my program, and get the plain copied text of the page.
Please help me!
Best regards
OK, thanks. Could you tell me how to programmatically do Ctrl+A, for example, in a web page and copy the result to the clipboard in C#?
Recommended answer
You can use the classes System.Net.HttpWebRequest and System.Net.HttpWebResponse, see:
http://msdn.microsoft.com/en-us/library/system.net.webrequest.aspx (some HttpWebRequest usage samples here),
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.aspx,
http://msdn.microsoft.com/en-us/library/system.net.httpwebresponse.aspx.
For a complete code sample, you can look at the full code of my application HttpDownloader, which I provided here at CodeProject: "how to download a file from internet".
-SA
Websites are written in HTML.
If you want the text inside the HTML you have to parse it, for example with Html Agility Pack, which offers an InnerText property on each node that extracts only the text without any markup.
But keep in mind that layout is also markup: the text-only versions of most websites do not look very good...
The previous solution shows how you can obtain the HTML.
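To make the InnerText suggestion concrete, here is a minimal sketch, assuming the third-party HtmlAgilityPack NuGet package is referenced. The class name PlainText and the helper Extract are made up for illustration, and the HTML string is a stand-in for whatever page source you have already downloaded. Removing script and style nodes and collapsing whitespace are my own additions, not something the answer above prescribes.

```csharp
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // third-party NuGet package, not part of the .NET framework

static class PlainText
{
    // Hypothetical helper: returns the text content of an HTML string.
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Script and style bodies are text nodes too, so drop them first.
        // SelectNodes returns null when nothing matches.
        var hidden = doc.DocumentNode.SelectNodes("//script|//style");
        if (hidden != null)
        {
            foreach (var node in hidden)
                node.Remove();
        }

        // InnerText concatenates the text nodes; DeEntitize decodes
        // entities such as &amp; into plain characters.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        // Collapse the whitespace runs left behind by the markup.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    static void Main()
    {
        // Stand-in for HTML obtained with HttpWebRequest.
        string html = "<html><head><style>p{color:red}</style></head>" +
                      "<body><p>wiadomości</p> <p>biznes &amp; sport</p></body></html>";
        Console.WriteLine(Extract(html));
    }
}
```

Because InnerText simply joins text nodes, words from neighbouring elements may run together where the HTML has no whitespace between tags, which is the "layout is also markup" caveat in practice.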
using System;
using System.IO;
using System.Net;
using System.Text;

/// <summary>
/// Fetches a web page and prints its HTML source.
/// </summary>
class WebFetch
{
    static void Main(string[] args)
    {
        // prepare the web page we will be asking for
        HttpWebRequest request = (HttpWebRequest)
            WebRequest.Create("http://www.mayosoftware.com");

        // execute the request; the using blocks dispose the response
        // and its streams once we are done with them
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream resStream = response.GetResponseStream())
        // decode as UTF-8 instead of ASCII so non-ASCII characters
        // (e.g. Polish letters) are not replaced with '?'; this assumes
        // the page is served as UTF-8. The reader also copes with
        // multi-byte characters split across read boundaries.
        using (StreamReader reader = new StreamReader(resStream, Encoding.UTF8))
        {
            // read the entire response body
            string html = reader.ReadToEnd();

            // print out page source
            Console.WriteLine(html);
        }
    }
}