Get webpage data + C#


Problem description

Hello all!

I'm trying to get data from a web page, but unfortunately I'm not able to do it. I've been trying for 2 hours without success...

I don't want to get the HTML, because every example I have seen only shows how to do that.

Do you have any idea how to get only the plain text from a web page such as http://www.onet.pl? For instance, I would like to receive "wiadomości, biznes, sport" and the rest of the plain text. I'm not interested in the HTML!

I would like to do something like Ctrl+A (select the whole page), copy the result into my program, and get the plain copied text of the web page.

Please help me!

Best regards

OK, thanks. Could you tell me how I would programmatically do Ctrl+A (select all), for example, in a web page and copy the result to the clipboard in C#?

Recommended answer

You can use the classes System.Net.HttpWebRequest and System.Net.HttpWebResponse, see:
http://msdn.microsoft.com/en-us/library/system.net.webrequest.aspx (some HttpWebRequest usage samples here),
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.aspx,
http://msdn.microsoft.com/en-us/library/system.net.httpwebresponse.aspx.

For a complete code sample, you can look at the full code of my application HttpDownloader, which I provided here at CodeProject: "how to download a file from internet".

-SA


Websites are written in HTML.
If you want the text inside the HTML, you have to parse it, for example with Html Agility Pack, which gives each node an InnerText property that extracts only the text without any markup.
But keep in mind that layout is also markup: the text-only versions of most websites do not look very good...

The previous solution shows how you can obtain the HTML.
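
To make the InnerText suggestion concrete, here is a minimal sketch of how it might look. It assumes the third-party HtmlAgilityPack NuGet package is referenced. The class name PlainText, the method FromHtml, and the sample HTML string are made up for illustration. Removing script and style nodes first is my own addition, since their contents would otherwise end up in the extracted text.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack" (assumed to be installed)

static class PlainText
{
    // Returns the text of an HTML string with the markup removed.
    // Hypothetical helper for illustration, not part of any library.
    public static string FromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The bodies of <script> and <style> count as text too, so drop those nodes first.
        // SelectNodes returns null when nothing matches.
        var hidden = doc.DocumentNode.SelectNodes("//script|//style");
        if (hidden != null)
        {
            // Copy to a list so removal does not disturb the enumeration.
            foreach (var node in hidden.ToList())
                node.Remove();
        }

        // InnerText concatenates the text of all remaining nodes;
        // DeEntitize turns entities such as &amp; back into characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim();
    }

    static void Main()
    {
        // Sample input; in practice this would be the HTML downloaded with HttpWebRequest.
        string html = "<html><body><h1>Sport</h1><p>Biznes &amp; wiadomości</p></body></html>";
        Console.WriteLine(FromHtml(html));
    }
}
```

InnerText does not insert separators between elements, so text from adjacent blocks runs together. If you need line breaks, you would have to walk the nodes yourself and add them where appropriate.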


using System;
using System.IO;
using System.Net;
using System.Text;


/// <summary>
/// Fetches a web page and prints its HTML source
/// </summary>
class WebFetch
{
	static void Main(string[] args)
	{
		// prepare the web page we will be asking for
		HttpWebRequest request = (HttpWebRequest)
			WebRequest.Create("http://www.mayosoftware.com");

		// execute the request; the using blocks release the connection
		// and the streams even if reading fails
		using (HttpWebResponse response = (HttpWebResponse)
			request.GetResponse())
		using (Stream resStream = response.GetResponseStream())
		// decode as UTF-8 instead of ASCII so that non-English characters
		// (such as the Polish letters on onet.pl) are not turned into '?';
		// this assumes the page is UTF-8, otherwise pick the encoding
		// named in response.CharacterSet
		using (StreamReader reader = new StreamReader(resStream, Encoding.UTF8))
		{
			// read the whole response body as one string; the reader also
			// handles multi-byte characters that span buffer boundaries
			string html = reader.ReadToEnd();

			// print out page source
			Console.WriteLine(html);
		}
	}
}

