Get HTML code from a website C#

Published: 2019/6/11 21:11:02. Tags: C#, HTML, LINQ

Problem description

Hi people,

I'm building a news provider, and I need to get the HTML code from a website, save it, and find text in it with a LINQ expression.
I hope some of you can help me with this hard task.

I'm using this code to get the web page source:

// Requires: using System.IO; using System.Net; using System.Text;
public static string code(string Url)
{
    HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(Url);
    myRequest.Method = "GET";
    // The using blocks close the response and the reader even if reading throws.
    using (WebResponse myResponse = myRequest.GetResponse())
    using (StreamReader sr = new StreamReader(myResponse.GetResponseStream(), Encoding.UTF8))
    {
        // Assumes the page is UTF-8 encoded.
        return sr.ReadToEnd();
    }
}

Now I want to find the text inside a div of the web page source, but I don't know how to do it.

Recommended answer

To get the HTML code from a website, you can use code like this:

// Requires: using System.IO; using System.Net; using System.Text;
string urlAddress = "http://google.com";
string data = null; // declared outside the blocks so the HTML is usable afterwards

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
// GetResponse throws a WebException for HTTP error statuses such as 404.
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
  if (response.StatusCode == HttpStatusCode.OK)
  {
    // Use the charset the server declared, falling back to UTF-8.
    Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
      ? Encoding.UTF8
      : Encoding.GetEncoding(response.CharacterSet);
    using (StreamReader readStream = new StreamReader(response.GetResponseStream(), encoding))
    {
      data = readStream.ReadToEnd();
    }
  }
}

This gives you the HTML code returned by the website. Finding text in it with LINQ, however, is not that easy.
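LINQ to XML can only query the page directly when the HTML happens to be well-formed XHTML. A minimal sketch under that assumption: the hypothetical helper DivFinder.FindDivText parses the markup with XDocument.Parse and uses a LINQ query to return the text of every div with a given id. On ordinary, non-well-formed HTML, XDocument.Parse throws an XmlException, which is why this route is rarely enough on its own.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper. Works only when the page is well-formed XHTML;
// XDocument.Parse throws an XmlException on typical real-world HTML.
public static class DivFinder
{
    // Returns the text of every <div> whose id attribute equals divId.
    // Comparing LocalName means an XHTML default namespace does not get in the way.
    public static string[] FindDivText(string xhtml, string divId)
    {
        XDocument doc = XDocument.Parse(xhtml);
        return doc.Descendants()
                  .Where(e => e.Name.LocalName == "div" &&
                              (string)e.Attribute("id") == divId) // null if no id
                  .Select(e => e.Value.Trim()) // Value concatenates all nested text
                  .ToArray();
    }
}
```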

Perhaps a regular expression would work better, but regular expressions do not play well with HTML code.
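To illustrate both the idea and its limits, here is a sketch with a hypothetical DivRegex.FindDivText helper. It matches a div by its id attribute, strips inner tags, and decodes entities with WebUtility.HtmlDecode. Because the lazy (.*?) stops at the first </div>, it gives wrong results as soon as the target div contains another div, which is the core reason regular expressions and HTML do not mix well.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, sketch only: a regex cannot track nested tags, so this
// works only for a <div id="..."> that contains no other <div>.
public static class DivRegex
{
    public static List<string> FindDivText(string html, string divId)
    {
        // Match <div ... id="divId" ...> and capture everything up to the first </div>.
        // (.*?) is lazy; Singleline lets '.' also match line breaks.
        string pattern = @"<div\b[^>]*\sid\s*=\s*[""']" + Regex.Escape(divId) + @"[""'][^>]*>(.*?)</div>";
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline))
        {
            // Remove inner tags such as <b> or <i>, then decode entities like &amp;.
            string text = Regex.Replace(m.Groups[1].Value, "<[^>]+>", "");
            results.Add(WebUtility.HtmlDecode(text).Trim());
        }
        return results;
    }
}
```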

Even better would be to get the news from an RSS feed.
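A minimal sketch of the RSS approach, assuming an RSS 2.0 feed (rss/channel/item elements with title and link children). The hypothetical RssReader.ReadItems method parses the feed with XDocument and uses a LINQ query to keep the items whose title contains a keyword. The feed is passed as a string here so the example runs offline; XDocument.Load also accepts a URL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical types for illustration; assumes an RSS 2.0 feed layout.
public sealed class NewsItem
{
    public string Title { get; set; }
    public string Link { get; set; }
}

public static class RssReader
{
    // Returns items whose <title> contains keyword (case-insensitive).
    public static List<NewsItem> ReadItems(string rssXml, string keyword)
    {
        XDocument feed = XDocument.Parse(rssXml);
        return (from item in feed.Descendants("item")
                let title = (string)item.Element("title") ?? ""
                where title.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0
                select new NewsItem { Title = title, Link = (string)item.Element("link") })
               .ToList();
    }
}
```

Because the feed is already XML, the query runs against real elements instead of raw markup, so no regular expressions are needed.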

In addition to what Kim suggested, I would advise some further steps.

If you use an RSS feed, chances are it is well-formed XML. Parse the XML in one of the following ways and locate the elements you need:

Use the System.Xml.XmlDocument class. It implements the DOM interface; this is the easiest way, and good enough if the document is not too big.
See http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx.
Use the class System.Xml.XmlTextReader; this is the fastest way of reading, especially if you need to skip some data.
See http://msdn.microsoft.com/en-us/library/system.xml.xmlreader.aspx.
Use the class System.Xml.Linq.XDocument; this is the most adequate way, similar to XmlDocument, and it supports LINQ to XML programming.
See http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx, http://msdn.microsoft.com/en-us/library/bb387063.aspx.
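A sketch of the first two options side by side, assuming an RSS 2.0 feed. The hypothetical FeedTitles class collects the item titles once with XmlDocument and an XPath query (whole document in memory), and once with a forward-only XmlReader that skips everything it does not need. It calls XmlReader.Create, the factory Microsoft recommends over constructing XmlTextReader directly.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper; assumes an RSS 2.0 layout (rss/channel/item/title).
public static class FeedTitles
{
    // DOM approach: loads the whole document into memory, then queries it with XPath.
    public static List<string> WithXmlDocument(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var titles = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("/rss/channel/item/title"))
            titles.Add(node.InnerText);
        return titles;
    }

    // Streaming approach: a forward-only reader that never builds a tree of the document.
    public static List<string> WithXmlReader(string xml)
    {
        var titles = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // ReadToFollowing skips everything up to the next <item> start tag.
            while (reader.ReadToFollowing("item"))
            {
                // ReadToDescendant looks for <title> only inside the current <item>.
                if (reader.ReadToDescendant("title"))
                    titles.Add(reader.ReadElementContentAsString());
            }
        }
        return titles;
    }
}
```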

If for some reason you need to find something in HTML that is not formatted as well-formed XML (which would be unfortunate), try using an HTML parser.

For example, look at the Majestic-12 open-source HTML parser: http://www.majestic12.co.uk/projects/html_parser.php.

-SA

See my WebResourceProvider framework, which was designed to do exactly the task you're doing.

/ravi
