Read web page source code in a web page by using .NET C#


Problem description






Hi,

I am trying to read a web page with the code below, and it works. The problem is that the page has two sections, a main body and an answer pane. The code returns only the source of the main body, and where the answer pane should be it contains the message "Your browser doesn't support iframes". However, if I open the page in a browser, click in the answer pane, and view the source manually, the source shown is correct. In C#, how can I control the web page and focus on the answer pane before calling WebClient.DownloadString() so that I get the right source text?

Thank you for your reply and your time.

RQ

using System.Windows.Forms;
using System.Net;
...
// Download the raw HTML of the page as a string.
System.Net.WebClient wc = new System.Net.WebClient();
string webData = wc.DownloadString("http://start.csail.mit.edu/answer.php?query=Who+is+the+41th+president+in+USA");

What I have tried:

I searched the internet and tried different code, but had no luck.

Solution

Try this:

// Namespaces this method needs:
using System.IO;
using System.Net;
using System.Text;

private static string ReadsourceCode(string Url)
{
    string data = "";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);

    // Dispose the response even if the status is not OK or reading throws.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        if (response.StatusCode == HttpStatusCode.OK)
        {
            // Use the charset the server reported, if any; otherwise use UTF-8,
            // which is what StreamReader defaults to.
            Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
                ? Encoding.UTF8
                : Encoding.GetEncoding(response.CharacterSet);

            using (Stream receiveStream = response.GetResponseStream())
            using (StreamReader readStream = new StreamReader(receiveStream, encoding))
            {
                data = readStream.ReadToEnd();
            }
        }
    }
    return data;
}


Call it like this:

var source = ReadsourceCode("http://start.csail.mit.edu/answer.php?query=Who+is+the+41th+president+in+USA");


An iframe is shown by your browser as a page within a page; it is a visual trick that makes two pages look like one. DownloadString() only fetches the outer page, which is why you see the fallback message instead of the answer. You'll have to parse the HTML of the page you've downloaded to find the iframe, then read its "src" attribute (you can do all this using the HTML Agility Pack, or just plain text manipulation or a regex). You then need to issue a second request for the page in src, just as you did for the initial page. If the src is a relative URL, you'll probably have to prepend the domain name to it (http://start.csail.mit.edu/justanswer.php?query=....).
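
The two-request approach described in this answer could look roughly like the sketch below. It takes the regex route rather than the HTML Agility Pack, and it assumes the iframe's src is present in the downloaded HTML (not filled in later by script). The class and method names (IframeReader, ExtractIframeSrc, ResolveUrl, ReadIframeSource) are made up for illustration and do not come from the original answers.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class IframeReader
{
    // Returns the src attribute of the first <iframe> in the HTML, or null if none is found.
    // A regex is fragile against unusual markup; an HTML parser would be more robust.
    public static string ExtractIframeSrc(string html)
    {
        Match m = Regex.Match(
            html,
            "<iframe\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']*)[\"']",
            RegexOptions.IgnoreCase);
        // Attribute values may contain HTML entities such as &amp;, so decode them.
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }

    // Resolves a possibly relative src against the URL of the page that contained the iframe.
    public static string ResolveUrl(string pageUrl, string src)
    {
        return new Uri(new Uri(pageUrl), src).AbsoluteUri;
    }

    // Downloads the outer page, then the page its first iframe points to.
    public static string ReadIframeSource(string pageUrl)
    {
        using (WebClient wc = new WebClient())
        {
            string outerHtml = wc.DownloadString(pageUrl);
            string src = ExtractIframeSrc(outerHtml);
            if (src == null)
            {
                return outerHtml; // no iframe found; return the outer page as-is
            }
            return wc.DownloadString(ResolveUrl(pageUrl, src));
        }
    }
}
```

It would be called the same way as the original code, for example: string answerHtml = IframeReader.ReadIframeSource("http://start.csail.mit.edu/answer.php?query=Who+is+the+41th+president+in+USA"); Resolving the src with Uri avoids gluing the domain name on by hand and also works when the src is already an absolute URL.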

