View source of web page (.aspx pages)


Problem description


Hi to all,
I am trying to scrape this page:
http://www.webhostdir.com/search/profile.aspx?spid=19137

using code something like this:

    ' Requires: Imports System.IO, System.Net, System.Text
    Dim myRequest As HttpWebRequest = DirectCast(WebRequest.Create("http://www.webhostdir.com/search/profile.aspx?spid=19137"), HttpWebRequest)
    myRequest.Method = "GET"
    myRequest.KeepAlive = False
    Dim webresponse As HttpWebResponse
    Try
        webresponse = DirectCast(myRequest.GetResponse(), HttpWebResponse)
        ' Decode the response body as Windows-1252
        Dim enc As Encoding = System.Text.Encoding.GetEncoding(1252)
        Dim loResponseStream As New StreamReader(webresponse.GetResponseStream(), enc)
        Dim r As String = loResponseStream.ReadToEnd()
        ' True = append to C:\final.txt if it already exists
        My.Computer.FileSystem.WriteAllText("C:\final.txt", r, True)
        loResponseStream.Close()
        webresponse.Close()
    Catch
        ' Empty handler: any exception is silently ignored
    End Try



But this is not working: when I download the page manually it is 54 KB, but when I fetch it with the code above the saved file is only 14 KB.

Need help.

Thanks

This is an online service that grabs the page the way I need. Could someone help me work out the logic behind their ripping?
http://www.ex-designz.net/htmlviewer.asp

Solution

What I found is not a complete answer yet, but it might help you to sort this out.

I tried the same thing using my own HTTP downloader and got exactly the same result.
But I also compared the saved files and saw one big difference, in the hidden input element named __VIEWSTATE:

<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="... I skipped the content here ... " />



I did not show the content of the value attribute because it's pretty long.
So, here is the difference: in at least one case this value is much longer when the page is fetched with a Web browser. The application uses hidden elements to save the view state, which is a well-known ASP.NET technique.
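
To put a number on that difference, the length of the __VIEWSTATE value in each saved file can be measured. The following is only a minimal sketch: the module name `ViewStateCheck`, the helper `ViewStateLength`, and the two file paths passed on the command line are hypothetical, and the regular expression assumes the field is written with double-quoted attributes as in the snippet above.

```vbnet
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module ViewStateCheck
    ' Returns the length of the __VIEWSTATE value in an HTML string,
    ' or -1 when no such hidden field is found.
    Function ViewStateLength(html As String) As Integer
        ' Locate the whole <input ...> tag that is named __VIEWSTATE.
        Dim tagMatch As Match = Regex.Match(html, "<input[^>]*name=""__VIEWSTATE""[^>]*>", RegexOptions.IgnoreCase)
        If Not tagMatch.Success Then Return -1

        ' Read the value attribute from inside that tag.
        Dim valueMatch As Match = Regex.Match(tagMatch.Value, "value=""([^""]*)""", RegexOptions.IgnoreCase)
        If Not valueMatch.Success Then Return -1

        Return valueMatch.Groups(1).Length
    End Function

    Sub Main(args As String())
        If args.Length < 2 Then
            Console.WriteLine("Usage: ViewStateCheck <browser-saved file> <code-saved file>")
            Return
        End If
        Console.WriteLine("Browser copy: {0} characters of view state", ViewStateLength(File.ReadAllText(args(0))))
        Console.WriteLine("Code copy:    {0} characters of view state", ViewStateLength(File.ReadAllText(args(1))))
    End Sub
End Module
```

If the two numbers account for most of the gap between the files, the missing content is view state rather than visible markup.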

I don't know yet how the requests differ, though. Maybe you can figure this out. It's possible to inspect the HTTP traffic and see, in full detail, what the Web browser sends.
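
One way to test whether the request is what makes the difference is to send headers similar to a browser's and compare the result. This is a sketch under assumptions, not a confirmed fix: the header values below are placeholders that should be replaced with the ones captured from a real browser request, the helper name `CreateBrowserLikeRequest` is hypothetical, and code page 1252 is taken from the question and assumes the .NET Framework.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module BrowserLikeDownload
    ' Builds a GET request carrying headers that a desktop browser typically sends.
    Function CreateBrowserLikeRequest(url As String) As HttpWebRequest
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.Method = "GET"
        ' Assumed values: copy the real ones from a captured browser request.
        request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
        request.Headers.Add("Accept-Language", "en-US,en;q=0.5")
        ' Keep cookies across redirects and unpack gzip/deflate bodies automatically.
        request.CookieContainer = New CookieContainer()
        request.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        Return request
    End Function

    Sub Main()
        Dim request As HttpWebRequest = CreateBrowserLikeRequest("http://www.webhostdir.com/search/profile.aspx?spid=19137")
        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream(), Encoding.GetEncoding(1252))
                Dim html As String = reader.ReadToEnd()
                ' Overwrites any existing file, so old runs do not inflate the size.
                File.WriteAllText("C:\final.txt", html)
                Console.WriteLine("Saved {0} characters", html.Length)
            End Using
        End Using
    End Sub
End Module
```

Exceptions are deliberately left unhandled here so that a failed request is visible instead of being swallowed.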

—SA


Here is a utility, wget, that performs the required operations. You can execute it from your code using the Process class. While wget is open source, it's not written in C#.

It's an easy solution to your problem: it will allow you to get just about anything available on the site.
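
Launching wget through the Process class could look roughly like this. It is a sketch that assumes wget is installed and reachable through PATH; the module name, the helper names `BuildWgetArguments` and `RunWget`, and the output path are hypothetical. Only wget's `-O` option (write the download to the named file) is used.

```vbnet
Imports System
Imports System.Diagnostics

Module WgetRunner
    ' Builds the wget command line; -O names the file the page is written to.
    Function BuildWgetArguments(url As String, outputPath As String) As String
        Return String.Format("-O ""{0}"" ""{1}""", outputPath, url)
    End Function

    ' Runs wget, waits for it to finish, and returns its exit code (0 means success).
    Function RunWget(url As String, outputPath As String) As Integer
        Dim startInfo As New ProcessStartInfo("wget", BuildWgetArguments(url, outputPath))
        startInfo.UseShellExecute = False   ' start the executable directly, resolved via PATH
        startInfo.CreateNoWindow = True     ' no console window for the child process
        Using wgetProcess As Process = Process.Start(startInfo)
            wgetProcess.WaitForExit()
            Return wgetProcess.ExitCode
        End Using
    End Function

    Sub Main()
        Dim exitCode As Integer = RunWget("http://www.webhostdir.com/search/profile.aspx?spid=19137", "C:\final.html")
        Console.WriteLine("wget exit code: {0}", exitCode)
    End Sub
End Module
```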

Regards
Espen Harlinn


Have a look at this article.

http://www.4guysfromrolla.com/articles/122204-1.aspx#postadlink

I am not sure whether the WebClient class will help you in this scenario. If you have not tried it, take a look at this too.

http://www.4guysfromrolla.com/webtech/070601-1.shtml
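
For reference, trying the WebClient class takes only a few lines. This is a minimal sketch and, as said above, it may well return the same 14 KB page: the User-Agent value, the helper name `CreateClient`, and the output path are assumptions made for illustration.

```vbnet
Imports System
Imports System.Net

Module WebClientDownload
    ' Creates a WebClient that identifies itself with a browser-style User-Agent.
    Function CreateClient() As WebClient
        Dim client As New WebClient()
        ' Assumed value: replace with the string your own browser sends.
        client.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
        Return client
    End Function

    Sub Main()
        Using client As WebClient = CreateClient()
            ' DownloadFile stores the raw response bytes, so no text encoding has to be chosen.
            client.DownloadFile("http://www.webhostdir.com/search/profile.aspx?spid=19137", "C:\final.html")
        End Using
        Console.WriteLine("Download finished")
    End Sub
End Module
```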

