Web scraping a GridView with paging


Question

I need to screen-scrape a page on an ASPX site where the data of interest is the text in a grid view. Getting the text from the first page of the grid view is no problem.

But how do I get the text from the second page, the third page, and so on?

The problem is that the pages of the grid view do not have URLs of their own; instead, clicking a paging link button runs JavaScript.

For example, the JavaScript behind a paging link is:
javascript:__doPostBack('ctl00$cphmaincontent$lbntNavigate2,3..up last page','')

How do I simulate this?

Recommended answer

Please see my comment to the question. However, if the paging behaves in some regular way, you can set up an HTTP spy and see which HTTP requests are sent on each paging event. Obviously, you cannot assume any particular server-side technology for the site being scraped, so you need to learn how it works and mimic the behavior of the client side.

One approach I have used is an HTTP spy application. I use Http Fox, a plug-in for SeaMonkey/Firefox, but similar tools certainly exist for other browsers, as well as standalone ones. With such a tool, you can quite easily find out what is going on. Besides, all of the page's JavaScript source is always readable, so you can study it.
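Because the link's JavaScript is readable, the postback target and argument can be pulled straight out of the link's href. The following is a minimal Python sketch, assuming the paging links use the `__doPostBack('target','argument')` form shown in the question. The function name `parse_postback` and the control name `ctl00$Main$lnkPage2` in the usage line are hypothetical and do not come from the site in question.

```python
import re

# Matches __doPostBack('target','argument') as it appears in a link's href.
_POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)")


def parse_postback(href):
    """Return (event_target, event_argument), or None if href is not a postback link."""
    match = _POSTBACK_RE.search(href)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Usage with a hypothetical control name:
target, argument = parse_postback("javascript:__doPostBack('ctl00$Main$lnkPage2','')")
```

The real control names should be read from the page source or from the requests captured by the HTTP spy rather than guessed.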

I want to emphasize that nothing can guarantee 100% success for all sites. However, you will probably identify a few general, commonly used classes of cases. For example, most ASP.NET pages with a grid view use pretty much the same paging mechanism.

-SA
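As an illustration of the paging mechanism the answer refers to: on typical ASP.NET Web Forms pages, `__doPostBack` copies its two arguments into the hidden form fields `__EVENTTARGET` and `__EVENTARGUMENT` and submits the page's form, together with the other hidden fields such as `__VIEWSTATE`. The Python sketch below replays that request. It is not the answerer's code, and it assumes the target site follows this standard pattern, which should be confirmed with an HTTP spy first. The names `HiddenFieldParser`, `build_postback_body` and `fetch_next_page` are hypothetical. Cookies, sessions and non-hidden form fields are not handled.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_body(page_html, event_target, event_argument=""):
    """Build the POST body that a __doPostBack call would submit (assumed standard form)."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    fields = dict(parser.fields)
    # Overwrite the two fields that __doPostBack fills in before submitting.
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields).encode("utf-8")


def fetch_next_page(url, page_html, event_target, event_argument=""):
    """POST the postback to the same URL and return the HTML of the next grid page."""
    body = build_postback_body(page_html, event_target, event_argument)
    request = Request(url, data=body)  # supplying data makes this a POST
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

Each response carries fresh hidden-field values, so the HTML returned for one page should be passed as `page_html` when requesting the next one.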

