Screen scraping web page after delay

Views: 142 | Posted: 2016/9/20 9:43:35 | Tags: c# c#-4.0 screen-scraping web-scraping

Problem description

I'm trying to scrape a web page using C#. However, after the page loads, it executes some JavaScript that loads more elements into the DOM, and those are the elements I need to scrape. A standard scraper simply grabs the HTML of the page on load and doesn't pick up the DOM changes made via JavaScript. How can I add some way to wait a second or two and then grab the source?

Here is my current code:

// Required namespaces:
using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

private string ScrapeWebpage(string url, DateTime? updateDate)
{
    // Create the request (which supports HTTP compression).
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Pipelined = true;
    request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate");
    if (updateDate != null)
        request.IfModifiedSince = updateDate.Value;

    // Get the response; the using block closes it even if reading fails.
    using (var response = (HttpWebResponse)request.GetResponse())
    {
        Stream responseStream = response.GetResponseStream();

        // Wrap the stream in a decompressor that matches the Content-Encoding header.
        string encoding = (response.ContentEncoding ?? string.Empty).ToLowerInvariant();
        if (encoding.Contains("gzip"))
            responseStream = new GZipStream(responseStream, CompressionMode.Decompress);
        else if (encoding.Contains("deflate"))
            responseStream = new DeflateStream(responseStream, CompressionMode.Decompress);

        // Read the HTML; disposing the reader also disposes the underlying stream.
        // Note: Encoding.Default is platform-dependent.
        using (var reader = new StreamReader(responseStream, Encoding.Default))
        {
            return reader.ReadToEnd();
        }
    }
}

Here is a sample URL:

http://www.realtor.com/realestateandhomes-search/geneva_ny#listingType-any/pg-4

You'll see that when the page first loads it says 134 listings found; then, after a second, it says 187 properties found.
