How to scrape dynamic data with PHP Simple HTML DOM Parser

Published 2019/11/14 17:22:55 | Tags: php, jquery, html, dom

Problem Description

First, let me say that I have read numerous "scraping" threads on here and none of them have helped me. I have also searched the internet for days, and now that I am getting close to the wire, I am hoping someone can shed some light on this for me.

I am using PHP Simple HTML DOM Parser to scrape some data from a page. The URL I am working with serves dynamic content, and nothing I try pulls that content in. I need to scrape the plain text from <tr id="0" class="ui-widget-content jqgrow ui-row-ltr" role="row"> through <tr id="9" class="ui-widget-content jqgrow ui-row-ltr" role="row">; I feel like once I get one row working, I can get the others. Because this information is not actually on the page when it loads, but comes in only after the page has loaded, I am stuck in a rut.
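
A quick way to confirm the symptom described above is to look for the jqgrow rows in whatever HTML a PHP request actually receives. The sketch below uses PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM; the function name extract_grid_rows() and both sample HTML strings are hypothetical, standing in for the markup as first served (grid table present but empty) and the markup a browser shows after the page's scripts have filled the grid.

```php
<?php
// Hypothetical helper: return the cell texts of every <tr> whose class list contains "jqgrow".
function extract_grid_rows(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about imperfect markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "jqgrow" as a whole word inside the class attribute.
    $rows = $xpath->query('//tr[contains(concat(" ", normalize-space(@class), " "), " jqgrow ")]');

    $result = [];
    foreach ($rows as $row) {
        $cells = [];
        foreach ($xpath->query('./td', $row) as $td) {
            $cells[] = trim($td->textContent); // plain text of one cell
        }
        $result[] = $cells;
    }
    return $result;
}

// Hypothetical markup as first served: the grid table exists but has no data rows yet.
$initialHtml = '<table id="tblII"><tbody></tbody></table>';

// Hypothetical markup after the page's scripts have filled the grid.
$renderedHtml = '<table id="tblII"><tbody>'
    . '<tr id="0" class="ui-widget-content jqgrow ui-row-ltr" role="row"><td>Name 0</td><td>Date 0</td></tr>'
    . '<tr id="1" class="ui-widget-content jqgrow ui-row-ltr" role="row"><td>Name 1</td><td>Date 1</td></tr>'
    . '</tbody></table>';

echo count(extract_grid_rows($initialHtml)), "\n"; // 0: nothing to scrape in the initial HTML
foreach (extract_grid_rows($renderedHtml) as $cells) {
    echo implode(' | ', $cells), "\n";
}
```

If the count stays at zero for the HTML your script downloads, the rows are being added client-side, and no selector, however precise, will find them in that response.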

With that said, here is what I have tried:

include 'simple_html_dom.php'; // PHP Simple HTML DOM Parser library file
echo file_get_html('http://sheriffclevelandcounty.com/p2c/jailinmates.aspx')->plaintext;

The above shows me everything except the information I need.

I also tried adapting the IMDb example from the plugin to my needs. Here it is:

<?php
// Defining the basic cURL function
function curl($url) {
    // Assigning cURL options to an array
    $options = array(
        CURLOPT_RETURNTRANSFER => TRUE,  // Return the page body as a string instead of printing it
        CURLOPT_FOLLOWLOCATION => TRUE,  // Follow 'Location' HTTP headers (redirects)
        CURLOPT_AUTOREFERER => TRUE,     // Automatically set the Referer header when following redirects
        CURLOPT_CONNECTTIMEOUT => 120,   // Maximum time (in seconds) to wait while connecting
        CURLOPT_TIMEOUT => 120,          // Maximum time (in seconds) the whole request may take
        CURLOPT_MAXREDIRS => 10,         // Maximum number of redirects to follow
        CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8",  // User agent string to send
        CURLOPT_URL => $url,             // URL to fetch, passed into the function
    );

    $ch = curl_init();                // Initialise a cURL handle
    curl_setopt_array($ch, $options); // Apply the options above
    $data = curl_exec($ch);           // Execute the request; the body (or FALSE on failure) goes into $data
    curl_close($ch);                  // Close the handle
    return $data;                     // Return the downloaded HTML
}

// Defining the basic scraping function
function scrape_between($data, $start, $end) {
    $data = stristr($data, $start);        // Strip everything before $start (case-insensitive)
    $data = substr($data, strlen($start)); // Strip $start itself
    $stop = stripos($data, $end);          // Find the position of $end in what remains
    $data = substr($data, 0, $stop);       // Strip $end and everything after it
    return $data;                          // Return the text between $start and $end
}

$scraped_page = curl("http://sheriffclevelandcounty.com/p2c/jailinmates.aspx"); // Download the jail inmates page into $scraped_page
$scraped_data = scrape_between($scraped_page, '<table id="tblII" class="ui-jqgrid-btable" cellspacing="0" cellpadding="0" border="0" role="grid" aria-multiselectable="false" aria-labelledby="gbox_tblII" style="width: 456px;">', '</table>'); // Extract everything between that exact opening <table id="tblII" ...> tag and the next </table>

echo $scraped_data; // Echo the extracted table content
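
One weakness of scrape_between() above is that it fails silently: if the opening marker does not occur verbatim in the downloaded HTML (a single attribute that the browser's developer tools show but the raw server response lacks is enough), stristr() returns false and the function quietly yields an empty result instead of an error. The sketch below is a hypothetical variant, scrape_between_strict(), that returns null when either marker is missing so the failure becomes visible; it keeps the original's case-insensitive matching, and the sample $html string is made up for illustration.

```php
<?php
// Hypothetical stricter variant of scrape_between(): returns null instead of
// an empty string when $start or $end does not occur in $data.
function scrape_between_strict(string $data, string $start, string $end): ?string
{
    $from = stripos($data, $start);    // case-insensitive, like stristr() in the original
    if ($from === false) {
        return null;                   // opening marker not found
    }
    $from += strlen($start);           // skip past the marker itself
    $to = stripos($data, $end, $from); // look for $end only after $start
    if ($to === false) {
        return null;                   // closing marker not found
    }
    return substr($data, $from, $to - $from);
}

$html = '<table id="tblII" class="ui-jqgrid-btable"><tbody></tbody></table>';

// Exact opening tag present: returns the inner markup.
var_dump(scrape_between_strict($html, '<table id="tblII" class="ui-jqgrid-btable">', '</table>')); // string(15) "<tbody></tbody>"

// Opening tag with an extra attribute that is not in $html: returns null.
var_dump(scrape_between_strict($html, '<table id="tblII" class="ui-jqgrid-btable" style="width: 456px;">', '</table>')); // NULL
```

Matching on a shorter, stable marker (for example just '<table id="tblII"') is less brittle than copying the whole tag from the browser's inspector.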

Of course, neither of these works, so my question is: how do I use the PHP Simple HTML DOM Parser to get dynamic content that is loaded after the page loads? Is it possible, or am I completely on the wrong track here?
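
Simple HTML DOM only parses the HTML string it is given and does not execute JavaScript, so rows that a script adds after the page loads never reach it. A common workaround is to find the request the page's script makes for the grid data (for example in the browser developer tools' Network tab) and fetch that response directly. The sketch below assumes, without having checked this site, that the grid is fed JSON in jqGrid's default jsonReader layout (a "rows" array whose items carry an "id" and a "cell" array); the function name grid_rows_from_json() and the sample payload are hypothetical.

```php
<?php
// Hypothetical helper: turn a jqGrid-style JSON payload (default jsonReader layout:
// {"page":..,"total":..,"records":..,"rows":[{"id":..,"cell":[...]}]}) into an
// array of rows keyed by row id. The real endpoint and format for this site are unknown.
function grid_rows_from_json(string $json): array
{
    $payload = json_decode($json, true); // decode into associative arrays
    if (!is_array($payload) || !isset($payload['rows']) || !is_array($payload['rows'])) {
        return []; // not the expected shape
    }
    $rows = [];
    foreach ($payload['rows'] as $row) {
        if (isset($row['id'], $row['cell']) && is_array($row['cell'])) {
            $rows[(string) $row['id']] = array_map('strval', $row['cell']);
        }
    }
    return $rows;
}

// Hypothetical sample payload; in practice this string would come from fetching the
// grid's data URL (e.g. with the curl() function above) rather than being hard-coded.
$sample = '{"page":1,"total":1,"records":2,"rows":['
        . '{"id":"0","cell":["Name 0","Date 0"]},'
        . '{"id":"1","cell":["Name 1","Date 1"]}]}';

foreach (grid_rows_from_json($sample) as $id => $cells) {
    echo $id, ': ', implode(' | ', $cells), "\n";
}
```

If the data URL turns out to return HTML or XML instead, the same idea applies: request that URL directly and parse its response, rather than the outer page.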
