Simple dom parser insert into database

Views: 120 | Published: 2017/3/18 20:34:37 | Tags: php database parsing simple-html-dom domparser


Question

I want to insert some elements into my database, but I want $pavadinimas and $kaina to be in one row, not in different ones. Moreover, it would be pretty cool if I could get the elements from all pages of the website, but when I insert more than 2 links I get an error on refreshing my page saying that the page could not load. Here is my code. Thanks for the help!

<?php // example of how to modify HTML contents


include_once('simple_html_dom.php');

// Create DOM from URL or file

$html = file_get_html('https://www.varle.lt/mobilieji-telefonai/');

foreach($html->find('span[class=inner]') as $pavadinimas) {
    $pavadinimas = str_replace("<span class=", " ", $pavadinimas);
    $pavadinimas = str_replace("inner>", " ", $pavadinimas);
    $pavadinimas = str_replace("<span>", " ", $pavadinimas);
    $pavadinimas = str_replace("</span></span>", " ", $pavadinimas);
    $pavadinimas = str_replace('"inner">   ', " ", $pavadinimas);
}

foreach($html->find('span[class=price]') as $kaina) {
    $kaina = str_replace("Lt", " ", $kaina);
    $kaina = str_replace("<span class=", " ", $kaina);
    $kaina = str_replace("price", " ", $kaina);
    $kaina = str_replace("</span>", " ", $kaina);
    $kaina = str_replace(",<sup>99</sup>", " ", $kaina);
    $kaina = str_replace(",<sup>99</sup>", " ", $kaina);
    $kaina = str_replace("               ", " ", $kaina);
    $kaina = str_replace('" ">', " ", $kaina);
    $kaina = str_replace("              ", " ", $kaina);
    $query = "insert into telefonai (pavadinimas,kaina) VALUES (?,?)";
    $this->db->query($query, array($pavadinimas,$kaina));
}
?>

Recommended answer

Proceed step by step...

Start by getting all the wanted info from one page (the first, for example). The idea is to:

Get all the phone blocks: $phones = $html->find('a[data-id]');

In a loop, get the wanted info (name, price) from each block

Insert this info into the db
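The last step can be sketched as follows. This is only a minimal illustration under stated assumptions: it uses PDO with an in-memory SQLite database in place of the asker's own `$this->db` wrapper, and the `$phones` array is hypothetical sample data shaped like what the per-block loop would produce. The point it shows is that one prepared INSERT executed once per phone puts the name and the price in the same row.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE telefonai (pavadinimas TEXT, kaina INTEGER)');

// Hypothetical sample data, shaped like what the per-block loop would yield.
$phones = [
    ['pavadinimas' => 'Samsung Phone I9300 Galaxy SIII Juodas', 'kaina' => 1099],
    ['pavadinimas' => 'LG T375 Mobile Phone Black', 'kaina' => 218],
];

// Prepare once, execute once per phone: name and price land in the same row.
$stmt = $pdo->prepare('INSERT INTO telefonai (pavadinimas, kaina) VALUES (?, ?)');
foreach ($phones as $phone) {
    $stmt->execute([$phone['pavadinimas'], $phone['kaina']]);
}

// Read the rows back to confirm that each one holds both values.
$rows = $pdo->query('SELECT pavadinimas, kaina FROM telefonai')->fetchAll(PDO::FETCH_ASSOC);
foreach ($rows as $row) {
    echo $row['pavadinimas'], ' | ', $row['kaina'], "\n";
}
```

With the database wrapper from the question, the equivalent would presumably be to call the insert inside the single loop over the phone blocks, where both values are available at the same time.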

Now that you have the code working for one page, let's make it work for all pages, knowing that:

All pages have the same structure, so we can extract the data with the same method/code as above

The link to the next page to scrape is included in the Next button, so we'll stop when this link cannot be found

So here's code summarizing everything we said above:

include_once('simple_html_dom.php');

$url = "https://www.varle.lt/mobilieji-telefonai/";

// Start from the main page
$nextLink = $url;

// Loop on each next link as long as it exists
while ($nextLink) {
    echo "<hr>nextLink: $nextLink<br>";
    //Create a DOM object
    $html = new simple_html_dom();
    // Load HTML from a url
    $html->load_file($nextLink);

    /////////////////////////////////////////////////////////////
    /// Get phone blocks and extract info (also insert to db) ///
    /////////////////////////////////////////////////////////////
    $phones = $html->find('a[data-id]');

    foreach($phones as $phone) {
        // Get the link
        $linkas = $phone->href;

        // Get the name
        $pavadinimas = $phone->find('span[class=inner]', 0)->plaintext;

        // Get the price and extract the useful part using a regex
        $kaina = $phone->find('span[class=price]', 0)->plaintext;
        // This captures the integer part of a decimal number: in "123,45" it captures "123". Use @([\d,]+),?@ to capture the decimal part too
        // If nothing matches, $kaina becomes NULL instead of raising an undefined index notice
        $kaina = preg_match('@(\d+),?@', $kaina, $matches) ? $matches[1] : NULL;

        echo $pavadinimas, " #----# ", $kaina, " #----# ", $linkas, "<br>";

        // INSERT INTO DB HERE
        // CODE
        // ...
    }
    /////////////////////////////////////////////////////////////
    /////////////////////////////////////////////////////////////

    // Extract the next link, if not found return NULL
    $nextLink = ( ($temp = $html->find('div.pagination a[class="next"]', 0)) ? "https://www.varle.lt".$temp->href : NULL );

    // Clear DOM object
    $html->clear();
    unset($html);
}
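The price extraction can also be tried on its own, without fetching any page. The sketch below wraps the same pattern in a helper; `parsePrice` and the sample strings are hypothetical names and inputs chosen for illustration, not part of the original answer. It returns null when no digits are found, so the caller can skip that row rather than insert a bad value.

```php
<?php
// Hypothetical helper: returns the integer part of a price string, or null.
function parsePrice(string $text): ?int
{
    // Same pattern as in the scraping loop: the first run of digits,
    // optionally followed by a comma.
    if (preg_match('@(\d+),?@', $text, $matches)) {
        return (int) $matches[1];
    }
    return null;
}

var_dump(parsePrice('1099,99 Lt')); // int(1099)
var_dump(parsePrice('n/a'));        // NULL
```

Returning an integer rather than the matched string is a design choice that suits a numeric price column; keep the string if the column is textual.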

Output

nextLink: https://www.varle.lt/mobilieji-telefonai/
Samsung Phone I9300 Galaxy SIII Juodas #----# 1099 #----# https://www.varle.lt/mobilieji-telefonai/samsung-phone-i9300-galaxy-siii-juodas.html
Samsung Galaxy S2 Plus I9105 Pilkai mėlynas #----# 739 #----# https://www.varle.lt/mobilieji-telefonai/samsung-galaxy-s2-plus-i9105-pilkai-melynas.html
Samsung Phone S7562 Galaxy S Duos baltas #----# 555 #----# https://www.varle.lt/mobilieji-telefonai/samsung-phone-s7562-galaxy-s-duos-baltas--457135.html
...

nextLink: https://www.varle.lt/mobilieji-telefonai/?p=2
LG T375 Mobile Phone Black #----# 218 #----# https://www.varle.lt/mobilieji-telefonai/lg-t375-mobile-phone-black.html
Samsung S6802 Galaxy Ace Duos black #----# 579 #----# https://www.varle.lt/mobilieji-telefonai/samsung-s6802-galaxy-ace-duos-black.html
Mobilus telefonas Samsung Galaxy Ace Onyx Black | S5830 #----# 559 #----# https://www.varle.lt/mobilieji-telefonai/mobilus-telefonas-samsung-galaxy-ace-onyx-black.html
...

...
...

Working DEMO

Notice that the code may take a while to parse all the pages, so PHP may return this error: Fatal error: Maximum execution time of 30 seconds exceeded .... If so, simply extend the maximum execution time like this:
ini_set('max_execution_time', 300); //300 seconds = 5 minutes

