Grab/download images from multiple pages using PHP preg_match_all & cURL

Question

I'm trying to grab some images from another site. The problem is that each image is on a different page.

For example: id/1, id/2, id/3, and so on.

So far I have the code below, which can grab an image from a single given URL using:

$returned_content = get_data('http://somedomain.com/id/1/');

But I need to turn the line above into an array (I guess), so that it grabs the image from page 1, then goes on to grab the image on page 2, then page 3, and so on automatically.

function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$returned_content = get_data('http://somedomain.com/id/1/');

if (preg_match_all("~http://somedomain.com/images/(.*?)\.jpg~i", $returned_content, $matches)) {

    // Keep only the first matched image name.
    $src = 0;
    foreach ($matches[1] as $key) {
        if (++$src > 1) break;
        $out = $key;
    }

    $file = 'http://somedomain.com/images/' . $out . '.jpg';

    $dir = 'photos';

    $imgurl = get_data($file);

    file_put_contents($dir . '/' . $out . '.jpg', $imgurl);

    echo 'done';
}

As always, all help is appreciated. Thanks in advance.

Recommended answer

This was pretty confusing, because it sounded like you were only interested in saving one image per page, but the code makes it look like you're actually trying to save every image on each page. So it's entirely possible I completely misunderstood, but here goes.

Looping over each page isn't that difficult:

$i = 1;
$l = 101;

while ($i < $l) {
    $html = get_data('http://somedomain.com/id/'.$i.'/');
    getImages($html);
    $i += 1;
}
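
The question asks about turning the single URL into an array. A minimal sketch of that idea follows, assuming the pages are numbered consecutively. The helper name buildPageUrls and its parameters are hypothetical and not part of the original answer:

```php
<?php
// Hypothetical helper: build the list of page URLs up front.
// Assumes pages are numbered consecutively from $first to $last.
function buildPageUrls(string $base, int $first, int $last): array
{
    $urls = [];
    foreach (range($first, $last) as $id) {
        // Normalize the base so exactly one slash separates it from the id.
        $urls[] = rtrim($base, '/') . '/' . $id . '/';
    }
    return $urls;
}

// Same page range as the while loop above (ids 1 through 100).
$urls = buildPageUrls('http://somedomain.com/id', 1, 100);

// Each URL could then be fetched and processed in turn, for example:
// foreach ($urls as $url) { getImages(get_data($url)); }
```

Building the list first keeps the URL logic separate from the download logic, which may help if the ids later turn out not to be consecutive.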

The following then assumes that you're trying to save all the images on that particular page:

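function getImages($html) {
    $matches = array();
    // Dots in the host name are escaped so they match literally.
    $regex = '~http://somedomain\.com/images/(.*?)\.jpg~i';
    preg_match_all($regex, $html, $matches);
    // $matches[1] holds the captured image names (without the .jpg extension).
    foreach ($matches[1] as $img) {
        saveImg($img);
    }
}

function saveImg($name) {
    $url = 'http://somedomain.com/images/'.$name.'.jpg';
    $data = get_data($url);
    // curl_exec() returns false on failure; skip the file in that case.
    if ($data === false) {
        return;
    }
    file_put_contents('photos/'.$name.'.jpg', $data);
}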
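
If only one image per page is wanted, as the question's own code suggests, a minimal sketch could use preg_match, which stops at the first match, instead of preg_match_all. The function name firstImageName is hypothetical, and the pattern assumes the same image URL layout used in this answer:

```php
<?php
// Hypothetical helper: return the name of the first matching image on a page,
// or null when the page contains no matching image URL.
function firstImageName(string $html): ?string
{
    $regex = '~http://somedomain\.com/images/(.*?)\.jpg~i';
    if (preg_match($regex, $html, $match) === 1) {
        // $match[1] is the captured name without the .jpg extension.
        return $match[1];
    }
    return null;
}

// Sample markup with two images; only the first name is returned.
$sample = '<img src="http://somedomain.com/images/cat.jpg">'
        . '<img src="http://somedomain.com/images/dog.jpg">';
echo firstImageName($sample), "\n";
```

The returned name could then be passed to saveImg() inside the page loop, skipping pages for which the helper returns null.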
