用PHP多线程处理for语句 [英] Multi-threading a for statement with php

查看:69
本文介绍了用PHP多线程处理for语句的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用以下功能来检查图像是否位于其位置.每次脚本运行时,它都会加载大约40-50个网址,因此加载页面需要花费很长时间.我当时在考虑为"for语句"使用线程(在脚本末尾),但是找不到很多有关如何执行此操作的示例.我对php的多线程不是很熟悉,但是我在这里找到了使用popen的示例.

I'm using this following function to check if images exist at their location. Each time the script runs it load about 40 - 50 urls and so its taking long time to load the page. I was thinking of using threading for the "for statement" (at the end of the script) but couldn't find many examples on how to do that. I'm not very familiar with multi-threading with php but i found an example here using popen.

我的脚本:

function get_image_dim($sURL) {

  try {
    $hSock = @ fopen($sURL, 'rb');
    if ($hSock) {
      while(!feof($hSock)) {
        $vData = fread($hSock, 300);
        break;
      }
      fclose($hSock);
      if (strpos(' ' . $vData, 'JFIF')>0) {
        $vData = substr($vData, 0, 300);
        $asResult = unpack('H*',$vData);        
        $sBytes = $asResult[1];
        $width = 0;
        $height = 0;
        $hex_width = '';
        $hex_height = '';
        if (strstr($sBytes, 'ffc2')) {
          $hex_height = substr($sBytes, strpos($sBytes, 'ffc2') + 10, 4);
          $hex_width = substr($sBytes, strpos($sBytes, 'ffc2') + 14, 4);
        } else {
          $hex_height = substr($sBytes, strpos($sBytes, 'ffc0') + 10, 4);
          $hex_width = substr($sBytes, strpos($sBytes, 'ffc0') + 14, 4);
        }
        $width = hexdec($hex_width);
        $height = hexdec($hex_height);
        return array('width' => $width, 'height' => $height);
      } elseif (strpos(' ' . $vData, 'GIF')>0) {
        $vData = substr($vData, 0, 300);
        $asResult = unpack('h*',$vData);
        $sBytes = $asResult[1];
        $sBytesH = substr($sBytes, 16, 4);
        $height = hexdec(strrev($sBytesH));
        $sBytesW = substr($sBytes, 12, 4);
        $width = hexdec(strrev($sBytesW));
        return array('width' => $width, 'height' => $height);
      } elseif (strpos(' ' . $vData, 'PNG')>0) {
        $vDataH = substr($vData, 22, 4);
        $asResult = unpack('n',$vDataH);
        $height = $asResult[1];        
        $vDataW = substr($vData, 18, 4);
        $asResult = unpack('n',$vDataW);
        $width = $asResult[1];        
        return array('width' => $width, 'height' => $height);
      }
    }
  } catch (Exception $e) {}
  return FALSE;
}

for($y=0;$y<= ($image_count-1);$y++){
$dim = get_image_dim($images[$y]);
    if (empty($dim)) {
    echo $images[$y];
    unset($images[$y]);
    }
}
$images = array_values($images);

我发现的popen示例是:

The popen example i found was:

for ($i=0; $i<10; $i++) {
    // open ten processes
    for ($j=0; $j<10; $j++) {
        $pipe[$j] = popen('script.php', 'w');
    }

    // wait for them to finish
    for ($j=0; $j<10; ++$j) {
        pclose($pipe[$j]);
    }
}

我不确定我的代码的哪一部分必须放在script.php中?我尝试移动整个脚本,但这没用吗?

I'm not sure which part of my code has to go in the script.php? I tried moving the whole script but that didn't work?

关于如何实现此方法或是否有更好的多线程方法的任何想法?谢谢.

Any ideas on how can i implement this or if there is a better way to multi thread it? Thanks.

推荐答案

PHP本身没有多线程.您可以使用pthreads来做到这一点,但是在那里有一点经验,我可以肯定地说,这对于您的需求来说太过分了.

PHP does not have multi-threading natively. You can do it with pthreads, but having a little experience there, I can say with assurance that that is too much for your needs.

您最好的选择是使用curl,您可以使用curl_multi_init发起多个请求.根据PHP.net上的示例,以下内容可以满足您的需求:

Your best bet will be to use curl, you can initiate multiple requests with curl_multi_init. Based off the example on PHP.net, the following may work for your needs:

function curl_multi_callback(Array $urls, $callback, $cache_dir = NULL, $age = 600) {
    $return = array();

    $conn = array();

    $max_age = time()-intval($age);

    $mh = curl_multi_init();

    if(is_dir($cache_dir)) {
        foreach($urls as $i => $url) {
            $cache_path = $cache_dir.DIRECTORY_SEPARATOR.sha1($url).'.ser';
            if(file_exists($cache_path)) {
                $stat = stat($cache_path);
                if($stat['atime'] > $max_age) {
                    $return[$i] = unserialize(file_get_contents($cache_path));
                    unset($urls[$i]);
                } else {
                    unlink($cache_path);
                }
            }
        }
    }

    foreach ($urls as $i => $url) {
        $conn[$i] = curl_init($url);
        curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);
        curl_multi_add_handle($mh, $conn[$i]);
    }

    do {
        $status = curl_multi_exec($mh, $active);
        // Keep attempting to get info so long as we get info
        while (($info = curl_multi_info_read($mh)) !== FALSE) {
            // We received information from Multi
            if (false !== $info) {
                //  The connection was successful
                $handle = $info['handle'];
                // Find the index of the connection in `conn`
                $i = array_search($handle, $conn);
                if($info['result'] === CURLE_OK) {
                    // If we found an index and that index is set in the `urls` array
                    if(false !== $i && isset($urls[$i])) {
                        $content = curl_multi_getcontent($handle);
                        $return[$i] = $data = array(
                            'url'     => $urls[$i],
                            'content' => $content,
                            'parsed'  => call_user_func($callback, $content, $urls[$i]),
                        );
                        if(is_dir($cache_dir)) {
                            file_put_contents($cache_dir.DIRECTORY_SEPARATOR.sha1($urls[$i]).'.ser', serialize($data));
                        }
                    }
                } else {
                    // Handle failures how you will
                }
                // Close, even if a failure
                curl_multi_remove_handle($mh, $handle);
                unset($conn[$i]);
            }
        }
    } while ($status === CURLM_CALL_MULTI_PERFORM || $active);


    // Cleanup and resolve any remaining connections (unlikely)
    if(!empty($conn)) {
        foreach ($conn as $i => $handle) {
            if(isset($urls[$i])) {
                $content = curl_multi_getcontent($handle);
                $return[$i] = $data = array(
                    'url'     => $urls[$i],
                    'content' => $content,
                    'parsed'  => call_user_func($callback, $content, $urls[$i]),
                );
                if(is_dir($cache_dir)) {
                    file_put_contents($cache_dir.DIRECTORY_SEPARATOR.sha1($urls[$i]).'.ser', serialize($data));
                }
            }
            curl_multi_remove_handle($mh, $handle);
            unset($conn[$i]);
        }
    }

    curl_multi_close($mh);

    return $return;
}

$return = curl_multi_callback($urls, function($data, $url) {
    echo "got $url\n";
    return array('some stuff');
}, '/tmp', 30);

//print_r($return);
/*
$url_dims = array(
    'url'     => 'http://www......',
    'content' => raw content
    'parsed'  => return of get_image_dim
)
*/

只需重构原始函数get_image_dim即可使用原始数据并输出所需的内容.

Just restructure your original function get_image_dim to consume the raw data and output whatever you are looking for.

这不是一个完整的功能,可能有错误或需要解决的特质,但这应该是一个很好的起点.

This is not a complete function, there may be errors, or idiosyncrasies you need to resolve, but it should serve as a good starting point.

已更新以包括缓存.这将我在18个URL上运行的测试从1秒更改为.007秒(带有缓存命中).

Updated to include caching. This changed a test I was running on 18 URLS from 1 second, to .007 seconds (with cache hits).

注意:您可能不希望像我一样缓存全部请求内容,而只是缓存url和已解析的数据.

Note: you may want to not cache the full request contents, as I did, and just cache the url and the parsed data.

这篇关于用PHP多线程处理for语句的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆