Best method for bulk downloading images from website


Problem Description

I will download a lot of images (20,000+) from a website to my server, and I'm trying to figure out the best way to do this since there are so many images to download.

Currently I have the code below, which works in testing. But is there a better solution, or should I use some software to do this?

foreach ($products as $product) {

    $url = $product->img;
    $imgName = $product->product_id;
    $path = "images/";

    $img = $path . $imgName . ".png";

    file_put_contents($img, file_get_contents($url));

}

Also, is there a chance that I will break something or crash the website when I download that many images at once?

Recommended Answer

First off, I agree with @Rudy Palacois here: wget would probably be better. That said, if you want to do it in PHP, curl would be much faster than file_get_contents, for two reasons.
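(As a reference sketch, not from the original answer: if you go the wget route, one possible invocation is shown below. It assumes the image URLs have been exported to a plain text file, one URL per line; urls.txt and images/ are hypothetical names. -i reads URLs from the file, -P sets the destination directory, --wait pauses between requests so the source site isn't hit too hard, and --tries caps retries per file.)

wget -i urls.txt -P images/ --wait=1 --tries=3

Note that wget keeps the remote file names, so renaming each image to its product_id, as the PHP code does, would still have to happen in a separate step.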

1: Unlike file_get_contents, curl can reuse the same connection to download multiple files, while file_get_contents creates and closes a new connection for every download. That takes time, so curl will be faster (as long as you're not using CURLOPT_FORBID_REUSE / CURLOPT_FRESH_CONNECT, anyway).

2: curl stops the download once the number of bytes announced in the Content-Length HTTP header has been received. file_get_contents completely ignores this header and keeps downloading everything it can until the connection is closed, which can again be much slower than curl's approach: it's up to the web server when the connection closes, and on some servers that is a lot slower than reading Content-Length bytes.

(And generally, curl is faster than file_get_contents because curl supports compressed transfers, gzip and deflate, which file_get_contents does not... but that's generally not applicable to images, since most common image formats are already pre-compressed. Notable exceptions include .bmp images, though.)

Like this:

$ch = curl_init();
curl_setopt($ch, CURLOPT_ENCODING, ''); // if you're downloading files that benefit from compression (like .bmp images), this line enables compressed transfers.
foreach ($products as $product) {

    $url = $product->img;
    $imgName = $product->product_id;
    $path = "images/";

    $img = $path . $imgName . ".png";
    $img = fopen($img, 'wb');
    // the same curl handle is reused on every iteration, so the connection can be kept alive
    curl_setopt_array($ch, array(
            CURLOPT_URL  => $url,
            CURLOPT_FILE => $img
    ));
    curl_exec($ch);
    fclose($img);
    // file_put_contents($img, file_get_contents($url));
}
curl_close($ch);
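(A further, purely illustrative sketch, not part of the answer above: since the question also asks about overloading the target website, the same loop can be slowed down and made slightly more defensive. usleep(), curl_errno(), curl_getinfo() and CURLINFO_RESPONSE_CODE are standard PHP functions and constants; the pause length is an arbitrary assumption.)

$ch = curl_init();
curl_setopt($ch, CURLOPT_ENCODING, '');
foreach ($products as $product) {
    $dest = "images/" . $product->product_id . ".png";
    $fh = fopen($dest, 'wb');

    curl_setopt_array($ch, array(
        CURLOPT_URL  => $product->img,
        CURLOPT_FILE => $fh
    ));
    $ok = curl_exec($ch);
    fclose($fh);

    // discard the file if the transfer failed or the server did not answer 200
    if ($ok === false || curl_errno($ch) !== 0 || curl_getinfo($ch, CURLINFO_RESPONSE_CODE) !== 200) {
        unlink($dest);
    }

    usleep(200000); // ~0.2 s pause between downloads to go easy on the remote server
}
curl_close($ch);

A short pause per request keeps roughly 20,000 downloads from arriving as a single burst, which is usually enough to avoid putting noticeable load on an ordinary site.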

Edit: fixed a code-breaking typo; the option is called CURLOPT_FILE, not CURLOPT_OUTFILE.

Edit 2: CURLOPT_FILE wants a file resource, not a file path; fixed that too.
