Best method for automating bulk file download
Question
I am attempting to create a cron job that downloads image files that are stored in a queue in our database.
All of the functions that we are using work properly when run on our web server. However, when I run the cron job using the following command:
php index.php cron image_download
I receive a Segmentation Fault error.
Debugging the cron job shows that this error occurs when the data is passed to the get_url_content function, which is called here:
foreach ($urls as $url) {
    $content = $this->get_url_content($url);
}
The function looks like this:
function get_url_content($url) {
    $agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_URL, $url);
    return curl_exec($ch);
}
Is there a better way to download these files? Is it likely that a different method would not cause the same segmentation fault error? Thank you!
UPDATE: It appears that various methods I am trying are continually causing issues. I am seeing either "Segmentation Fault" or "Killed" errors returned from the cron job. Someone recommended that I look into using Iron.io for this so I am going to check that out. If anyone has other recommendations for how to manage this best I would appreciate additional information, thanks.
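Both "Segmentation Fault" and "Killed" often point to the process exhausting memory, and CURLOPT_RETURNTRANSFER buffers every downloaded file entirely in RAM before it is saved. A possible mitigation (a sketch only; the function name download_to_file is my own, not from the original post) is to stream each response straight to disk with CURLOPT_FILE, so memory usage stays roughly constant per file:

```php
// Hypothetical sketch: stream the response body directly to a file handle
// via CURLOPT_FILE instead of buffering it in memory with
// CURLOPT_RETURNTRANSFER. This may help if the "Killed" errors come from
// the cron process running out of memory on large images.
function download_to_file($url, $dest_path) {
    $fp = fopen($dest_path, 'wb'); // binary-safe write mode
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);             // write body directly to $fp
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $ok = curl_exec($ch); // returns true/false when CURLOPT_FILE is set
    curl_close($ch);
    fclose($fp);
    return $ok;
}
```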
Answer
You can try this approach, but before that, are you giving it the full URL?
function get_content($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_AUTOREFERER, false);
    curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
function save_content($text, $new_filename) {
    $fp = fopen($new_filename, 'wb'); // binary-safe mode, so image data is not mangled on Windows
    fwrite($fp, $text);
    fclose($fp);
}
// replace this with your array of urls from the database (make sure it is an array)
$urls = ['http://domain.com/path/to/file.zip', 'http://another.com/path/to/image.img'];
foreach ($urls as $url) {
    $new_filename = basename($url);
    $temp = get_content($url);
    save_content($temp, $new_filename);
}
This fetches the file contents via the complete URL and saves them to disk, so the file is effectively downloaded.
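One caveat with the snippet above: curl_exec() returns false on failure, and saving that result blindly would create empty files for every URL that errors out. A defensive variant (the function name fetch_or_null is mine, not from the answer) checks the cURL error state and the HTTP status before handing the body to save_content():

```php
// Sketch of a defensive fetch: return the body on success, null on any
// transfer error or HTTP error status, so the caller can log and skip
// that queue entry instead of writing a broken file.
function fetch_or_null($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $body  = curl_exec($ch);
    $errno = curl_errno($ch);                       // 0 when the transfer succeeded
    $code  = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 for non-HTTP URLs
    curl_close($ch);
    if ($body === false || $errno !== 0 || $code >= 400) {
        return null;
    }
    return $body;
}
```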
If you are not limited to curl, you may try something like:
$urls = ['http://domain.com/path/to/file.zip', 'http://another.com/path/to/image.img'];
foreach ($urls as $url) {
    $new_filename = basename($url);
    // the fopen() stream can also be replaced with file_get_contents($url)
    file_put_contents($new_filename, fopen($url, 'r'));
}
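In the same spirit, PHP's built-in copy() also understands URL stream wrappers when allow_url_fopen is enabled, so each queue entry becomes a one-liner. The wrapper name download_via_copy below is my own, not from the answer:

```php
// Sketch: copy() streams the source to the destination and returns
// false on failure instead of throwing, so the cron job can log the
// bad URL and continue with the rest of the queue.
function download_via_copy($url, $new_filename) {
    return copy($url, $new_filename);
}
```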
Or even:
foreach ($urls as $url) {
    $new_filename = basename($url);
    // escape both arguments so an untrusted URL cannot inject shell commands
    shell_exec('wget ' . escapeshellarg($url) . ' -O ' . escapeshellarg($new_filename));
}