How to scrape an SSL or HTTPS URL

Views: 26 | Published: 2021/12/17 13:55:35 | Tags: php, curl, web-scraping


Question

I have written a function that scrapes a website using cURL, but it returns nothing when called and I can't understand why. The output is empty.

  <?php
    function scrape($url)
    {
        $headers = Array(
                    "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
                    "Cache-Control: max-age=0",
                    "Connection: keep-alive",
                    "Keep-Alive: 300",
                    "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7",
                    "Accept-Language: en-us,en;q=0.5",
                    "Pragma: "
                );
        $config = Array(
                        CURLOPT_RETURNTRANSFER => TRUE ,
                        CURLOPT_FOLLOWLOCATION => TRUE ,
                        CURLOPT_AUTOREFERER => TRUE ,
                        CURLOPT_CONNECTTIMEOUT => 120 ,
                        CURLOPT_TIMEOUT => 120 ,
                        CURLOPT_MAXREDIRS => 10 ,                   
                        CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8" ,
                        CURLOPT_URL => $url ,
                       ) ;
        $handle = curl_init() ;
        curl_setopt_array($handle,$config) ;
        curl_setopt($handle,CURLOPT_HTTPHEADER,$headers) ;
        $data = curl_exec($handle) ;
        curl_close($handle) ;
        return $data ;
    }

    echo scrape("https://www.google.com") ;
?>

Recommended answer

There are two possible fixes when trying to scrape an SSL or HTTPS URL:

The quick fix
The proper fix

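First, the quick fix.

Warning: this can introduce the security issues that SSL is designed to protect against, because the server's certificate is no longer verified.

Set: CURLOPT_SSL_VERIFYPEER => false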

The second, and proper, fix. Set 3 options:

CURLOPT_SSL_VERIFYPEER => true
CURLOPT_SSL_VERIFYHOST => 2
CURLOPT_CAINFO => getcwd() . '/CAcert.pem'
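As a minimal sketch, the three options can be grouped into one array and applied with curl_setopt_array. The helper name ssl_verify_options is an assumption made for illustration and is not part of the original answer; the path assumes CAcert.pem sits in the current working directory.

```php
<?php
// Hypothetical helper (not from the original answer): returns the cURL
// options that enable certificate verification against a given CA bundle.
function ssl_verify_options(string $caFile): array
{
    return [
        CURLOPT_SSL_VERIFYPEER => true,    // verify the server's certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,       // verify the certificate matches the host name
        CURLOPT_CAINFO         => $caFile, // CA bundle used for the verification
    ];
}

// Assumed location of the bundle: the current working directory.
$options = ssl_verify_options(getcwd() . '/CAcert.pem');

$handle = curl_init('https://www.google.com');
curl_setopt_array($handle, $options);
```

Note the '/' before the file name: getcwd() returns the directory without a trailing separator.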

The last thing you need to do is download the CA certificate bundle.

Go to http://curl.haxx.se/docs/caextract.html -> click 'cacert.pem' -> copy/paste the text into a text editor -> save the file as 'CAcert.pem'. Check that it was not saved as 'CAcert.pem.txt'.
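Because a misnamed file (for example CAcert.pem.txt) is an easy mistake to make, it may help to check the bundle before handing it to cURL. The following is only a sketch: the function name is_usable_ca_bundle is hypothetical, and the check simply looks for a PEM certificate header rather than validating the certificates.

```php
<?php
// Hypothetical helper: checks that a file exists, is readable, and looks
// like a PEM bundle before it is passed to CURLOPT_CAINFO.
function is_usable_ca_bundle(string $path): bool
{
    if (!is_file($path) || !is_readable($path)) {
        return false; // missing, or saved under another name such as CAcert.pem.txt
    }
    $contents = file_get_contents($path);
    return $contents !== false
        && strpos($contents, '-----BEGIN CERTIFICATE-----') !== false;
}

// Assumed location of the bundle: the current working directory.
$caFile = getcwd() . '/CAcert.pem';
if (!is_usable_ca_bundle($caFile)) {
    echo "CA bundle not found or not in PEM format: $caFile\n";
}
```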

<?php
    function scrape($url)
    {
        $headers = Array(
                    "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
                    "Cache-Control: max-age=0",
                    "Connection: keep-alive",
                    "Keep-Alive: 300",
                    "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7",
                    "Accept-Language: en-us,en;q=0.5",
                    "Pragma: "
                );
        $config = Array(
                        // Verify the server certificate against the downloaded CA bundle
                        CURLOPT_SSL_VERIFYPEER => TRUE ,
                        CURLOPT_SSL_VERIFYHOST => 2 ,
                        CURLOPT_CAINFO => getcwd() . '/CAcert.pem' ,
                        CURLOPT_RETURNTRANSFER => TRUE ,
                        CURLOPT_FOLLOWLOCATION => TRUE ,
                        CURLOPT_AUTOREFERER => TRUE ,
                        CURLOPT_CONNECTTIMEOUT => 120 ,
                        CURLOPT_TIMEOUT => 120 ,
                        CURLOPT_MAXREDIRS => 10 ,
                        CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8" ,
                        CURLOPT_URL => $url
                       ) ;
        $handle = curl_init() ;
        curl_setopt_array($handle,$config) ;
        curl_setopt($handle,CURLOPT_HTTPHEADER,$headers) ;

        // Create the result object explicitly and run the request only once
        $output = new stdClass() ;
        $output->data = curl_exec($handle) ;
        if($output->data === false)
        {
            $output->error = 'Curl error: ' . curl_error($handle) ;
        }
        else
        {
            $output->error = 'Operation completed without any errors' ;
        }
        curl_close($handle) ;
        return $output ;
    }

    $scrape = scrape("https://www.google.com") ;
    echo $scrape->data ;
    //uncomment for errors
    //echo $scrape->error ;
?>
