cURL returns an empty array

Views: 31 | Published: 2021/9/22 20:30:21 | Tags: php curl web-scraping web-crawler amazon


Question

I have made a simple web crawler with PHP cURL that should grab all the images from a particular Amazon page, namely the search results for the keyword samsung.

The code is as follows:

$curl = curl_init(); // $curl is a cURL handle

$search_string = "samsung";

$url = "https://www.amazon.com/s?k$search_string";

curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); // skip TLS certificate verification (insecure)
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the response as a string instead of printing it

$result = curl_exec($curl);

preg_match_all("!https://m.media-amazon.com/images/I/[^\s]*?._AC_UL320_.jpg!", $result, $matches);

print_r($matches);

curl_close($curl);

But now I get an empty array:

Array ( [0] => Array ( ) )

I don't know why it is showing that, so if you know what is going wrong or how I can handle this, please let me know. I would really appreciate any ideas.

Thanks in advance.

Note that I have used the regular expression [^\s]*? in place of the image name so that all the available images on the web page are matched.

Update #1:

curl --head https://www.amazon.com/s?k=samsung

HTTP/1.1 503 Service Unavailable
Content-Type: text/html
Content-Length: 2671
Connection: keep-alive
Server: Server
Date: Tue, 15 Jun 2021 20:59:38 GMT
x-amz-rid: 9BVX8KQMWJ4QDJ75ETYV
Vary: Content-Type,Accept-Encoding,X-Amzn-CDN-Cache,X-Amzn-AX-Treatment,User-Agent
Last-Modified: Fri, 14 May 2021 19:08:48 GMT
ETag: "a6f-5c24ef9383000"
Accept-Ranges: bytes
Strict-Transport-Security: max-age=47474747; includeSubDomains; preload
Permissions-Policy: interest-cohort=()
X-Cache: Error from cloudfront
Via: 1.1 5345148f0ba8ae3c67b69d035acdbfc5.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: AMS50-C1
X-Amz-Cf-Id: AHdq2-QLEtCE4WvXZIEh_P75D8hCrHP09EAkNqBer5VBS-pI-blj1w==

Recommended answer

First issue: your code:

$url = "https://www.amazon.com/s?k$search_string";

should be (note the = after k):

$url = "https://www.amazon.com/s?k=$search_string";
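As a small sketch of the same fix, the query string can also be built with http_build_query(), which writes the = and URL-encodes the value. The parameter name k is taken from the URL above; nothing else about Amazon's URL scheme is assumed.

```php
<?php
// Assumption: the keyword goes in the "k" query parameter, as in the URL above.
$search_string = "samsung";

// http_build_query() inserts the "=" and URL-encodes the value,
// so a keyword containing spaces or "&" would still give a valid URL.
$url = "https://www.amazon.com/s?" . http_build_query(["k" => $search_string]);

echo $url, PHP_EOL; // prints https://www.amazon.com/s?k=samsung
```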

Second issue: Amazon is smart; they are not going to let you scrape at will. What comes back is not the search results page (the curl --head output in the question shows 503 Service Unavailable).

You can see this with:

$result = curl_exec($curl);
var_dump($result);
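Besides dumping the body, the status code can be checked before running any regex. The following is only a sketch: fetch_with_status() and looks_blocked() are hypothetical helper names, not part of the original code, and treating every non-2xx status as "blocked" is an assumption made for illustration.

```php
<?php
// Hypothetical helper: fetch a URL and return [HTTP status, body or false].
function fetch_with_status(string $url): array
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    $body = curl_exec($curl);                          // false on transport errors
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE); // 0 if no HTTP response arrived
    return [$status, $body];
}

// Hypothetical helper: true when the response should not be parsed.
// Assumption: anything outside 200-299 (such as the 503 above) is unusable.
function looks_blocked(int $status, $body): bool
{
    return $body === false || $status < 200 || $status >= 300;
}

// Classifying the status from the question's update, without any network call:
var_dump(looks_blocked(503, "<html>error page</html>")); // bool(true)
```

With a network connection, the two helpers could be combined as [$status, $body] = fetch_with_status($url); followed by a looks_blocked($status, $body) check before calling preg_match_all().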

Third issue: the regex is not working. Regexes should be tested at https://www.phpliveregex.com/#tab-preg-match-all (right-click the page, choose View Source, then copy and paste the page content).

From what I got, your regex did not return any results, but this one did: https://m.media-amazon.com/images/I/[^\s]*?.jpg

It may be that the ._AC_UL320_ part of the string is also an Amazon anti-scraping thing... :(
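The looser pattern can be tried offline against a pasted snippet, in the same spirit as the regex tester. This is a sketch under assumptions: the $html sample and its file names are invented for illustration, the dots are escaped so they match literal dots only, and a double quote is excluded so a match cannot run past the end of an attribute.

```php
<?php
// Made-up sample markup; the image file names are hypothetical.
$html = '<img src="https://m.media-amazon.com/images/I/71abcDEF.jpg">'
      . '<img src="https://m.media-amazon.com/images/I/81xyz123._AC_UL320_.jpg">';

// "~" is the delimiter, so the slashes in the URL need no escaping.
// "\." matches a literal dot; [^\s"]+? lazily matches the file name.
$pattern = '~https://m\.media-amazon\.com/images/I/[^\s"]+?\.jpg~';

preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full matches; drop repeated URLs and reindex.
$images = array_values(array_unique($matches[0]));

print_r($images);
```

Because the size suffix is not hard-coded, this pattern matches both the plain .jpg name and the ._AC_UL320_.jpg variant in the sample.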
