使用 PHP 进行反向图像抓取 [英] Reverse image scraping with PHP

查看:21
本文介绍了使用 PHP 进行反向图像抓取的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要使用 API 不支持的谷歌反向图像搜索来获取一些图像,但幸运的是,您可以使用指向图像的直接链接查询谷歌,它仍然显示结果,因此:

$googleURL = "https://www.google.com/searchbyimage?&image_url=".$imageURL;回声 $googleURL;

输出:

<预> <代码> https://www.google.com.au/search?tbs=sbi:AMhZZiu9rNRW4ETWGjN9XYQKsa21UHM7j_1TjMjXvYyNH1knVTyMGZGNmS2yme4CsQb0T7UViTyNrG4e8u_1xLY-dZCU16wkfdUakeY7idDwyMge78nT--Grpll4t9_1fp4YPTsJyKRUANzw1Iyctsko7OZbkYES3VUHtyNy9l9RJf12YOdEvVOxSZCO6-JPxO0PpZ5p79Rr-eDUrqENWYVbk4qojafKMTVfuXvoACQ9iykI-DMVbP9n_1o0YkdKTdUeK2r30wg4Oe2BqspoXlI_11rxySuK6TolPM6z58E6erTT0bnYfXTlyDMBfOwgSfhbn2ipLrNHgNdqyk-YhmMP0_1ZzqVyZrgMz-I5cfH9N65nX6bhZfos0lgr8_15V6ZHtX0_1p8s5r229JDrwzlwnjwOBLgP1inmEORCaKOlcfHbyPnU3n04pIfLGu5fWYpbmFJwtK_1vaJvS0uFb6Pkh_1uv0wvz_10yf4O6E1IvBSoMudcYy4cmJ1zegJJ9L50C0bzXFIRUb62lcPJWbkZNR44Tz378nOSXd-PND0JfKQ-TujT3KfC_1O241knvr9Eb3LbuvncGiCMoPgxlUY4r9B_1KWchNWhJVTJz9omeiygwz5K_13YkjuLg52UF6YWvLedCxgRoUpuj9kFdmYt-b9Tn2VEZG8yfiLm3OTkZnlVYtPF87LLQAHH24VpLMoV0oDllHDK3xOXhvusl_1K2Me9tTdK15PPG7oreeWfYRztQwTpG4iB5GAnaj687OQukvxX5hNFIqXx_1QSuNooDhIP1eJl-6QYfuI4MPasj6flSMom7HYTSjyjcsQKw0Prj1bBsJY6qH1qyLrF1f1_1Ql0COERnbOV7O5mTOuTkNWarmR5wzE06qbgsrtT95ENqafd81ppHbA0Jyg-xQ8TLV-dSp1QDAtiYAHI_11tCwsDtrak4jDS4qAfEJCw_1lb9urJqqajvp25jLH2_1mN3u0eeW7xNF-PljofyhI0iIWYSg6ghyOVRIaT_1c6klKUPvOrquZy8hMCZWHb3CYZNGJeKTnACCyYW1MNVUsYnoFWORN6hvkVlUk0beFXvA_1W2vaoedLjj-fN1y8_1dPOiBROLYtv85nq01csCKk7Eib6p2b_131wEeQBYocoYU0sGTv2_1dhOvSXRPGTnrbZlNDbJFUtH4pF9tMQj5-Fh_1lw9TTXGCjQ9UjOSLD5q7tNjCQU1As1uCQBvmZvxo7J3gZSAcj_19wXfHZCOsA8g-WA97V-2b62ia4RFOehQ38hoXoK7MCSDLnVtJTsKQz9HuEreXm8qGQlbDzfr7JFuHHe2MOyChwnL_1gzRnZd8uv2OIM0nzKh_1wg4T1KCXv3NSGNkSyNxpYXFJ161Sv3NpQQI3epBMiYA_1AcQDiCxOTQvWj00e5EXaXN22CDRWRq3uk4HWj2eXcR6-TGmsYEfSGX9nyQwK1DHp9yaNjk9Bal7rNHUAe_1eMDsCWW9htaLyiMTio0eXyTumVrlt7ShZVd8oSPOj8U0ilY9owH95jz7LsI8vUnzF-FC2m_1yNt3xe4ZAcsRTbYQXTN3Ga76vTQBPu8oz0gkYmDTA&安培; gws_rd = CR安培; EI = wAHVVJOVLIeeugSZ64A4

...现在在这个页面上,我需要点击链接到实际结果页面,所以我的情况看起来像:

if a.text == 'Large'elseif a.text == '中'elseif a.text == '视觉上相似的图像'{//抓取链接//获取前 10 个结果的直接链接}

但我不知道如何:

  1. 获取 href 如果条件 a.text == 'Large' 满足,因为 Simple HTML DOM ParserPHPQuery 也没有像 jQuery 那样的 this.
  2. 在获取结果页面时,如何触发 mousedown 以获取全尺寸图像 URL,因为这是我在源代码中看到的:jsaction="mousedown:irc.rl;keydown:irc.rlk"

这是我想要做的快速截屏:https://www.dropbox.com/s/c8g7fs5m5zqcegb/2015-02-07_08-56-23.mp4?dl=0 (5.9mb)

解决方案

  1. 您可以使用正则表达式来查找匹配的链接.
  2. 如果您更仔细地检查 Google 图片搜索的 HTML 代码,您会发现链接中实际上还有一个 href 参数 (MygAMAL 只是另一个爬行.在那里您可以使用正则表达式再次解析出大图像.

I need to fetch some images using google Reverse image search which is not supported by the API, but thankfully, you can query google with a direct link to the image and it still shows results, so:

$googleURL = "https://www.google.com/searchbyimage?&image_url=".$imageURL;
echo $googleURL;

Output:

https://www.google.com.au/search?tbs=sbi:AMhZZiu9rNRW4ETWGjN9XYQKsa21UHM7j_1TjMjXvYyNH1knVTyMGZGNmS2yme4CsQb0T7UViTyNrG4e8u_1xLY-dZCU16wkfdUakeY7idDwyMge78nT--Grpll4t9_1fp4YPTsJyKRUANzw1Iyctsko7OZbkYES3VUHtyNy9l9RJf12YOdEvVOxSZCO6-JPxO0PpZ5p79Rr-eDUrqENWYVbk4qojafKMTVfuXvoACQ9iykI-DMVbP9n_1o0YkdKTdUeK2r30wg4Oe2BqspoXlI_11rxySuK6TolPM6z58E6erTT0bnYfXTlyDMBfOwgSfhbn2ipLrNHgNdqyk-YhmMP0_1ZzqVyZrgMz-I5cfH9N65nX6bhZfos0lgr8_15V6ZHtX0_1p8s5r229JDrwzlwnjwOBLgP1inmEORCaKOlcfHbyPnU3n04pIfLGu5fWYpbmFJwtK_1vaJvS0uFb6Pkh_1uv0wvz_10yf4O6E1IvBSoMudcYy4cmJ1zegJJ9L50C0bzXFIRUb62lcPJWbkZNR44Tz378nOSXd-PND0JfKQ-TujT3KfC_1O241knvr9Eb3LbuvncGiCMoPgxlUY4r9B_1KWchNWhJVTJz9omeiygwz5K_13YkjuLg52UF6YWvLedCxgRoUpuj9kFdmYt-b9Tn2VEZG8yfiLm3OTkZnlVYtPF87LLQAHH24VpLMoV0oDllHDK3xOXhvusl_1K2Me9tTdK15PPG7oreeWfYRztQwTpG4iB5GAnaj687OQukvxX5hNFIqXx_1QSuNooDhIP1eJl-6QYfuI4MPasj6flSMom7HYTSjyjcsQKw0Prj1bBsJY6qH1qyLrF1f1_1Ql0COERnbOV7O5mTOuTkNWarmR5wzE06qbgsrtT95ENqafd81ppHbA0Jyg-xQ8TLV-dSp1QDAtiYAHI_11tCwsDtrak4jDS4qAfEJCw_1lb9urJqqajvp25jLH2_1mN3u0eeW7xNF-PljofyhI0iIWYSg6ghyOVRIaT_1c6klKUPvOrquZy8hMCZWHb3CYZNGJeKTnACCyYW1MNVUsYnoFWORN6hvkVlUk0beFXvA_1W2vaoedLjj-fN1y8_1dPOiBROLYtv85nq01csCKk7Eib6p2b_131wEeQBYocoYU0sGTv2_1dhOvSXRPGTnrbZlNDbJFUtH4pF9tMQj5-Fh_1lw9TTXGCjQ9UjOSLD5q7tNjCQU1As1uCQBvmZvxo7J3gZSAcj_19wXfHZCOsA8g-WA97V-2b62ia4RFOehQ38hoXoK7MCSDLnVtJTsKQz9HuEreXm8qGQlbDzfr7JFuHHe2MOyChwnL_1gzRnZd8uv2OIM0nzKh_1wg4T1KCXv3NSGNkSyNxpYXFJ161Sv3NpQQI3epBMiYA_1AcQDiCxOTQvWj00e5EXaXN22CDRWRq3uk4HWj2eXcR6-TGmsYEfSGX9nyQwK1DHp9yaNjk9Bal7rNHUAe_1eMDsCWW9htaLyiMTio0eXyTumVrlt7ShZVd8oSPOj8U0ilY9owH95jz7LsI8vUnzF-FC2m_1yNt3xe4ZAcsRTbYQXTN3Ga76vTQBPu8oz0gkYmDTA&gws_rd=cr&ei=wAHVVJOVLIeeugSZ64A4

.. now on this page, I need to follow the link to the actual results page, so my condition would look like:

if a.text == 'Large' 
elseif a.text == 'Medium'
elseif a.text == 'Visually similar images'{
    // crawl the link
    // get direct links of top 10 results  
}

But I'm not sure how to:

  1. get the href if the condition a.text == 'Large' is met since Simple HTML DOM Parser or PHPQuery neither have this like jQuery.
  2. On fetching the results page, how to trigger a mousedown even to get the full-size image URLS because this is what I see in the source: jsaction="mousedown:irc.rl;keydown:irc.rlk"

Here's quick screencast of what I'm looking to do: https://www.dropbox.com/s/c8g7fs5m5zqcegb/2015-02-07_08-56-23.mp4?dl=0 (5.9mb)

解决方案

  1. You can use Regular Expresssions to find the matching link.
  2. If you check the HTML Code of the Google Image Search a bit more closely, you see there is actually also a href param in the Link (Example) which you can follow with just another crawl. There you can parse out the large image again with Regular Expressions.

这篇关于使用 PHP 进行反向图像抓取的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆