使用 PHP 进行反向图像抓取 [英] Reverse image scraping with PHP
问题描述
我需要使用 API 不支持的谷歌反向图像搜索来获取一些图像,但幸运的是,您可以使用指向图像的直接链接查询谷歌,它仍然显示结果,因此:
$googleURL = "https://www.google.com/searchbyimage?&image_url=".$imageURL;回声 $googleURL;
输出:
<预> <代码> https://www.google.com.au/search?tbs=sbi:AMhZZiu9rNRW4ETWGjN9XYQKsa21UHM7j_1TjMjXvYyNH1knVTyMGZGNmS2yme4CsQb0T7UViTyNrG4e8u_1xLY-dZCU16wkfdUakeY7idDwyMge78nT--Grpll4t9_1fp4YPTsJyKRUANzw1Iyctsko7OZbkYES3VUHtyNy9l9RJf12YOdEvVOxSZCO6-JPxO0PpZ5p79Rr-eDUrqENWYVbk4qojafKMTVfuXvoACQ9iykI-DMVbP9n_1o0YkdKTdUeK2r30wg4Oe2BqspoXlI_11rxySuK6TolPM6z58E6erTT0bnYfXTlyDMBfOwgSfhbn2ipLrNHgNdqyk-YhmMP0_1ZzqVyZrgMz-I5cfH9N65nX6bhZfos0lgr8_15V6ZHtX0_1p8s5r229JDrwzlwnjwOBLgP1inmEORCaKOlcfHbyPnU3n04pIfLGu5fWYpbmFJwtK_1vaJvS0uFb6Pkh_1uv0wvz_10yf4O6E1IvBSoMudcYy4cmJ1zegJJ9L50C0bzXFIRUb62lcPJWbkZNR44Tz378nOSXd-PND0JfKQ-TujT3KfC_1O241knvr9Eb3LbuvncGiCMoPgxlUY4r9B_1KWchNWhJVTJz9omeiygwz5K_13YkjuLg52UF6YWvLedCxgRoUpuj9kFdmYt-b9Tn2VEZG8yfiLm3OTkZnlVYtPF87LLQAHH24VpLMoV0oDllHDK3xOXhvusl_1K2Me9tTdK15PPG7oreeWfYRztQwTpG4iB5GAnaj687OQukvxX5hNFIqXx_1QSuNooDhIP1eJl-6QYfuI4MPasj6flSMom7HYTSjyjcsQKw0Prj1bBsJY6qH1qyLrF1f1_1Ql0COERnbOV7O5mTOuTkNWarmR5wzE06qbgsrtT95ENqafd81ppHbA0Jyg-xQ8TLV-dSp1QDAtiYAHI_11tCwsDtrak4jDS4qAfEJCw_1lb9urJqqajvp25jLH2_1mN3u0eeW7xNF-PljofyhI0iIWYSg6ghyOVRIaT_1c6klKUPvOrquZy8hMCZWHb3CYZNGJeKTnACCyYW1MNVUsYnoFWORN6hvkVlUk0beFXvA_1W2vaoedLjj-fN1y8_1dPOiBROLYtv85nq01csCKk7Eib6p2b_131wEeQBYocoYU0sGTv2_1dhOvSXRPGTnrbZlNDbJFUtH4pF9tMQj5-Fh_1lw9TTXGCjQ9UjOSLD5q7tNjCQU1As1uCQBvmZvxo7J3gZSAcj_19wXfHZCOsA8g-WA97V-2b62ia4RFOehQ38hoXoK7MCSDLnVtJTsKQz9HuEreXm8qGQlbDzfr7JFuHHe2MOyChwnL_1gzRnZd8uv2OIM0nzKh_1wg4T1KCXv3NSGNkSyNxpYXFJ161Sv3NpQQI3epBMiYA_1AcQDiCxOTQvWj00e5EXaXN22CDRWRq3uk4HWj2eXcR6-TGmsYEfSGX9nyQwK1DHp9yaNjk9Bal7rNHUAe_1eMDsCWW9htaLyiMTio0eXyTumVrlt7ShZVd8oSPOj8U0ilY9owH95jz7LsI8vUnzF-FC2m_1yNt3xe4ZAcsRTbYQXTN3Ga76vTQBPu8oz0gkYmDTA&安培; gws_rd = CR安培; EI = wAHVVJOVLIeeugSZ64A4...现在在这个页面上,我需要点击链接到实际结果页面,所以我的情况看起来像:
if a.text == 'Large'elseif a.text == '中'elseif a.text == '视觉上相似的图像'{//抓取链接//获取前 10 个结果的直接链接}
但我不知道如何:
- 获取
href
如果条件a.text == 'Large'
满足,因为Simple HTML DOM Parser
或PHPQuery
也没有像 jQuery 那样的this
. - 在获取结果页面时,如何触发 mousedown 以获取全尺寸图像 URL,因为这是我在源代码中看到的:
jsaction="mousedown:irc.rl;keydown:irc.rlk"
这是我想要做的快速截屏:https://www.dropbox.com/s/c8g7fs5m5zqcegb/2015-02-07_08-56-23.mp4?dl=0 (5.9mb)
- 您可以使用正则表达式来查找匹配的链接.
- 如果您更仔细地检查 Google 图片搜索的 HTML 代码,您会发现链接中实际上还有一个 href 参数 (MygAMAL 只是另一个爬行.在那里您可以使用正则表达式再次解析出大图像.
I need to fetch some images using google Reverse image search which is not supported by the API, but thankfully, you can query google with a direct link to the image and it still shows results, so:
$googleURL = "https://www.google.com/searchbyimage?&image_url=".$imageURL;
echo $googleURL;
Output:
https://www.google.com.au/search?tbs=sbi:AMhZZiu9rNRW4ETWGjN9XYQKsa21UHM7j_1TjMjXvYyNH1knVTyMGZGNmS2yme4CsQb0T7UViTyNrG4e8u_1xLY-dZCU16wkfdUakeY7idDwyMge78nT--Grpll4t9_1fp4YPTsJyKRUANzw1Iyctsko7OZbkYES3VUHtyNy9l9RJf12YOdEvVOxSZCO6-JPxO0PpZ5p79Rr-eDUrqENWYVbk4qojafKMTVfuXvoACQ9iykI-DMVbP9n_1o0YkdKTdUeK2r30wg4Oe2BqspoXlI_11rxySuK6TolPM6z58E6erTT0bnYfXTlyDMBfOwgSfhbn2ipLrNHgNdqyk-YhmMP0_1ZzqVyZrgMz-I5cfH9N65nX6bhZfos0lgr8_15V6ZHtX0_1p8s5r229JDrwzlwnjwOBLgP1inmEORCaKOlcfHbyPnU3n04pIfLGu5fWYpbmFJwtK_1vaJvS0uFb6Pkh_1uv0wvz_10yf4O6E1IvBSoMudcYy4cmJ1zegJJ9L50C0bzXFIRUb62lcPJWbkZNR44Tz378nOSXd-PND0JfKQ-TujT3KfC_1O241knvr9Eb3LbuvncGiCMoPgxlUY4r9B_1KWchNWhJVTJz9omeiygwz5K_13YkjuLg52UF6YWvLedCxgRoUpuj9kFdmYt-b9Tn2VEZG8yfiLm3OTkZnlVYtPF87LLQAHH24VpLMoV0oDllHDK3xOXhvusl_1K2Me9tTdK15PPG7oreeWfYRztQwTpG4iB5GAnaj687OQukvxX5hNFIqXx_1QSuNooDhIP1eJl-6QYfuI4MPasj6flSMom7HYTSjyjcsQKw0Prj1bBsJY6qH1qyLrF1f1_1Ql0COERnbOV7O5mTOuTkNWarmR5wzE06qbgsrtT95ENqafd81ppHbA0Jyg-xQ8TLV-dSp1QDAtiYAHI_11tCwsDtrak4jDS4qAfEJCw_1lb9urJqqajvp25jLH2_1mN3u0eeW7xNF-PljofyhI0iIWYSg6ghyOVRIaT_1c6klKUPvOrquZy8hMCZWHb3CYZNGJeKTnACCyYW1MNVUsYnoFWORN6hvkVlUk0beFXvA_1W2vaoedLjj-fN1y8_1dPOiBROLYtv85nq01csCKk7Eib6p2b_131wEeQBYocoYU0sGTv2_1dhOvSXRPGTnrbZlNDbJFUtH4pF9tMQj5-Fh_1lw9TTXGCjQ9UjOSLD5q7tNjCQU1As1uCQBvmZvxo7J3gZSAcj_19wXfHZCOsA8g-WA97V-2b62ia4RFOehQ38hoXoK7MCSDLnVtJTsKQz9HuEreXm8qGQlbDzfr7JFuHHe2MOyChwnL_1gzRnZd8uv2OIM0nzKh_1wg4T1KCXv3NSGNkSyNxpYXFJ161Sv3NpQQI3epBMiYA_1AcQDiCxOTQvWj00e5EXaXN22CDRWRq3uk4HWj2eXcR6-TGmsYEfSGX9nyQwK1DHp9yaNjk9Bal7rNHUAe_1eMDsCWW9htaLyiMTio0eXyTumVrlt7ShZVd8oSPOj8U0ilY9owH95jz7LsI8vUnzF-FC2m_1yNt3xe4ZAcsRTbYQXTN3Ga76vTQBPu8oz0gkYmDTA&gws_rd=cr&ei=wAHVVJOVLIeeugSZ64A4
.. now on this page, I need to follow the link to the actual results page, so my condition would look like:
if a.text == 'Large'
elseif a.text == 'Medium'
elseif a.text == 'Visually similar images'{
// crawl the link
// get direct links of top 10 results
}
But I'm not sure how to:
- get the
href
if the conditiona.text == 'Large'
is met sinceSimple HTML DOM Parser
orPHPQuery
neither havethis
like jQuery. - On fetching the results page, how to trigger a mousedown even to get the full-size image URLS because this is what I see in the source:
jsaction="mousedown:irc.rl;keydown:irc.rlk"
Here's quick screencast of what I'm looking to do: https://www.dropbox.com/s/c8g7fs5m5zqcegb/2015-02-07_08-56-23.mp4?dl=0 (5.9mb)
- You can use Regular Expresssions to find the matching link.
- If you check the HTML Code of the Google Image Search a bit more closely, you see there is actually also a href param in the Link (Example) which you can follow with just another crawl. There you can parse out the large image again with Regular Expressions.
这篇关于使用 PHP 进行反向图像抓取的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!