Script to use Google Image Search with local image as input


Problem description


I'm looking for a batch or PowerShell script to search for similar images on Google Images using a local image as input.

My research so far

The syntax for an image search using a URL rather than a local file is as follows:
https://www.google.com/searchbyimage?image_url=TEST
where TEST can be replaced with any image URL you have.
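
As a rough illustration, the URL-based lookup can be sketched in PowerShell. The function name Get-GoogleImageSearchUrlFromUrl is made up for this example; only the searchbyimage?image_url= pattern comes from the question. The image URL is percent-encoded so that any ? or & inside it is not mistaken for part of Google's own query string.

```powershell
# Hypothetical helper (not part of the original post): builds the search-by-URL address.
function Get-GoogleImageSearchUrlFromUrl
{
    param(
        [Parameter(Mandatory = $true)]
        [string] $ImageUrl
    )

    # percent-encode the image URL so it survives as a single query parameter
    $encoded = [Uri]::EscapeDataString($ImageUrl)
    "https://www.google.com/searchbyimage?image_url=$encoded"
}
```

The result can be handed to Start-Process, e.g. Start-Process (Get-GoogleImageSearchUrlFromUrl 'http://example.com/cat.jpg').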

I played with cURL for windows and imgur as temporary image saver. I was able to upload a file to imgur via batch. The image URL was then used to search similar images on Google.

But I wonder if it is possible without using any temporary cache like imgur or any other online picture service. Just a batch, curl, Google and me.

Just a thought: could a VBScript search Google Images with a local file as input?
Or are similar web services like TinEye better suited for that task?


This PowerShell snippet will open Google's Image Search.

$IE = New-Object -ComObject InternetExplorer.Application
$IE.Navigate2("https://www.google.com/imghp?hl=en")
# wait until the page has finished loading
while ($IE.Busy) {
    Start-Sleep -Milliseconds 50
}
$IE.Visible = $true

The next steps would be to get the IDs of some buttons and click them programmatically to select the local file, but I'm not experienced enough to do that.

Solution

Cool question! I spent far too much time tinkering with this, but I think I finally got it :)

In a nutshell, you have to upload the raw bytes of your image, embedded and properly formatted along with some other stuff, to images.google.com/searchbyimage/upload. The response to that request will contain a new URL which sends you to the actual results page.

This function will return back the results page URL. You can do whatever you want with it, but to simply open the results in a browser, pass it to Start-Process.

Of course, Google could change the workflow for this at any time, so don't expect this script to work forever.

function Get-GoogleImageSearchUrl
{
    param(
        [Parameter(Mandatory = $true)]
        [ValidateScript({ Test-Path $_ })]
        [string] $ImagePath
    )

    # extract the image file name, without path
    $fileName = Split-Path $imagePath -Leaf

    # the request body has some boilerplate before the raw image bytes (part1) and some after (part2)
    #   note that $filename is included in part1
    $part1 = @"
-----------------------------7dd2db3297c2202
Content-Disposition: form-data; name="encoded_image"; filename="$fileName"
Content-Type: image/jpeg


"@
    $part2 = @"
-----------------------------7dd2db3297c2202
Content-Disposition: form-data; name="image_content"


-----------------------------7dd2db3297c2202--

"@

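    # grab the raw bytes composing the image file
    #   (resolve to a full path first: .NET's working directory can differ from PowerShell's current location)
    $imageBytes = [IO.File]::ReadAllBytes((Resolve-Path -LiteralPath $ImagePath).ProviderPath)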

    # the request body should sandwich the image bytes between the 2 boilerplate blocks
    $encoding = New-Object Text.ASCIIEncoding
    $data = $encoding.GetBytes($part1) + $imageBytes + $encoding.GetBytes($part2)

    # create the HTTP request, populate headers
    $request = [Net.HttpWebRequest] ([Net.HttpWebRequest]::Create('http://images.google.com/searchbyimage/upload'))
    $request.Method = "POST"
    $request.ContentType = 'multipart/form-data; boundary=---------------------------7dd2db3297c2202'  # must match the delimiter in the body, above
    $request.ContentLength = $data.Length

    # don't automatically redirect to the results page, just take the response which points to it
    $request.AllowAutoredirect = $false

    # populate the request body
    $stream = $request.GetRequestStream()
    $stream.Write($data, 0, $data.Length)
    $stream.Close()        

    # get response stream, which should contain a 302 redirect to the results page
    $respStream = $request.GetResponse().GetResponseStream()

    # pluck out the results page link that you would otherwise be redirected to
    (New-Object Io.StreamReader $respStream).ReadToEnd() -match 'HREF\="([^"]+)"' | Out-Null
    $matches[1]
}

Usage:

$url = Get-GoogleImageSearchUrl 'C:\somepic.jpg'
Start-Process $url

Edit/Explanation

Here's some more detail. I'll basically just take you through the steps I took as I figured this out.

First, I just went ahead and did a local image search.

The URL it sends you to is very long (~1500 chars in the case of longcat), but not nearly long enough to fully encode the image (60KB). So you can tell right off the bat that it's more complex than simply doing something like a base64 encoding.
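
A quick back-of-the-envelope check of that claim, assuming "60KB" means 60 * 1024 bytes (the actual file contents don't matter, only the length):

```powershell
# Assumption: 60KB = 60 * 1024 bytes; a zero-filled buffer stands in for the real image.
$imageSize    = 60 * 1024
$dummyBytes   = New-Object byte[] $imageSize
$base64Length = [Convert]::ToBase64String($dummyBytes).Length

# base64 emits 4 characters for every 3 input bytes
"{0} bytes -> {1} base64 characters" -f $imageSize, $base64Length
```

That comes to 81920 characters, so a URL of roughly 1500 characters cannot be carrying the image itself.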

Next, I fired up Fiddler and looked at what's actually going on when you do a local image search. After browsing/selecting the image, you see some traffic to images.google.com/searchbyimage/upload. Viewing that request in detail reveals the basic mechanism.

  1. The data is being sent in the format of multipart/form-data, and you need to specify what string of characters is separating the different fields (red boxes). If you Bing/Google around, you will find that multipart/form-data is a web standard (currently specified in RFC 7578), but the details don't really matter for this example.
  2. You need to (or at least should) include the original file name (orange box). Perhaps this factors into the search results.
  3. The full, raw image is included in the encoded_image field (green box).
  4. The response does not contain the actual results; it is simply a redirect to the actual results page (purple boxes).
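
The body layout described in the list above can be sketched as a small helper. This is only an illustration under assumptions: the name New-ImageUploadBody, its parameters, and the generated boundary are made up here, and the Content-Type is hard-coded to image/jpeg just as in the script above. It spells out the CRLF line endings and the leading "--" on each delimiter explicitly rather than relying on how the script file happens to be saved.

```powershell
# Hypothetical helper: assembles a multipart/form-data body with one file field
# (encoded_image) and one empty text field (image_content).
function New-ImageUploadBody
{
    param(
        [Parameter(Mandatory = $true)] [byte[]] $ImageBytes,
        [Parameter(Mandatory = $true)] [string] $FileName,
        # assumption: any string that does not occur in the payload will do as a boundary
        [string] $Boundary = ('-' * 27) + [Guid]::NewGuid().ToString('N').Substring(0, 15)
    )

    $crlf = "`r`n"   # multipart bodies use CRLF line endings

    # part headers that precede the raw image bytes
    $head = "--${Boundary}${crlf}" +
            "Content-Disposition: form-data; name=`"encoded_image`"; filename=`"${FileName}`"${crlf}" +
            "Content-Type: image/jpeg${crlf}${crlf}"

    # empty image_content field, then the closing delimiter (boundary plus trailing "--")
    $tail = "${crlf}--${Boundary}${crlf}" +
            "Content-Disposition: form-data; name=`"image_content`"${crlf}${crlf}" +
            "${crlf}--${Boundary}--${crlf}"

    $ascii = [Text.Encoding]::ASCII
    [byte[]] $body = $ascii.GetBytes($head) + $ImageBytes + $ascii.GetBytes($tail)

    # the Content-Type request header must carry the same boundary as the body
    New-Object PSObject -Property @{
        ContentType = "multipart/form-data; boundary=$Boundary"
        Body        = $body
    }
}
```

The ContentType and Body properties would then be assigned to the request's ContentType and written to its request stream, respectively.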

There are a few fields not shown here, way at the bottom. They aren't super interesting.

Once I figured out the basic workflow, it was only a matter of coding it up. I just copied the web request I saw in Fiddler as closely as I could, using standard .NET web request APIs. The answers to this SO question demonstrate the APIs you need in order to properly encode and send body data in a web request.

From some experimentation, I found that you only need the two body fields I included in my code (encoded_image and image_content). Going through the web UI includes more, but apparently they are not required.

More experimentation revealed that none of the other headers or cookies shown in Fiddler are really required.

For our purposes, we don't actually want to access the results page, only get a pointer to it. Thus we should set AllowAutoRedirect to $false. That way, Google's 302 redirect is given to us directly and we can extract the results page URL from it.
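
As a possible alternative to matching the HREF in the response body, the redirect target could be read from the Location header, falling back to the body only if the header is absent. This is a sketch under assumptions: Get-RedirectTarget is a made-up name, it assumes the 302 response carries a Location header (usual for redirects, but not verified here for this endpoint), and [Net.WebUtility]::HtmlDecode needs .NET 4 (PowerShell v3 or later).

```powershell
# Hypothetical helper: picks the redirect target out of a 302 response.
function Get-RedirectTarget
{
    param(
        [Parameter(Mandatory = $true)]
        [Net.WebHeaderCollection] $Headers,
        [string] $Body = ''
    )

    # preferred: the Location header names the redirect target directly
    $location = $Headers['Location']
    if ($location) { return $location }

    # fallback: pull the link out of the HTML body, decoding entities such as &amp;
    if ($Body -match 'HREF="([^"]+)"') {
        return [Net.WebUtility]::HtmlDecode($Matches[1])
    }

    return $null
}
```

With the request from the function above, this would be called as Get-RedirectTarget -Headers $response.Headers -Body $bodyText, where $response is the result of $request.GetResponse() and $bodyText is the body read from its stream.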

While writing this edit, I slapped my forehead and realized that PowerShell v3 has the Invoke-WebRequest cmdlet, which could potentially eliminate the need for the .NET web API calls. Unfortunately, I could not get it to work properly after tinkering for 10 minutes, so I gave up. It seems to be an issue with the way the cmdlet encodes the data, though I could be wrong.
