Script to use Google Image Search with local image as input

Question

I'm looking for a batch or PowerShell script that searches for similar images on Google Images, using a local image as input.

My research so far

The syntax for an image search using a URL rather than a local file is as follows:
https://www.google.com/searchbyimage?image_url=TEST
where TEST can be replaced with any image URL you have.
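
A minimal sketch of that URL pattern in PowerShell. Get-SearchByImageUrl is a hypothetical helper name and the example address is a placeholder; percent-encoding the image URL is an assumption that seems sensible when the image URL itself contains characters such as ? or &.

```powershell
# Hypothetical helper: builds the search-by-image URL for an image that is already online.
function Get-SearchByImageUrl
{
    param(
        [Parameter(Mandatory = $true)]
        [string] $ImageUrl
    )

    # percent-encode the image URL so its own ':', '/', '?' and '&' don't break the query string
    'https://www.google.com/searchbyimage?image_url=' + [Uri]::EscapeDataString($ImageUrl)
}

# placeholder image address, for illustration only
$searchUrl = Get-SearchByImageUrl 'http://example.com/cat.jpg'

# Start-Process $searchUrl   # uncomment to open the results in the default browser
```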

I played with cURL for Windows and imgur as a temporary image host. I was able to upload a file to imgur via batch. The image URL was then used to search for similar images on Google.

But I wonder if it is possible without using any temporary cache like imgur or any other online picture service. Just a batch, curl, Google and me.

Just a thought: might a VBS script be able to search Google Images with a local file as input?
Or are similar web services like Tineye better suited for that task?

This PowerShell snippet will open Google's Image Search.

$IE= new-object -com InternetExplorer.Application
$IE.navigate2("https://www.google.com/imghp?hl=en")
while ($IE.busy) {
sleep -milliseconds 50
}
$IE.visible=$true

The next steps would be to get the IDs of some buttons and click them programmatically to select the local file. But here I'm not experienced enough to achieve this.

Recommended answer

Cool question! I spent far too much time tinkering with this, but I think I finally got it :)

In a nutshell, you have to upload the raw bytes of your image, embedded and properly formatted along with some other stuff, to images.google.com/searchbyimage/upload. The response to that request will contain a new URL which sends you to the actual results page.

This function returns the results page URL. You can do whatever you want with it, but to simply open the results in a browser, pass it to Start-Process.

Of course, Google could change the workflow for this at any time, so don't expect this script to work forever.

function Get-GoogleImageSearchUrl
{
    param(
        [Parameter(Mandatory = $true)]
        [ValidateScript({ Test-Path $_ })]
        [string] $ImagePath
    )

    # extract the image file name, without path
    $fileName = Split-Path $imagePath -Leaf

    # the request body has some boilerplate before the raw image bytes (part1) and some after (part2)
    #   note that $filename is included in part1
    $part1 = @"
-----------------------------7dd2db3297c2202
Content-Disposition: form-data; name="encoded_image"; filename="$fileName"
Content-Type: image/jpeg


"@
    $part2 = @"
-----------------------------7dd2db3297c2202
Content-Disposition: form-data; name="image_content"


-----------------------------7dd2db3297c2202--

"@

    # grab the raw bytes composing the image file
    $imageBytes = [Io.File]::ReadAllBytes($imagePath)

    # the request body should sandwich the image bytes between the 2 boilerplate blocks
    $encoding = New-Object Text.ASCIIEncoding
    $data = $encoding.GetBytes($part1) + $imageBytes + $encoding.GetBytes($part2)

    # create the HTTP request, populate headers
    $request = [Net.HttpWebRequest] ([Net.HttpWebRequest]::Create('http://images.google.com/searchbyimage/upload'))
    $request.Method = "POST"
    $request.ContentType = 'multipart/form-data; boundary=---------------------------7dd2db3297c2202'  # must match the delimiter in the body, above
    $request.ContentLength = $data.Length

    # don't automatically redirect to the results page, just take the response which points to it
    $request.AllowAutoredirect = $false

    # populate the request body
    $stream = $request.GetRequestStream()
    $stream.Write($data, 0, $data.Length)
    $stream.Close()        

    # get response stream, which should contain a 302 redirect to the results page
    $respStream = $request.GetResponse().GetResponseStream()

    # pluck out the results page link that you would otherwise be redirected to
    (New-Object Io.StreamReader $respStream).ReadToEnd() -match 'HREF\="([^"]+)"' | Out-Null
    $matches[1]
}

Usage:

$url = Get-GoogleImageSearchUrl 'C:\somepic.jpg'
Start-Process $url

Edit/Explanation

Here's some more detail. I'll basically just take you through the steps I took as I figured this out.

First, I just went ahead and did a local image search.

The URL it sends you to is very long (~1500 chars in the case of longcat), but not nearly long enough to fully encode the image (60KB). So you can tell right off the bat that it's more complex than simply doing something like a base64 encoding.
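
To put a rough number on that: base64 turns every 3 bytes into 4 characters, so a 60KB file would need far more than 1500 characters. A quick check, assuming "60KB" means 61,440 bytes and using a dummy buffer in place of the real image:

```powershell
# a dummy 60KB buffer standing in for the image; its content doesn't affect the encoded length
$fakeImage = New-Object byte[] 61440

# base64 emits 4 characters for every 3 input bytes
$base64Length = [Convert]::ToBase64String($fakeImage).Length
$base64Length   # 81920
```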

Next, I fired up Fiddler and looked at what's actually going on when you do a local image search. After browsing/selecting the image, you see some traffic to images.google.com/searchbyimage/upload. Viewing that request in detail reveals the basic mechanism.


  1. The data is sent in multipart/form-data format, and you need to specify the string of characters that separates the different fields (red boxes). If you Bing/Google around, you will find that multipart/form-data is a web standard, but the details don't really matter for this example.
  2. You need to (or at least should) include the original file name (orange box). Perhaps this factors into the search results.
  3. The full, raw image is included in the encoded_image field (green box).
  4. The response does not contain the actual results; it is simply a redirect to the actual results page (purple boxes).
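
The layout described in the list above can be sketched as a small helper that assembles the body with explicit CRLF line breaks. This is only an illustration under stated assumptions: New-ImageUploadBody and the generated boundary value are hypothetical names, and the sketch puts a CRLF in front of the delimiter that follows the image bytes, as the multipart standard describes, which differs slightly from a body built with here-strings.

```powershell
# Hypothetical helper: assembles a multipart/form-data body around raw image bytes.
function New-ImageUploadBody
{
    param(
        [Parameter(Mandatory = $true)] [byte[]] $ImageBytes,
        [Parameter(Mandatory = $true)] [string] $FileName,
        [Parameter(Mandatory = $true)] [string] $Boundary
    )

    $crlf = "`r`n"

    # each part starts with "--" + boundary; a blank line separates the part headers from the content
    $head = "--${Boundary}${crlf}" +
            "Content-Disposition: form-data; name=`"encoded_image`"; filename=`"$FileName`"${crlf}" +
            "Content-Type: image/jpeg${crlf}${crlf}"

    # the second (empty) field, then the closing delimiter: "--" + boundary + "--"
    $tail = "${crlf}--${Boundary}${crlf}" +
            "Content-Disposition: form-data; name=`"image_content`"${crlf}${crlf}" +
            "${crlf}--${Boundary}--${crlf}"

    $ascii = [Text.Encoding]::ASCII

    # the leading comma keeps PowerShell from unrolling the byte array on output
    , ([byte[]] ($ascii.GetBytes($head) + $ImageBytes + $ascii.GetBytes($tail)))
}

# three dummy bytes stand in for a real image file
$boundary = 'sketch-' + [Guid]::NewGuid().ToString('N')
$body = New-ImageUploadBody -ImageBytes ([byte[]] (1, 2, 3)) -FileName 'somepic.jpg' -Boundary $boundary
```

The matching request header would then be "multipart/form-data; boundary=$boundary", without the two leading dashes that appear in the body.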

There are a few fields not shown here, way at the bottom. They aren't super interesting.

Once I figured out the basic workflow, it was only a matter of coding it up. I just copied the web request I saw in Fiddler as closely as I could, using standard .NET web request APIs. The answers to this SO question demonstrate the APIs you need in order to properly encode and send body data in a web request.

From some experimentation, I found that you only need the two body fields I included in my code (encoded_image and image_content). Going through the web UI includes more, but apparently they are not required.

More experimentation revealed that none of the other headers or cookies shown in Fiddler are really required.

For our purposes, we don't actually want to access the results page, only get a pointer to it. Thus we should set AllowAutoRedirect to $false. That way, Google's 302 redirect is given to us directly and we can extract the results page URL from it.
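
As an alternative to matching the HREF in the response body, the redirect target could probably also be read from the response's Location header. The sketch below is an assumption rather than something tested against Google: Get-RedirectLocation is a hypothetical helper, and the demonstration uses a hand-built header collection with a placeholder URL so that it runs offline.

```powershell
# Hypothetical helper: returns the redirect target for a 3xx response, or $null otherwise.
function Get-RedirectLocation
{
    param(
        [int] $StatusCode,
        [Net.WebHeaderCollection] $Headers
    )

    # only 3xx responses are redirects
    if ($StatusCode -lt 300 -or $StatusCode -ge 400) { return $null }

    # a redirect response normally names its target in the Location header
    $Headers['Location']
}

# offline demonstration with a hand-built header collection (placeholder URL)
$demoHeaders = New-Object Net.WebHeaderCollection
$demoHeaders.Add('Location', 'https://www.example.com/results')
$target = Get-RedirectLocation -StatusCode 302 -Headers $demoHeaders

# with a real HttpWebRequest whose AllowAutoRedirect is $false, the call would look like:
# $response = $request.GetResponse()
# Get-RedirectLocation ([int] $response.StatusCode) $response.Headers
```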

While writing this edit, I slapped my forehead and realized that PowerShell v3 has the Invoke-WebRequest cmdlet, which could potentially eliminate the need for the .NET web API calls. Unfortunately, I could not get it to work properly after tinkering for 10 minutes, so I gave up. It seems like some issue with the way the cmdlet encodes the data, though I could be wrong.
