使用命令行从Google下载图像 [英] download images from google with command line

查看:96
本文介绍了使用命令行从Google下载图像的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想下载google通过命令行给我的第n个图像,例如使用命令wget

I would like to download n-th image that google gives me with command line i.e. like with command wget

要搜索[something]的图像,我只是转到页面https://www.google.cz/search?q=[something]&tbm=isch,但是如何获取第n个搜索结果的网址,以便可以使用wget?

To search image of [something] I just go to page https://www.google.cz/search?q=[something]&tbm=isch but how do I get url of n-th search result so I can use wget?

推荐答案

首次尝试

首先,您需要设置用户代理,以便Google授权搜索的输出.然后,我们可以查找图像并选择所需的图像.为了实现我们插入缺少的换行符的功能,wget将在一行上返回google搜索,并过滤链接.文件的索引存储在变量count中.

First you need to set the user agent so google will authorize output from searches. Then we can look for images and select the desired one. To accomplish that we insert missing newlines, wget will return google searches on one single line, and filter the link. The index of the file is stored in the variable count.

$ count=10
$ imagelink=$(wget --user-agent 'Mozilla/5.0' -qO - "www.google.be/search?q=something\&tbm=isch" | sed 's/</\n</g' | grep '<img' | head -n"$count" | tail -n1 | sed 's/.*src="\([^"]*\)".*/\1/')
$ wget $imagelink 

该图像现在将位于您的工作目录中,您可以调整最后一个命令并指定所需的输出文件名.

The image will now be in your working directory, you can tweak the last command and specify a desired output file name.

您可以在shell脚本中对其进行总结:

You can summarize it in a shell script:

#! /bin/bash
count=${1}
shift
query="$@"
[ -z $query ] && exit 1  # insufficient arguments
imagelink=$(wget --user-agent 'Mozilla/5.0' -qO - | "www.google.be/search?q=${query}\&tbm=isch" | sed 's/</\n</g' | grep '<img' | head -n"$count" | tail -n1 | sed 's/.*src="\([^"]*\)".*/\1/')
wget -qO google_image $imagelink

示例用法:

$ ls
Documents
Downloads
Music
script.sh
$ chmod +x script.sh
$ bash script.sh 5 awesome
$ ls
Documents
Downloads
google_image
Music
script.sh

现在,google_image在查找"awesome"时应包含第五个Google图片.如果您遇到任何错误,请告诉我,我会解决.

Now the google_image should contain the fifth google image when looking for 'awesome'. If you experience any bugs, let me know, I'll take care of them.

更好的代码

此代码的问题在于它以低分辨率返回图片.更好的解决方案如下:

The problem with this code is that it returns pictures in low resolution. A better solution is as follows:

#! /bin/bash

# function to create all dirs til file can be made
function mkdirs {
    file="$1"
    dir="/"

    # convert to full path
    if [ "${file##/*}" ]; then
        file="${PWD}/${file}"
    fi

    # dir name of following dir
    next="${file#/}"

    # while not filename
    while [ "${next//[^\/]/}" ]; do
        # create dir if doesn't exist
        [ -d "${dir}" ] || mkdir "${dir}"
        dir="${dir}/${next%%/*}"
        next="${next#*/}"
    done

    # last directory to make
    [ -d "${dir}" ] || mkdir "${dir}"
}

# get optional 'o' flag, this will open the image after download
getopts 'o' option
[[ $option = 'o' ]] && shift

# parse arguments
count=${1}
shift
query="$@"
[ -z "$query" ] && exit 1  # insufficient arguments

# set user agent, customize this by visiting http://whatsmyuseragent.com/
useragent='Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:31.0) Gecko/20100101 Firefox/31.0'

# construct google link
link="www.google.cz/search?q=${query}\&tbm=isch"

# fetch link for download
imagelink=$(wget -e robots=off --user-agent "$useragent" -qO - "$link" | sed 's/</\n</g' | grep '<a href.*\(png\|jpg\|jpeg\)' | sed 's/.*imgurl=\([^&]*\)\&.*/\1/' | head -n $count | tail -n1)
imagelink="${imagelink%\%*}"

# get file extention (.png, .jpg, .jpeg)
ext=$(echo $imagelink | sed "s/.*\(\.[^\.]*\)$/\1/")

# set default save location and file name change this!!
dir="$PWD"
file="google image"

# get optional second argument, which defines the file name or dir
if [[ $# -eq 2 ]]; then
    if [ -d "$2" ]; then
        dir="$2"
    else
        file="${2}"
        mkdirs "${dir}"
        dir=""
    fi
fi   

# construct image link: add 'echo "${google_image}"'
# after this line for debug output
google_image="${dir}/${file}"

# construct name, append number if file exists
if [[ -e "${google_image}${ext}" ]] ; then
    i=0
    while [[ -e "${google_image}(${i})${ext}" ]] ; do
        ((i++))
    done
    google_image="${google_image}(${i})${ext}"
else
    google_image="${google_image}${ext}"
fi

# get actual picture and store in google_image.$ext
wget --max-redirect 0 -qO "${google_image}" "${imagelink}"

# if 'o' flag supplied: open image
[[ $option = "o" ]] && gnome-open "${google_image}"

# successful execution, exit code 0
exit 0

注释应该是不言自明的,如果您对代码有任何疑问(例如较长的管道),我将很乐于澄清其机制.请注意,我必须在wget上设置一个更详细的用户代理,这可能会发生,您需要设置其他用户代理,但是我认为这不会成为问题.如果确实有问题,请访问 http://whatsmyuseragent.com/,并在useragent变量中提供输出.

The comments should be self explanatory, if you have any questions about the code (such as the long pipeline) I'll be happy to clarify the mechanics. Note that I had to set a more detailed user agent on the wget, it may happen that you need to set a different user agent but I don't think it'll be a problem. If you do have a problem, visit http://whatsmyuseragent.com/ and supply the output in the useragent variable.

当您要打开图像而不是仅下载图像时,请使用-o标志,如下例所示.如果您希望扩展脚本并包括自定义输出文件名,请告诉我,我会为您添加它.

When you wish to open the image instead of only downloading, use the -o flag, example below. If you wish to extend the script and also include a custom output file name, just let me know and I'll add it for you.

示例用法:

$ chmod +x getimg.sh
$ ./getimg.sh 1 dog
$ gnome-open google_image.jpg
$ ./getimg.sh -o 10 donkey

这篇关于使用命令行从Google下载图像的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆