Get the first 10 Google results using googleapi

Problem description

I need to get the first 10 Google results.

For example:

import urllib
import simplejson

# Build the query string and request the first page of web results.
query = urllib.urlencode({'q': 'example'})
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
search_results = urllib.urlopen(url)
json = simplejson.loads(search_results.read())
results = json['responseData']['results']

This gives me the results of the first page, but I'd like to get more Google results. Does anyone know how to do that?

Solution

I've done it in the past; here is a complete example (I'm not a Python guru, but it works):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys, getopt
import urllib
import simplejson

OPTIONS = ("m:", ["min="])

def print_usage():
    s = "usage: " + sys.argv[0] + " "
    for o in OPTIONS[0]:
        if o != ":" : s += "[-" + o + "] "
    print(s + "query_string\n")

def search(query, index, offset, min_count, quiet=False, rs=None):
    # Fetch one page of results, then recurse into the following pages until
    # the next page's start offset reaches min_count. Returns the list of URLs.
    if rs is None:
        # Fresh list per top-level call; a mutable default would be shared between calls.
        rs = []
    url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=large&%s&start=%s" % (query, offset)
    result = urllib.urlopen(url)
    json = simplejson.loads(result.read())
    status = json["responseStatus"]
    if status == 200:
        results = json["responseData"]["results"]
        cursor = json["responseData"]["cursor"]
        pages = cursor["pages"]
        # Number the results continuously across pages, starting at 1.
        for n, r in enumerate(results):
            i = n + (index - 1) * len(results) + 1
            u = r["unescapedUrl"]
            rs.append(u)
            if not quiet:
                print("%3d. %s" % (i, u))
        next_index = None
        next_offset = None
        # Find the current page in the cursor and take the entry after it, if any.
        for i, p in enumerate(pages):
            if p["label"] == index:
                if i < len(pages) - 1:
                    next_index = pages[i + 1]["label"]
                    next_offset = pages[i + 1]["start"]
                break
        if next_index is not None and next_offset is not None:
            if int(next_offset) < min_count:
                search(query, next_index, next_offset, min_count, quiet, rs)
    return rs

def main():
    min_count = 64
    try:
        opts, args = getopt.getopt(sys.argv[1:], *OPTIONS)
        for opt, arg in opts:
            if opt in ("-m", "--min"):
                min_count = int(arg)
        if not args:
            raise ValueError("missing query string")
    except (getopt.GetoptError, ValueError):
        # Unknown option, non-numeric --min value, or no query words given.
        print_usage()
        sys.exit(1)
    qs = " ".join(args)
    query = urllib.urlencode({"q" : qs})
    search(query, 1, "0", min_count)

if __name__ == "__main__":
    main()
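
The paging step in search() can also be read on its own, without any network access. Below is a minimal sketch in Python 3 that only works out which start offsets would be requested. It assumes every response carries the same cursor pages list of label/start entries that the script reads; next_page, offsets_to_fetch and the sample data are hypothetical names made up for illustration.

```python
def next_page(pages, index):
    """Return (label, start) of the page after `index`, or None if there is none."""
    for i, page in enumerate(pages):
        if page["label"] == index:
            if i < len(pages) - 1:
                following = pages[i + 1]
                return following["label"], following["start"]
            return None
    return None


def offsets_to_fetch(pages, min_count):
    """List the start offsets to request: "0" first, then each next page
    whose start offset is still below min_count."""
    offsets = ["0"]
    index = 1
    while True:
        step = next_page(pages, index)
        if step is None or int(step[1]) >= min_count:
            break
        index, offset = step
        offsets.append(offset)
    return offsets


# Made-up cursor data in the shape the script reads: 8 pages of 8 results each.
sample_pages = [{"label": n + 1, "start": str(8 * n)} for n in range(8)]
print(offsets_to_fetch(sample_pages, 20))
```

With this sample data, a minimum of 20 leads to three requests (offsets 0, 8 and 16), which matches the "at least" meaning of the switch: the last page fetched may overshoot the requested count.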

Edit: I've fixed the obvious mishandling of command-line options; you can call this script as follows:

python gsearch.py --min=5 vanessa mae

The --min switch means "at least 5 results" and is optional; if it is not specified, you get the maximum allowed result count (64).
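
For comparison, the same command line can be parsed with argparse instead of getopt. This is a sketch in Python 3, not the script's own code: parse_args and the min_count/terms names are hypothetical, and only the -m/--min switch and its default of 64 come from the text above.

```python
import argparse


def parse_args(argv):
    """Parse the options and return (min_count, query_string)."""
    parser = argparse.ArgumentParser(prog="gsearch.py")
    parser.add_argument("-m", "--min", dest="min_count", type=int, default=64,
                        help="fetch at least this many results (default: 64)")
    parser.add_argument("terms", nargs="+", help="words of the search query")
    args = parser.parse_args(argv)
    return args.min_count, " ".join(args.terms)


print(parse_args(["--min=5", "vanessa", "mae"]))
```

argparse prints a usage message and exits by itself when the query is missing or the --min value is not an integer, so the hand-written print_usage() would no longer be needed in this variant.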

Also, error handling is omitted for brevity.
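
As one possible shape for the omitted error handling, the response parsing could be wrapped as in the Python 3 sketch below. It works on the response body as text and makes no request. SearchError and extract_results are hypothetical names, and reading a responseDetails field for the error message is an assumption about the response format; only responseStatus, responseData, results and unescapedUrl appear in the script above.

```python
import json


class SearchError(Exception):
    """Raised when a response cannot be used (hypothetical helper type)."""


def extract_results(body):
    """Return the result URLs from a response body, or raise SearchError."""
    try:
        data = json.loads(body)
    except ValueError as exc:
        raise SearchError("response is not valid JSON: %s" % exc)
    if not isinstance(data, dict):
        raise SearchError("unexpected response shape")
    status = data.get("responseStatus")
    if status != 200:
        # Assumption: the service explains failures in "responseDetails".
        raise SearchError("search failed with status %r: %s"
                          % (status, data.get("responseDetails")))
    # responseData may be missing or null; treat that as "no results".
    response_data = data.get("responseData") or {}
    return [r["unescapedUrl"]
            for r in response_data.get("results", [])
            if "unescapedUrl" in r]


ok_body = ('{"responseStatus": 200, "responseData": '
           '{"results": [{"unescapedUrl": "http://example.com/"}]}}')
print(extract_results(ok_body))
```

A caller could catch SearchError around each page and decide whether to stop or to return the URLs collected so far.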
