Get the first 10 Google search results using googleapi
Problem description
I need to get the first 10 Google results. For example:
```python
import urllib
import simplejson

query = urllib.urlencode({'q': 'example'})
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' \
    % (query)
search_results = urllib.urlopen(url)
json = simplejson.loads(search_results.read())
results = json['responseData']['results']
```
This gives me the results of the first page, but I'd like to get more Google results. Does anyone know how to do that?
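A minimal sketch of the paging idea, assuming the endpoint accepts the `start` (result offset) and `rsz=large` parameters that the answer's script sends, and assuming 8 results per `rsz=large` page. To cover the first `n` results, request offsets 0, 8, 16 and so on until `n` is covered. `page_urls` is a hypothetical helper name, and the snippet only builds URLs; it does not contact the service.

```python
from urllib.parse import urlencode

# Base URL from the question; rsz=large and start are the parameters the answer uses.
BASE = "http://ajax.googleapis.com/ajax/services/search/web"
PAGE_SIZE = 8  # assumption: results per page when rsz=large


def page_urls(query, n, page_size=PAGE_SIZE):
    """Return one request URL per page needed to cover the first n results."""
    urls = []
    for start in range(0, n, page_size):
        params = urlencode({"v": "1.0", "rsz": "large", "q": query, "start": start})
        urls.append(BASE + "?" + params)
    return urls


for url in page_urls("example", 10):
    print(url)
```

For ten results this yields two requests (`start=0` and `start=8`); after combining the pages, slicing the list with `[:10]` trims it to exactly ten.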
I've done this in the past; here is a complete example (I'm not a Python guru, but it works):
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys, getopt
import json  # stdlib; same loads() API as the original simplejson
try:  # Python 3
    from urllib.parse import urlencode
    from urllib.request import urlopen
except ImportError:  # Python 2, as originally written
    from urllib import urlencode, urlopen

OPTIONS = ("m:", ["min="])

def print_usage():
    s = "usage: " + sys.argv[0] + " "
    for o in OPTIONS[0]:
        if o != ":":
            s += "[-" + o + "] "
    print(s + "query_string\n")

def search(query, index, offset, min_count, quiet=False, rs=None):
    # A mutable default (rs=[]) would be shared between calls; create it here.
    if rs is None:
        rs = []
    # rsz=large requests bigger pages; start is the offset of the first result.
    url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=large&%s&start=%s" % (query, offset)
    result = urlopen(url)
    data = json.loads(result.read())
    status = data["responseStatus"]
    if status == 200:
        results = data["responseData"]["results"]
        cursor = data["responseData"]["cursor"]
        pages = cursor["pages"]
        for r in results:
            u = r["unescapedUrl"]
            rs.append(u)
            if not quiet:
                # Running count, so numbering stays right on a short last page.
                print("%3d. %s" % (len(rs), u))
        # Find the page that follows the current one (label == index).
        next_index = None
        next_offset = None
        for i, p in enumerate(pages):
            if p["label"] == index:
                if i < len(pages) - 1:
                    next_index = pages[i + 1]["label"]
                    next_offset = pages[i + 1]["start"]
                break
        # Keep paging while the next page starts below the requested minimum.
        if next_index is not None and next_offset is not None:
            if int(next_offset) < min_count:
                search(query, next_index, next_offset, min_count, quiet, rs)
    return rs

def main():
    min_count = 64
    try:
        opts, args = getopt.getopt(sys.argv[1:], *OPTIONS)
        for opt, arg in opts:
            if opt in ("-m", "--min"):
                min_count = int(arg)
        if not args:
            raise getopt.GetoptError("missing query")
    except (getopt.GetoptError, ValueError):
        print_usage()
        sys.exit(1)
    qs = " ".join(args)
    query = urlencode({"q": qs})
    search(query, 1, "0", min_count)

if __name__ == "__main__":
    main()
```
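The recursion in `search` can also be written as a loop. The sketch below is a rewrite, not the answer's code: it reads the same `responseStatus`, `responseData` and `cursor.pages` fields the script uses, but takes the HTTP call as a `fetch(start)` parameter (a hypothetical hook) so the paging logic can be exercised without the network. `collect_urls` and `fake_fetch` are illustrative names, and the fake pages contain made-up `example.com` URLs.

```python
def collect_urls(fetch, limit):
    """Follow cursor pages until `limit` URLs are collected or no next page exists."""
    urls = []
    label, start = 1, "0"  # first page, as in search(query, 1, "0", ...)
    while len(urls) < limit:
        data = fetch(start)
        if data.get("responseStatus") != 200:
            break  # stop on any non-OK status
        body = data["responseData"]
        urls.extend(r["unescapedUrl"] for r in body["results"])
        labels = [p["label"] for p in body["cursor"]["pages"]]
        if label not in labels or labels.index(label) == len(labels) - 1:
            break  # current page is the last one the cursor knows about
        nxt = body["cursor"]["pages"][labels.index(label) + 1]
        label, start = nxt["label"], nxt["start"]
    return urls[:limit]


def fake_fetch(start):
    """Stand-in for the HTTP call: 3 pages of 8 made-up URLs each."""
    pages = [{"start": str(8 * k), "label": k + 1} for k in range(3)]
    offset = int(start)
    results = [{"unescapedUrl": "http://example.com/%d" % (offset + j)} for j in range(8)]
    return {"responseStatus": 200,
            "responseData": {"results": results, "cursor": {"pages": pages}}}


print(collect_urls(fake_fetch, 10))
```

Replacing `fake_fetch` with a function that builds the request URL for `start`, calls `urlopen`, and passes the body to `json.loads` would point the same loop at the real endpoint.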
Edit: I've fixed the obvious mishandling of command-line options; you can call this script as follows:
```shell
python gsearch.py --min=5 vanessa mae
```
The --min switch means "at least 5 results" and is optional; if it is not specified, you get the maximum allowed result count (64).
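To see how --min turns into requests: the script fetches the next page only while that page's `start` offset is below the minimum, so with 8 results per page (an assumption about `rsz=large`) and the 64-result cap stated above, the minimum is effectively rounded up to a multiple of 8. The sketch below models that rule and, as a swapped-in alternative to `getopt`, parses the same command line with the standard `argparse` module. `parse_cli` and `start_offsets` are hypothetical names, and the model assumes the cursor always lists every page up to the cap.

```python
import argparse

PAGE_SIZE = 8      # assumption: results per page with rsz=large
MAX_RESULTS = 64   # cap stated in the answer


def parse_cli(argv):
    """Parse `[-m N | --min N] query words...` like the getopt version."""
    parser = argparse.ArgumentParser(prog="gsearch.py")
    parser.add_argument("-m", "--min", type=int, default=MAX_RESULTS, dest="min_count")
    parser.add_argument("query", nargs="+")
    args = parser.parse_args(argv)
    return args.min_count, " ".join(args.query)


def start_offsets(min_count, page_size=PAGE_SIZE, cap=MAX_RESULTS):
    """Offsets the recursive search() would request; page 0 is always fetched."""
    return list(range(0, max(min(min_count, cap), 1), page_size))


min_count, qs = parse_cli(["--min=5", "vanessa", "mae"])
print(min_count, qs)             # 5 vanessa mae
print(start_offsets(min_count))  # [0]
print(start_offsets(20))         # [0, 8, 16]
```

So `--min=5` costs one request (8 results) and `--min=20` costs three (24 results). On bad input `argparse` prints a usage message and exits, which covers what `print_usage()` does by hand.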
Also, error handling is omitted for brevity.
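Since error handling was left out, here is one minimal way it could look. This is an assumption, not the service's documented error contract: any non-200 `responseStatus`, a missing `responseData`, or an unparsable body is turned into one exception instead of letting a `KeyError` or `TypeError` escape. `SearchError` and `extract_urls` are hypothetical names, and `responseDetails` is assumed to carry the error text when present.

```python
import json


class SearchError(Exception):
    """Raised when a response page cannot be used."""


def extract_urls(raw):
    """Parse one JSON response body and return its result URLs."""
    try:
        data = json.loads(raw)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        raise SearchError("response is not valid JSON") from exc
    if not isinstance(data, dict):
        raise SearchError("unexpected response shape")
    status = data.get("responseStatus")
    body = data.get("responseData")
    if status != 200 or not isinstance(body, dict):
        # responseDetails is assumed to hold the error text when present.
        raise SearchError("status %r: %s" % (status, data.get("responseDetails")))
    return [r["unescapedUrl"] for r in body.get("results", []) if "unescapedUrl" in r]


ok = '{"responseStatus": 200, "responseData": {"results": [{"unescapedUrl": "http://example.com/"}]}}'
print(extract_urls(ok))  # ['http://example.com/']
try:
    extract_urls('{"responseStatus": 403, "responseData": null, "responseDetails": "example error"}')
except SearchError as err:
    print("search failed:", err)  # search failed: status 403: example error
```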