Updated approach to Google search with Python
Problem description
I was trying to use xgoogle, but it has not been updated for 3 years, and I keep getting no more than 5 results even when I set 100 results per page. If anyone uses xgoogle without any problems, please let me know.
Now, since the only available wrapper (apparently) is xgoogle, the alternative is to use some sort of browser emulation, like mechanize, but that would make the code entirely dependent on Google's HTML, which they might change often.
The final option is to use the Custom Search API that Google offers, but it has a ridiculous limit of 100 requests per day, with pricing after that.
I need help deciding which direction to go. What other options do you know of, and what works for you?
Thanks!
Recommended answer
It only needs a small patch.
The function GoogleSearch._extract_result (line 237 of search.py) calls GoogleSearch._extract_description (line 258), which fails, causing _extract_result to return None for most of the results and therefore showing fewer results than expected.
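The failure mode described above can be illustrated with a minimal sketch. The names `extract_description`, `extract_result`, `extract_all`, and the dictionary-based sample data are hypothetical stand-ins, not xgoogle's actual code; the sketch only assumes that a result whose description cannot be extracted is dropped entirely.

```python
# Hypothetical stand-ins for illustration; this is not xgoogle's code.

def extract_description(raw):
    """Return the description, or None when the expected field is missing."""
    return raw.get("desc")


def extract_result(raw):
    """Build a result tuple, or return None if the description is missing."""
    desc = extract_description(raw)
    if desc is None:
        return None  # one failed field discards the whole result
    return (raw["title"], desc)


def extract_all(raw_results):
    """Keep only the results that were extracted successfully."""
    extracted = (extract_result(raw) for raw in raw_results)
    return [result for result in extracted if result is not None]


raw_results = [
    {"title": "A", "desc": "first"},
    {"title": "B"},  # description not found
    {"title": "C"},  # description not found
    {"title": "D", "desc": "fourth"},
]
results = extract_all(raw_results)
print(len(raw_results), "fetched,", len(results), "kept")
```

Here half of the fetched entries are silently discarded, which is the same symptom as asking for 100 results per page and seeing only a handful.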
The fix:
In search.py, change line 259 from this:
desc_div = result.find('div', {'class': re.compile(r'\bs\b')})
to this:
desc_div = result.find('span', {'class': 'st'})
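Why the old selector stops matching can be checked with the `re` module alone. The sketch below assumes, as the patch implies, that the description element's class is now `st`, and that the pattern is applied to the class string with a regex search; the helper `old_selector_matches` and the sample class strings are made up for illustration.

```python
import re

# Pattern from the original line 259: the letter "s" as a whole word.
old_pattern = re.compile(r'\bs\b')


def old_selector_matches(class_value):
    """True if the old pattern finds a standalone 's' in the class string."""
    return old_pattern.search(class_value) is not None


print(old_selector_matches('s'))        # standalone class "s"
print(old_selector_matches('s other'))  # "s" among several classes
print(old_selector_matches('st'))       # class "st": no word boundary after "s"
```

The last call returns False because `\b` requires a word boundary right after the `s`, and in `st` the next character is another word character. Matching the exact class string `'st'` on a `span`, as in the patched line, sidesteps the problem.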
The test I used:
#!/usr/bin/python
#
# This program does a Google search for "quick and dirty" and returns
# 200 results.
#
# Note: written for Python 2, like xgoogle itself.

from xgoogle.search import GoogleSearch, SearchError

class give_me(object):
    """Iterator that yields up to `target` results, fetching page by page."""

    def __init__(self, query, target):
        self.gs = GoogleSearch(query)
        self.gs.results_per_page = 50
        self.current = 0
        self.target = target
        self.buf_list = []

    def __iter__(self):
        return self

    def next(self):
        if self.current >= self.target:
            raise StopIteration
        if not self.buf_list:
            # Buffer is empty: fetch the next page of results.
            self.buf_list = self.gs.get_results()
            if not self.buf_list:
                # No more results available; stop instead of failing in pop().
                raise StopIteration
        self.current += 1
        return self.buf_list.pop(0)

try:
    sites = {}
    for res in give_me("quick and dirty", 200):
        t_dict = {
            "title": res.title.encode('utf8'),
            "desc": res.desc.encode('utf8'),
            "url": res.url.encode('utf8'),
        }
        # Key by URL so duplicate hits collapse into one entry.
        sites[t_dict["url"]] = t_dict
        print t_dict
except SearchError, e:
    print "Search failed: %s" % e
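The same buffering idea can also be written as a generator, which avoids the hand-written `next` method. This is only a sketch and not part of xgoogle: `take_results` and `make_fake_fetcher` are hypothetical names, and `fetch_page` is a stand-in for a call like `GoogleSearch.get_results`, assumed to return a list of results and an empty list once nothing is left.

```python
def take_results(fetch_page, target):
    """Yield up to `target` results, fetching a new page when the buffer empties."""
    buffered = []
    yielded = 0
    while yielded < target:
        if not buffered:
            buffered = list(fetch_page())
            if not buffered:
                return  # the source ran out before reaching the target
        yield buffered.pop(0)
        yielded += 1


def make_fake_fetcher(pages):
    """Return a hypothetical fetcher that serves the given pages in order."""
    remaining = list(pages)

    def fetch_page():
        return remaining.pop(0) if remaining else []

    return fetch_page


fetch = make_fake_fetcher([[1, 2, 3], [4, 5], [6]])
first_four = list(take_results(fetch, 4))
print(first_four)
```

Passing the fetcher in as a callable keeps the paging logic testable without any network access; with xgoogle, one would presumably pass the bound `get_results` method of a `GoogleSearch` instance.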