Updated approach to Google search with Python


Question

I was trying to use xgoogle, but it has not been updated for 3 years, and I keep getting no more than 5 results even when I set 100 results per page. If anyone uses xgoogle without any problems, please let me know.

Now, since the only available wrapper is (apparently) xgoogle, the alternative is to use some sort of browser emulation, such as mechanize, but that would make the code entirely dependent on Google's HTML, which they might change frequently.

The final option is to use the Custom Search API that Google offers, but it has a ridiculous limit of 100 requests per day, with paid pricing after that.

I need help deciding which direction to go: what other options do you know of, and what works for you?

Thanks!

Recommended answer

It only needs a small patch.

The function GoogleSearch._extract_result (line 237 of search.py) calls GoogleSearch._extract_description (line 258), which fails, causing _extract_result to return None for most of the results and therefore yielding fewer results than expected.

Fix:

In search.py, change line 259 from this:

desc_div = result.find('div', {'class': re.compile(r'\bs\b')})

to this:

desc_div = result.find('span', {'class': 'st'})
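To illustrate why the old lookup stops matching, here is a minimal sketch that exercises only the regular expression from the original line 259. It assumes the pattern is applied to the class attribute with search semantics and that the description's class value is now "st", as the patch implies. The helper name old_selector_matches and the sample class values are hypothetical. The patch also changes the tag from div to span, which this sketch does not model.

```python
import re

# Pattern taken from the original line 259 of search.py.
OLD_CLASS_PATTERN = re.compile(r'\bs\b')


def old_selector_matches(class_value):
    """Return True if the old pattern finds "s" as a whole word in class_value.

    Hypothetical helper for illustration; it is not part of xgoogle.
    """
    return OLD_CLASS_PATTERN.search(class_value) is not None


if __name__ == "__main__":
    # Sample class attribute values, chosen for illustration only.
    for class_value in ("s", "s std", "st"):
        print(class_value, old_selector_matches(class_value))
```

Because \b requires a word boundary on both sides of the s, a class value of "st" is not matched, which is consistent with _extract_description failing on the newer markup.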

The test I used:

#!/usr/bin/python
#
# This program does a Google search for "quick and dirty" and returns
# 200 results.
#

from xgoogle.search import GoogleSearch, SearchError

class give_me(object):
    def __init__(self, query, target):
        self.gs = GoogleSearch(query)
        self.gs.results_per_page = 50
        self.current = 0
        self.target = target
        self.buf_list = []

    def __iter__(self):
        return self

    def next(self):
        if self.current >= self.target:
            raise StopIteration
        else:
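            if not self.buf_list:
                # Buffer is empty: fetch the next page of results.
                self.buf_list = self.gs.get_results()
                if not self.buf_list:
                    # No more results available; stop instead of popping from an empty list.
                    raise StopIteration
            self.current += 1
            return self.buf_list.pop(0)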

try:
    sites = {}
    for res in give_me("quick and dirty", 200):
        t_dict = \
        {
            "title" : res.title.encode('utf8'),
            "desc" : res.desc.encode('utf8'),
            "url" : res.url.encode('utf8')
        }
        sites[t_dict["url"]] = t_dict
    print t_dict
except SearchError, e:
    print "Search failed: %s" % e
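The buffering pattern in the test above (refill a list one page at a time, hand out results until a target count is reached) can also be sketched as a generator. This is a rough Python 3 sketch and not part of xgoogle: take_results and make_fake_fetcher are hypothetical names, and the fake fetcher returns integers in place of a real GoogleSearch.get_results call so the example runs without network access.

```python
def take_results(fetch_page, target):
    """Yield up to `target` items, calling fetch_page() whenever the buffer is empty.

    Stops early if fetch_page() returns an empty page.
    """
    buffer = []
    yielded = 0
    while yielded < target:
        if not buffer:
            buffer = list(fetch_page())
            if not buffer:
                return  # the source has no more results
        yield buffer.pop(0)
        yielded += 1


def make_fake_fetcher(total, page_size):
    """Hypothetical stand-in for a paged search: serves integers 0..total-1 in pages."""
    state = {"next": 0}

    def fetch_page():
        start = state["next"]
        end = min(start + page_size, total)
        state["next"] = end
        return list(range(start, end))

    return fetch_page


if __name__ == "__main__":
    results = list(take_results(make_fake_fetcher(120, 50), 200))
    print(len(results))
```

With a real search object, fetch_page would presumably be a bound method such as gs.get_results; that wiring is an assumption and is untested here.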
