Updated approach to Google search with Python

Question

I was trying to use xgoogle, but it has not been updated for 3 years, and I keep getting no more than 5 results even when I set 100 results per page. If anyone uses xgoogle without any problems, please let me know.

Since the only available wrapper (apparently) is xgoogle, the alternative is to use some sort of browser, like mechanize, but that would make the code entirely dependent on Google's HTML, which they might change a lot.

The final option is to use the Custom Search API that Google offers, but it has a ridiculous limit of 100 requests per day, with paid pricing after that.
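For reference, a request to the Custom Search JSON API might look roughly like the sketch below. It is written for Python 3 (unlike the Python 2 code elsewhere in this thread) and uses only the standard library. The key and engine ID are placeholders, and the helper names `build_search_url` and `fetch_results` are made up for illustration. The API returns at most 10 results per request, so larger result sets need repeated calls with a higher `start`, and each call counts against the daily quota.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, start=1, num=10):
    """Build a request URL for one page of results (no network access)."""
    params = {
        "key": api_key,    # API key (placeholder in this sketch)
        "cx": engine_id,   # custom search engine ID (placeholder)
        "q": query,
        "start": start,    # 1-based index of the first result
        "num": num,        # results per request, at most 10
    }
    return API_ENDPOINT + "?" + urlencode(params)


def fetch_results(url):
    """Fetch one page and return (title, link, snippet) tuples."""
    with urlopen(url) as response:
        payload = json.load(response)
    return [
        (item.get("title"), item.get("link"), item.get("snippet"))
        for item in payload.get("items", [])
    ]


url = build_search_url("quick and dirty", "YOUR_API_KEY", "YOUR_ENGINE_ID")
print(url)
```

Calling `fetch_results(url)` with a real key and engine ID would perform the actual request; it is left out of the example run so the sketch works offline.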

I need help deciding which direction to go: what other options do you know of, and what works for you?

Thanks!

Recommended answer

It only needs a small patch.

The function GoogleSearch._extract_result (line 237 of search.py) calls GoogleSearch._extract_description (line 258), which fails. As a result, _extract_result returns None for most of the results, so fewer results are shown than expected.

The fix:

In search.py, change line 259 from this:

desc_div = result.find('div', {'class': re.compile(r'\bs\b')})

to this:

desc_div = result.find('span', {'class': 'st'})
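To illustrate what the patched selector targets, here is a minimal, self-contained Python 3 sketch that pulls the text of every `<span class="st">` out of a piece of HTML using only the standard library. It does not use xgoogle or BeautifulSoup. The `SnippetExtractor` class and the sample markup are made up for illustration and are only an assumption about what a result block looked like when the patch was written.

```python
from html.parser import HTMLParser


class SnippetExtractor(HTMLParser):
    """Collect the text of every <span class="st"> element."""

    def __init__(self):
        super().__init__()
        self.snippets = []
        self._depth = 0    # <span> nesting depth inside a matching span
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1
        elif "st" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.snippets.append("".join(self._parts).strip())

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


# Hypothetical markup: the first result has a snippet, the second does not.
SAMPLE = (
    '<li class="g"><h3 class="r"><a href="http://example.com/">Example</a></h3>'
    '<div class="s"><span class="st">A <em>quick</em> and dirty snippet.</span>'
    '</div></li>'
    '<li class="g"><h3 class="r"><a href="http://example.org/">No snippet</a>'
    '</h3></li>'
)

parser = SnippetExtractor()
parser.feed(SAMPLE)
parser.close()
print(parser.snippets)
```

The second result has no snippet, so nothing is collected for it. A scraper that treats a missing description as fatal would drop such a result entirely, which is the kind of failure described above.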

I tested it with:

#!/usr/bin/python
#
# This program does a Google search for "quick and dirty" and returns
# 200 results.
#

from xgoogle.search import GoogleSearch, SearchError

class give_me(object):
    def __init__(self, query, target):
        self.gs = GoogleSearch(query)
        self.gs.results_per_page = 50
        self.current = 0
        self.target = target
        self.buf_list = []

    def __iter__(self):
        return self

    def next(self):
        if self.current >= self.target:
            raise StopIteration
        else:
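            if not self.buf_list:
                # Fetch the next page; stop cleanly if nothing more comes back.
                self.buf_list = self.gs.get_results()
                if not self.buf_list:
                    raise StopIteration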
            self.current += 1
            return self.buf_list.pop(0)

try:
    sites = {}
    for res in give_me("quick and dirty", 200):
        t_dict = \
        {
            "title" : res.title.encode('utf8'),
            "desc" : res.desc.encode('utf8'),
            "url" : res.url.encode('utf8')
        }
        sites[t_dict["url"]] = t_dict
        print t_dict
except SearchError, e:
    print "Search failed: %s" % e
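The same paging pattern can also be written as a generator. The sketch below is Python 3 and does not touch the network: `FakeSearch` is a hypothetical stand-in that only mimics the `results_per_page` attribute and `get_results()` method used above, so the paging logic can be run and checked on its own. It assumes that `get_results()` returns an empty list once the results are exhausted.

```python
from itertools import islice


class FakeSearch:
    """Hypothetical stand-in for GoogleSearch that serves fixed pages."""

    def __init__(self, total, results_per_page=50):
        self._items = ["result-%d" % i for i in range(total)]
        self.results_per_page = results_per_page
        self._offset = 0

    def get_results(self):
        # Return the next page; an empty list means nothing is left.
        end = self._offset + self.results_per_page
        page = self._items[self._offset:end]
        self._offset += len(page)
        return page


def iter_results(searcher):
    """Yield results page by page until the searcher returns an empty page."""
    while True:
        page = searcher.get_results()
        if not page:
            return
        yield from page


# Take the first 120 results, which spans three pages of 50.
first_120 = list(islice(iter_results(FakeSearch(total=500)), 120))
print(len(first_120), first_120[0], first_120[-1])
```

Using `islice` to cap the number of results keeps the stopping condition out of the iterator itself, and the generator ends on its own when the search runs dry.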
