Python google wrapper for number of results


Problem Description


I've already seen some posts about getting the number of search results for a Google search, but none satisfies my needs so far. I want to search for a string containing spaces and get roughly the same number of results as a manually executed search in Google. My routine so far is

import requests
from bs4 import BeautifulSoup


test='just a teststring for the search'
r = requests.get('http://www.google.com/search',
                     params={'q':test}
                    )
soup = BeautifulSoup(r.text,"html5lib")
test=soup.find('div',{'id':'resultStats'}).text

The routine gives 32,400 search results, while the manual search on the Google page gives 85,000. What am I doing wrong?! When I search for just one word, the deviation is much smaller.

Solution

The count will simply differ on every request you send. Google search results differ from computer to computer; Google wants and expects results to vary from person to person. It depends on many factors.

For requests sent by bots (scripts), the results will likely be the same, but not always.

For example:

>>> soup.select_one('#result-stats nobr').previous_sibling
'About 4,100,000,000 results'
# in fact, there were 4,000,000 results in my browser

Code and example in the online IDE:

import requests
import lxml  # parser backend used by BeautifulSoup below
from bs4 import BeautifulSoup

# A real browser User-Agent makes Google serve the full results page
# instead of the stripped-down HTML it returns to default clients.
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  "q": "step by step guide how to prank google",
  "gl": "us",  # country to search from
  "hl": "en"   # interface language
}

response = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(response.text, 'lxml')

# the count is the text node just before the <nobr> element inside #result-stats
number_of_results = soup.select_one('#result-stats nobr').previous_sibling
print(number_of_results)

-----
# About 7,800,000 results
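Note that the scraped value is a localized display string, not a number. If you want it as an integer (like the SerpApi response below returns), you can strip the formatting with a small regex. A minimal sketch, assuming the usual "About N results" format; `parse_result_count` is a hypothetical helper name, not part of any library:

```python
import re

def parse_result_count(stats_text):
    # Extract the first comma-grouped number from a string like
    # 'About 7,800,000 results' and return it as an int.
    # Returns None if no number is found.
    match = re.search(r'(\d[\d,]*)', stats_text)
    if not match:
        return None
    return int(match.group(1).replace(',', ''))

print(parse_result_count('About 7,800,000 results'))  # -> 7800000
```

This also handles larger counts such as 'About 4,100,000,000 results' without changes.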


Alternatively, you can use Google Organic Results API from SerpApi. It's a paid API with a free plan.

The main difference is that you only need to iterate over structured JSON, without figuring out how to extract certain elements or coding everything from scratch, and there's no parser to maintain.

import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "step by step guide how to prank google",
    "api_key": os.getenv("API_KEY"),  # SerpApi key read from the environment
}

search = GoogleSearch(params)
results = search.get_dict()  # parsed JSON response as a Python dict

result = results["search_information"]['total_results']
print(result)

-----
# 7800000
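One caveat worth hedging: if I understand the response shape correctly, keys like `search_information` may be missing for some queries, in which case direct indexing raises a `KeyError`. A defensive lookup avoids that (`total_results_of` is a hypothetical helper, not part of the SerpApi client):

```python
def total_results_of(results):
    # Chain dict.get with a default so a missing "search_information"
    # section yields None instead of raising KeyError.
    return results.get("search_information", {}).get("total_results")

print(total_results_of({"search_information": {"total_results": 7800000}}))  # -> 7800000
print(total_results_of({}))  # -> None
```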

Disclaimer: I work for SerpApi.
