Extract the number of results from Google search

Question

I am writing a web scraper that extracts the number of results of a Google search, which appears at the top left of the search results page. I have written the code below, but I do not understand why phrase_extract is None. I want to extract the phrase "About 12,010,000,000 results". Which part am I getting wrong? Maybe I am parsing the HTML incorrectly?

import requests
from bs4 import BeautifulSoup

def pyGoogleSearch(word):   
    address='http://www.google.com/#q='
    newword=address+word
    #webbrowser.open(newword)
    page=requests.get(newword)
    soup = BeautifulSoup(page.content, 'html.parser')
    phrase_extract=soup.find(id="resultStats")
    print(phrase_extract)

pyGoogleSearch('world')

Answer

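You're actually using the wrong URL to query Google's search engine. In http://www.google.com/#q=world, everything after # is a URL fragment, which is never sent to the server, so requests.get only downloads the Google home page, and that page has no element with id "resultStats". You should be using http://www.google.com/search?q=<query>.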
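To see why the original URL fails, a minimal check with the standard library's urllib.parse.urlsplit shows how each URL breaks down: the path and query string form the request target sent to the server, while the fragment stays on the client. The helper name request_target is hypothetical, used only for illustration.

```python
from urllib.parse import urlsplit, urlencode

def request_target(url):
    """Return (path?query, fragment): what the server sees vs. what stays client-side."""
    parts = urlsplit(url)
    target = parts.path + ('?' + parts.query if parts.query else '')
    return target, parts.fragment

# Original URL: the query hides in the fragment, so the server only sees "/".
print(request_target('http://www.google.com/#q=world'))        # ('/', 'q=world')
# Corrected URL: the query is part of the request target.
print(request_target('http://www.google.com/search?q=world'))  # ('/search?q=world', '')

# urlencode also escapes spaces and special characters in the search term.
print('http://www.google.com/search?' + urlencode({'q': 'hello world'}))
# http://www.google.com/search?q=hello+world
```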

So it would look like this:

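import requests
from bs4 import BeautifulSoup
from urllib.parse import quote_plus

def pyGoogleSearch(word):
    address = 'http://www.google.com/search?q='
    newword = address + quote_plus(word)  # URL-encode spaces and special characters
    page = requests.get(newword)
    soup = BeautifulSoup(page.content, 'html.parser')
    phrase_extract = soup.find(id="resultStats")
    print(phrase_extract)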

You also probably just want the text of that element, not the element itself, so you can do something like:

phrase_text = phrase_extract.text
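The difference between the element and its text can be checked offline on a small hard-coded snippet. The HTML below is a made-up stand-in, not Google's real markup (which changes over time), and result_stats_text is a hypothetical helper. It also guards against the None case from the question, since calling .text on None raises AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for a real results page.
SAMPLE_HTML = '<div id="resultStats">About 12,010,000,000 results</div>'

def result_stats_text(html):
    """Return the text of the element with id="resultStats", or None if it is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    element = soup.find(id="resultStats")
    if element is None:
        # The element was not found; .text on None would raise AttributeError.
        return None
    return element.text

print(result_stats_text(SAMPLE_HTML))      # About 12,010,000,000 results
print(result_stats_text('<html></html>'))  # None
```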

Or, to get the actual value as an integer:

val = int(phrase_extract.text.split(' ')[1].replace(',',''))
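The split(' ')[1] approach assumes the text always starts with one word such as "About"; for text like "12 results" it would pick "results" and int() would raise ValueError. A slightly more tolerant sketch, assuming the count is the first run of digits and commas in the text (so it will not handle locales that group digits with "." or spaces), uses a regular expression. The name parse_result_count is hypothetical.

```python
import re

def parse_result_count(text):
    """Return the first comma-grouped number in text as an int, or None if there is none."""
    match = re.search(r'\d[\d,]*', text)
    if match is None:
        return None
    # Drop the thousands separators before converting.
    return int(match.group().replace(',', ''))

print(parse_result_count('About 12,010,000,000 results'))  # 12010000000
print(parse_result_count('12 results'))                    # 12
```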
