Extract the number of results from a Google search
Problem description
I am writing a web scraper to extract the number of results for a Google search, which appears at the top left of the results page. I have written the code below, but I do not understand why phrase_extract is None. I want to extract the phrase "About 12,010,000,000 results". Where am I making a mistake? Am I perhaps parsing the HTML incorrectly?
import requests
from bs4 import BeautifulSoup

def pyGoogleSearch(word):
    address = 'http://www.google.com/#q='
    newword = address + word
    # webbrowser.open(newword)
    page = requests.get(newword)
    soup = BeautifulSoup(page.content, 'html.parser')
    phrase_extract = soup.find(id="resultStats")
    print(phrase_extract)

pyGoogleSearch('world')
Recommended answer
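You're actually using the wrong URL to query Google's search engine. Everything after the # in http://www.google.com/#q=world is a URL fragment, which the client never sends to the server, so requests only fetches the Google home page and soup.find returns None. You should be using http://www.google.com/search?q=<query>.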
So it would look like this:
def pyGoogleSearch(word):
    address = 'http://www.google.com/search?q='
    newword = address + word
    page = requests.get(newword)
    soup = BeautifulSoup(page.content, 'html.parser')
    phrase_extract = soup.find(id="resultStats")
    print(phrase_extract)
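As a small offline illustration of the difference between the two URLs, the sketch below uses only the standard library to show that the original address carries no query string at all, and then builds the search URL with the query percent-encoded. The helper name build_search_url is made up for this example, and encoding the query with urlencode is an addition that the answer above does not use; it only matters for search terms containing spaces or special characters.

```python
from urllib.parse import urlencode, urlsplit

# The URL from the question: everything after '#' is a fragment.
broken = urlsplit('http://www.google.com/#q=world')
print(repr(broken.query))     # the query part is empty
print(repr(broken.fragment))  # the search term ended up in the fragment


def build_search_url(word):
    """Hypothetical helper: build the search URL with an encoded query."""
    return 'http://www.google.com/search?' + urlencode({'q': word})


print(build_search_url('hello world'))
```

The resulting string can be passed to requests.get in place of address + word.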
You also probably just want the text of that element, not the element itself, so you can do something like:
phrase_text = phrase_extract.text
Or, to get the actual value as an integer:
val = int(phrase_extract.text.split(' ')[1].replace(',',''))
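The split(' ')[1] approach assumes the text always starts with a single word such as "About" followed by the number. A slightly more forgiving sketch, assuming only that the count is the first run of digits and commas in the text, could use a regular expression instead. The function name parse_result_count is hypothetical, and it works on a plain string such as phrase_extract.text.

```python
import re


def parse_result_count(text):
    """Return the first comma-grouped number in text as an int, or None."""
    match = re.search(r'\d[\d,]*', text)
    if match is None:
        return None  # no digits found, e.g. the layout changed
    return int(match.group().replace(',', ''))


print(parse_result_count('About 12,010,000,000 results'))
```

Returning None when no number is found lets the caller handle a missing or changed result line without an IndexError or ValueError.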