How to use BeautifulSoup to parse Google search results in Python

Question

I am trying to parse the first page of Google search results. Specifically, I want the title and the short summary shown for each result. Here is what I have so far:

from urllib.request import urlretrieve
import urllib.parse
from urllib.parse import urlencode, urlparse, parse_qs
import webbrowser
from bs4 import BeautifulSoup
import requests

address = 'https://google.com/#q='
# Default Google search address start
file = open( "OCR.txt", "rt" )
# Open text document that contains the question
word = file.read()
file.close()

myList = [item for item in word.split('\n')]
newString = ' '.join(myList)
# The question is on multiple lines so this joins them together with proper spacing

print(newString)

qstr = urllib.parse.quote_plus(newString)
# Encode the string

newWord = address + qstr
# Combine the base and the encoded query

print(newWord)

source = requests.get(newWord)

soup = BeautifulSoup(source.text, 'lxml')

The part I am stuck on now is going down the HTML path to parse the specific data that I want. Everything I have tried so far has either thrown an error saying that something has no attribute, or just given back "[]".

I am new to Python and BeautifulSoup, so I am not sure of the syntax to get where I want. I have found that these are the individual search results in the page:

https://ibb.co/jfRakR

Any help on what to add to parse the Title and Summary of each search result would be MASSIVELY appreciated.

Thanks!

Recommended answer

Your URL doesn't work for me. Everything after the # is a fragment, which the client never sends to the server, so Google never receives the query. With https://google.com/search?q= I get results.
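The difference between the two addresses can be checked offline with the standard library. This is only a sketch for illustration: the variable names are made up here, and 'hello world' is just a sample query.

```python
from urllib.parse import urlsplit, parse_qs, quote_plus

query = quote_plus("hello world")

# The asker's address: the search terms sit after '#', i.e. in the fragment.
fragment_url = urlsplit("https://google.com/#q=" + query)
# The answer's address: the search terms sit after '?', i.e. in the query string.
search_url = urlsplit("https://google.com/search?q=" + query)

# Print path, query string and fragment for each address.
print(fragment_url.path, repr(fragment_url.query), repr(fragment_url.fragment))
print(search_url.path, repr(search_url.query), repr(search_url.fragment))

# Only the query string is parsed as request parameters.
print(parse_qs(search_url.query))
```

In the first address the query string is empty and the search terms end up in the fragment. In the second they are part of the query string, which is what gets sent with the request.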

# Fetch a results page and print the text of each result block
import urllib.parse
from bs4 import BeautifulSoup
import requests
import webbrowser

text = 'hello world'
text = urllib.parse.quote_plus(text)

url = 'https://google.com/search?q=' + text

response = requests.get(url)

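# Uncomment to save the returned HTML and inspect it in a browser
#with open('output.html', 'wb') as f:
#    f.write(response.content)
#webbrowser.open('output.html')

soup = BeautifulSoup(response.text, 'lxml')
# Every element with class "g" is treated as one search result
for g in soup.find_all(class_='g'):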
    print(g.text)
    print('-----')

Read the Beautiful Soup documentation.
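To get the title and summary separately rather than the whole text of each block, something along these lines might work. It is a sketch that runs on a made-up HTML snippet, not on a live page. The "g" class comes from the code above, but the h3 title tag and the "st" summary class are assumptions, and extract_results is a hypothetical helper. Google's real markup differs and changes over time, so inspect the saved output.html to find the right names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a results page (assumed structure).
SAMPLE_HTML = """
<div class="g">
  <h3><a href="https://example.com/a">First title</a></h3>
  <span class="st">First summary text.</span>
</div>
<div class="g">
  <h3><a href="https://example.com/b">Second title</a></h3>
</div>
"""

def extract_results(html):
    """Return a list of (title, summary) tuples, one per result block."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for block in soup.find_all(class_="g"):
        title_tag = block.find("h3")
        summary_tag = block.find(class_="st")  # assumed class name
        # find() returns None when nothing matches, so check before get_text()
        title = title_tag.get_text(strip=True) if title_tag else None
        summary = summary_tag.get_text(strip=True) if summary_tag else None
        results.append((title, summary))
    return results

results = extract_results(SAMPLE_HTML)
for title, summary in results:
    print(title, "|", summary)
```

The None checks relate to the two symptoms in the question. find_all() returns an empty list when no element matches, and find() returns None, so calling .text on that result raises the "has no attribute" error.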