Google search scraper, Python


Problem description


I am new to Python and am trying to build a Google search scraper to get stock prices. When I run the code below, I don't get any results; instead I just get the page's HTML.

import urllib.request
from bs4 import BeautifulSoup

import requests

url = 'https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=uwti'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html, "html.parser")

print(soup.prettify())

Am I missing something very simple? Please give me some pointers. I am trying to extract the current stock value. How do I extract the value shown in the attached image?

Solution

The value is in the page source, as you can see if you right-click and choose view-source in your browser. You just need to change the URL slightly and pass a user-agent header so that the response requests receives matches what you see there:

In [2]: from bs4 import BeautifulSoup
   ...: import requests
   ...: 
   ...: url = 'https://www.google.com/search?q=uwti&rct=j'
   ...: response = requests.get(url, headers={
   ...:     "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
   ...:                   "(KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"})
   ...: html = response.content
   ...: 
   ...: soup = BeautifulSoup(html, "html.parser")
   ...: print(soup.select_one("span._Rnb.fmob_pr.fac-l").text)
   ...: 
27.51

soup.find("span", class_="_Rnb fmob_pr fac-l").text would also work; passing class_ is the correct way to look up a tag by its CSS classes with find or find_all.
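The two lookups are not quite interchangeable, and a small offline sketch may help show the difference. It assumes a hypothetical HTML fragment standing in for the results page (the class names are the ones quoted above and may no longer be what Google serves), and extract_price is an illustrative helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the relevant part of the results page; the class
# names are the ones quoted in the answer and may have changed since.
sample_html = """
<div>
  <span class="_Rnb fmob_pr fac-l">27.51</span>
  <span class="fac-l _Rnb fmob_pr">27.51</span>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# CSS selector: matches any span carrying all three classes, in any order.
by_selector = soup.select("span._Rnb.fmob_pr.fac-l")

# class_ with a multi-class string: matches only that exact attribute value.
by_exact_string = soup.find_all("span", class_="_Rnb fmob_pr fac-l")

print(len(by_selector), len(by_exact_string))


def extract_price(parsed):
    """Return the price text, or None when the span is not in the page."""
    tag = parsed.select_one("span._Rnb.fmob_pr.fac-l")
    return tag.text if tag is not None else None


print(extract_price(soup))
print(extract_price(BeautifulSoup("<p>no price here</p>", "html.parser")))
```

Checking for None before reading .text avoids an AttributeError when the markup changes or the request is served a different page.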

In Chrome you can see that requesting https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=uwti redirects to https://www.google.com/search?q=uwti&rct=j.
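One way to see why the two URLs behave differently is to split them with the standard library. This is a sketch of a likely explanation, not something the answer states outright: in the original URL the search term sits after the #, in the fragment, which HTTP clients do not send to the server, whereas the /search URL carries it as a real query parameter. The describe helper is a name made up for this illustration.

```python
from urllib.parse import urlsplit, parse_qs

webhp_url = ("https://www.google.com/webhp?sourceid=chrome-instant"
             "&ion=1&espv=2&ie=UTF-8#q=uwti")
search_url = "https://www.google.com/search?q=uwti&rct=j"


def describe(url):
    """Split a URL into its path, parsed query parameters and fragment."""
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query), parts.fragment


webhp_path, webhp_query, webhp_fragment = describe(webhp_url)
search_path, search_query, search_fragment = describe(search_url)

# Original URL: no 'q' among the query parameters; the term is in the fragment.
print(webhp_path, "q" in webhp_query, repr(webhp_fragment))

# /search URL: the term is a query parameter and there is no fragment.
print(search_path, search_query["q"], repr(search_fragment))
```

Under that reading, requests.get on the first URL only ever asks the server for the plain /webhp page, which would explain why the response contains no stock price.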
