Google search scraper, Python
I am new to Python and am trying to make a Google search scraper to get stock prices. When I run the code below I don't get any results; instead I just get the page's HTML markup.
import urllib.request
from bs4 import BeautifulSoup
import requests
url = 'https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=uwti'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, "html.parser")
print(soup.prettify())
Am I missing something very simple? Please give me some pointers. I am trying to extract the current stock value. How do I extract the value shown in the attached image?
The value is in the page source, as you can see if you right-click and choose view-source in your browser. You just need to change the URL slightly and pass a user-agent header so that the response requests receives matches what you see there:
In [2]: from bs4 import BeautifulSoup
   ...: import requests
   ...:
   ...: url = 'https://www.google.com/search?q=uwti&rct=j'
   ...: response = requests.get(url, headers={
   ...:     "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
   ...:                   "(KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"})
   ...: html = response.content
   ...:
   ...: soup = BeautifulSoup(html, "html.parser")
   ...: print(soup.select_one("span._Rnb.fmob_pr.fac-l").text)
   ...:
27.51
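soup.find("span", class_="_Rnb fmob_pr fac-l").text
would also work, and it is the correct way to look up a tag by its CSS classes with find or find_all. Note that a multi-class string passed to class_ is compared against the exact value of the class attribute, so the classes must appear in the same order as in the page.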
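The difference between the two lookups can be checked offline. The following is a minimal sketch that runs against a made-up HTML fragment (SAMPLE_HTML is an assumption that only imitates the span the answer targets; Google's real markup and class names may differ and change over time):

```python
from bs4 import BeautifulSoup

# Made-up fragment imitating the span targeted in the answer (an assumption,
# not a copy of Google's actual page).
SAMPLE_HTML = '<div><span class="_Rnb fmob_pr fac-l">27.51</span></div>'

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# CSS selector: the span must carry all three classes, in any order.
by_selector = soup.select_one("span._Rnb.fmob_pr.fac-l")

# find() with a multi-class string: compared with the exact class attribute value.
by_find = soup.find("span", class_="_Rnb fmob_pr fac-l")

# Same classes in a different order: the exact-string comparison fails.
reordered = soup.find("span", class_="fac-l _Rnb fmob_pr")

print(by_selector.text)   # 27.51
print(by_find.text)       # 27.51
print(reordered)          # None
```

Both select_one and find return None when nothing matches, so checking the result before reading .text avoids an AttributeError if the page layout changes.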
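In Chrome you can see that when you use https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=uwti, there is a redirect to https://www.google.com/search?q=uwti&rct=j. Everything after the # is a URL fragment, which the browser handles itself and never sends to the server, so requests.get on the original URL only fetches the /webhp page without the search term.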
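To see why the original URL carries no search term as far as the server is concerned, it can be split with the standard library. This is a small sketch; build_search_url is a hypothetical helper introduced here for illustration, not part of the original answer:

```python
from urllib.parse import urlsplit, urlencode

original = ("https://www.google.com/webhp"
            "?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=uwti")
parts = urlsplit(original)

# The query string is sent to the server; the fragment (after '#') is not.
print(parts.path)      # /webhp
print(parts.query)     # sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8
print(parts.fragment)  # q=uwti


def build_search_url(term):
    """Hypothetical helper: place the search term in the query string."""
    return "https://www.google.com/search?" + urlencode({"q": term})


search_url = build_search_url("uwti")
print(search_url)      # https://www.google.com/search?q=uwti
```

Using urlencode also takes care of escaping search terms that contain spaces or other special characters.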