How to scrape text from a <p> element by its "id"

Views: 48 Published: 2021/4/15 19:15:04 Tags: python web-scraping beautifulsoup nonetype


Question

I'm learning how to scrape, so I'm not very advanced yet. I would like to scrape the company description from Bloomberg, for instance from this page (https://www.bloomberg.com/research/stocks/private/snapshot.asp?privcapId=320105).

I want to scrape this element:

<p id="bDescTeaser" itemprop="description">Fiat Chrysler Automobiles N.V., ...</p>

My script:

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
html = 'https://www.bloomberg.com/research/stocks/private/snapshot.asp?privcapId=32010'
page = urlopen(html)
data = BeautifulSoup(page, 'html.parser')
text = data.find('p', id="bDescTeaser", itemprop="  ")
print(text.get_text())

If I try to run the program, I get:

AttributeError: 'NoneType' object has no attribute 'get_text'

Is this a problem with my code or with this specific webpage?

Recommended answer

Bloomberg is blocking your request because it thinks you are a bot. The blocked response does not contain the element you are looking for, so find() returns None and calling get_text() on it raises the AttributeError. Use the requests library and send a User-Agent header; that way you should get the expected output.

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent so the request is not rejected as a bot.
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'}
response = requests.get('https://www.bloomberg.com/research/stocks/private/snapshot.asp?privcapId=320105', headers=header)
soup = BeautifulSoup(response.text, 'html.parser')
# Look the paragraph up by its id alone.
text = soup.find('p', id="bDescTeaser")
print(text.get_text())
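
Because find() returns None whenever nothing matches, it may be worth checking the result before calling get_text(). The following is only a minimal sketch of that idea, not a definitive fix: SAMPLE_HTML is a made-up stand-in for the downloaded page so the example runs offline, and extract_description and fetch_description are hypothetical helper names that do not come from the question or the answer. It also illustrates that an itemprop filter that does not equal the real attribute value (such as the "  " in the question's script) makes find() return None even when the element is present.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
<p id="bDescTeaser" itemprop="description">Fiat Chrysler Automobiles N.V., ...</p>
</body></html>
"""


def extract_description(html, element_id="bDescTeaser"):
    """Return the stripped text of <p id=element_id>, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("p", id=element_id)
    if tag is None:
        # Nothing matched (for example, a blocked or changed page).
        return None
    return tag.get_text(strip=True)


def fetch_description(url, user_agent):
    """Download url with a User-Agent header and extract the description."""
    import requests  # imported here so the parsing helper works without it

    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return extract_description(response.text)


# Element present: the text is returned.
found = extract_description(SAMPLE_HTML)
# Element absent: None is returned instead of raising AttributeError.
missing = extract_description("<html><body><p>blocked</p></body></html>")
# An itemprop value that does not match the real one also yields None.
mismatch = BeautifulSoup(SAMPLE_HTML, "html.parser").find(
    "p", id="bDescTeaser", itemprop="  "
)

print(found)
print(missing)
print(mismatch)
```

In a real run, fetch_description would be called with the page URL and a browser-like User-Agent string; a None result then signals that the page did not contain the element, rather than crashing the script.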
