Get text content from p tag

Views: 72 Published: 2021/4/15 19:19:10 python web-scraping beautifulsoup


Question

I am trying to get the description text of each block on this page:

https://twitter.com/search?q=data%20mining&src=typd&vertical=default&f=users

The HTML for the p tag looks like this:

<p class="ProfileCard-bio u-dir" dir="ltr" data-aria-label-part=""><a href="http://t.co/kwtDyFn6dC" rel="nofollow" dir="ltr" data-expanded-url="http://DataMiningBlog.com" class="twitter-timeline-link" target="_blank" title="http://DataMiningBlog.com"><span class="invisible">http://</span><span class="js-display-url">DataMiningBlog.com</span><span class="tco-ellipsis"><span class="invisible">&nbsp;</span></span></a> covers current challenges, interviews with leading actors and book reviews related to data mining, analytics and data science.</p>

My code:

productDivs = soup.findAll('div', attrs={'class' : 'ProfileCard-content'})
for div in productDivs:
   print div.find('p', attrs={'class' : 'ProfileCard-bio u-dir'}).text

Is anything wrong here? I am getting this exception:

Traceback (most recent call last):
  File "twitter_user_scrapper.py", line 91, in getImageList
    print div.find('p', attrs={'class' : 'ProfileCard-bio u-dir'}).text
AttributeError: 'NoneType' object has no attribute 'text'

Recommended answer

The issue might be that some div elements with class ProfileCard-content do not have a child p element with the classes ProfileCard-bio and u-dir. When that happens, the following call returns None:

div.find('p', attrs={'class' : ['ProfileCard-bio', 'u-dir']})

That is why you are getting the AttributeError. Save the return value of the call above in a variable, check whether it is None, and read its text only if it is not None.

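Also, consider giving class as a list of class names rather than a single string. A single string is compared against the exact value of the class attribute (same names, same order), whereas a list matches an element that carries any of the listed classes: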

attrs={'class' : ['ProfileCard-bio', 'u-dir']}
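
To make the difference between these forms concrete, here is a minimal sketch, assuming BeautifulSoup 4 (`bs4`) with the built-in `html.parser`. The HTML snippet and the variable names `exact`, `any_of`, and `all_of` are made up for illustration and are not from the original page. It also shows a CSS selector, which is one way to require both classes at once:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three p tags with different class combinations.
html = """
<p class="ProfileCard-bio u-dir">both classes</p>
<p class="u-dir ProfileCard-bio">both classes, reversed</p>
<p class="u-dir">only u-dir</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Single string: matched against the exact class attribute value.
exact = soup.find_all("p", attrs={"class": "ProfileCard-bio u-dir"})

# List: matches a tag that has any one of the listed classes.
any_of = soup.find_all("p", attrs={"class": ["ProfileCard-bio", "u-dir"]})

# CSS selector: matches only tags that have both classes, in any order.
all_of = soup.select("p.ProfileCard-bio.u-dir")

print(len(exact), len(any_of), len(all_of))
```

With this snippet the three counts come out as 1, 3, and 2. If the goal is "a p with both classes", the selector form is usually the closest fit.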

Example:

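# `soup` is the BeautifulSoup object parsed from the page, as in the question.
productDivs = soup.find_all('div', attrs={'class': 'ProfileCard-content'})
for div in productDivs:
    elem = div.find('p', attrs={'class': ['ProfileCard-bio', 'u-dir']})
    # find() returns None when the div has no matching p, so check first.
    if elem is not None:
        print(elem.text)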
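
For a version that can be run without fetching the page, the following is a minimal self-contained sketch, assuming BeautifulSoup 4 (`bs4`) and Python 3. The inline HTML and the helper name `extract_bios` are hypothetical; the second card deliberately has no bio paragraph so that the None check is exercised:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second card has no bio paragraph.
html = """
<div class="ProfileCard-content">
  <p class="ProfileCard-bio u-dir">covers data mining and analytics.</p>
</div>
<div class="ProfileCard-content"></div>
"""


def extract_bios(markup):
    """Return the bio text of every profile card that has one."""
    soup = BeautifulSoup(markup, "html.parser")
    bios = []
    for div in soup.find_all("div", attrs={"class": "ProfileCard-content"}):
        elem = div.find("p", attrs={"class": ["ProfileCard-bio", "u-dir"]})
        if elem is None:
            # No bio in this card; skip it instead of raising AttributeError.
            continue
        bios.append(elem.get_text(strip=True))
    return bios


print(extract_bios(html))
```

Collecting the text into a list rather than printing inside the loop keeps the scraping logic easy to test. Cards without a bio are simply skipped.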
