Get text content from a p tag

Question

I am trying to get the description text of each block on this page:

https://twitter.com/search?q=data%20mining&src=typd&vertical=default&f=users

The HTML for the p tag looks like this:

<p class="ProfileCard-bio u-dir" dir="ltr" data-aria-label-part=""><a href="http://t.co/kwtDyFn6dC" rel="nofollow" dir="ltr" data-expanded-url="http://DataMiningBlog.com" class="twitter-timeline-link" target="_blank" title="http://DataMiningBlog.com"><span class="invisible">http://</span><span class="js-display-url">DataMiningBlog.com</span><span class="tco-ellipsis"><span class="invisible">&nbsp;</span></span></a> covers current challenges, interviews with leading actors and book reviews related to data mining, analytics and data science.</p>

My code:

productDivs = soup.findAll('div', attrs={'class' : 'ProfileCard-content'})
for div in productDivs:
   print div.find('p', attrs={'class' : 'ProfileCard-bio u-dir'}).text

Is anything wrong here? I am getting this exception:

Traceback (most recent call last):
  File "twitter_user_scrapper.py", line 91, in getImageList
    print div.find('p', attrs={'class' : 'ProfileCard-bio u-dir'}).text
AttributeError: 'NoneType' object has no attribute 'text'

Recommended answer

The issue might be that some div elements with class ProfileCard-content do not have a child p element with the classes ProfileCard-bio and u-dir. When that happens, the following call returns None:

div.find('p', attrs={'class' : ['ProfileCard-bio', 'u-dir']})

That is why you are getting the AttributeError. Save the return value of the call above in a variable, check whether it is None, and read its text only if it is not None.
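
To see the failure mode in isolation, here is a minimal sketch. It assumes Beautiful Soup 4 (the `bs4` package) and uses a made-up two-card HTML snippet, not the real Twitter markup; the second card deliberately has no bio paragraph.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration: the second card has no bio paragraph.
SAMPLE_HTML = """
<div class="ProfileCard-content">
  <p class="ProfileCard-bio u-dir">Covers data mining and analytics.</p>
</div>
<div class="ProfileCard-content"></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

bios = []
for div in soup.find_all("div", attrs={"class": "ProfileCard-content"}):
    elem = div.find("p", attrs={"class": ["ProfileCard-bio", "u-dir"]})
    # find() returns None when nothing matches, so guard before reading .text
    bios.append(elem.text if elem is not None else None)

print(bios)
```

The second entry in `bios` is None, which is the value that raised the AttributeError when `.text` was read from it without a check.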

Also, you should pass class as a list of all the classes rather than as a single string, for example:

attrs={'class' : ['ProfileCard-bio', 'u-dir']}

Example:

productDivs = soup.findAll('div', attrs={'class' : 'ProfileCard-content'})
for div in productDivs:
    # find() returns None when the card has no matching p element
    elem = div.find('p', attrs={'class' : ['ProfileCard-bio', 'u-dir']})
    if elem is not None:
        print(elem.text)
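
One detail worth knowing about the list form: in Beautiful Soup 4, a list of classes matches a tag that carries any one of them, not necessarily all. If both classes must be present, a CSS selector is one option. The sketch below assumes `bs4` with its `select()` method available and uses invented sample markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
SAMPLE_HTML = """
<p class="ProfileCard-bio u-dir">bio with both classes</p>
<p class="u-dir">paragraph with only u-dir</p>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A list matches any tag that has at least one of the listed classes.
any_match = [
    p.text
    for p in soup.find_all("p", attrs={"class": ["ProfileCard-bio", "u-dir"]})
]

# A CSS selector with chained classes requires all of them on the same tag.
all_match = [p.text for p in soup.select("p.ProfileCard-bio.u-dir")]

print(any_match)
print(all_match)
```

Here `any_match` contains both paragraphs while `all_match` contains only the first, so the selector form is the stricter choice when other elements on the page share the u-dir class.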
