Get text content from p tag
Question
I am trying to get the description text of each block on this page:
https://twitter.com/search?q=data%20mining&src=typd&vertical=default&f=users
The HTML for the p tag looks like this:
<p class="ProfileCard-bio u-dir" dir="ltr" data-aria-label-part=""><a href="http://t.co/kwtDyFn6dC" rel="nofollow" dir="ltr" data-expanded-url="http://DataMiningBlog.com" class="twitter-timeline-link" target="_blank" title="http://DataMiningBlog.com"><span class="invisible">http://</span><span class="js-display-url">DataMiningBlog.com</span><span class="tco-ellipsis"><span class="invisible"> </span></span></a> covers current challenges, interviews with leading actors and book reviews related to data mining, analytics and data science.</p>
My code:
productDivs = soup.findAll('div', attrs={'class' : 'ProfileCard-content'})
for div in productDivs:
    print div.find('p', attrs={'class' : 'ProfileCard-bio u-dir'}).text
Is anything wrong here? I am getting this exception:
Traceback (most recent call last):
  File "twitter_user_scrapper.py", line 91, in getImageList
    print div.find('p', attrs={'class' : 'ProfileCard-bio u-dir'}).text
AttributeError: 'NoneType' object has no attribute 'text'
Recommended answer
The issue might be that some div elements with class ProfileCard-content do not have a child p element with class ProfileCard-bio u-dir. When that happens, the following call returns None:
div.find('p', attrs={'class' : ['ProfileCard-bio', 'u-dir']})
That is why you are getting the AttributeError: the code reads .text on None. Save the return value of the call above in a variable, check whether it is None, and read its text only if it is not None.
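Also, consider giving class as a list of class names rather than a single string. In Beautiful Soup 4, a single multi-class string matches only when it equals the class attribute value exactly (same names, same order), while a list matches a tag that has any of the listed classes: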
attrs={'class' : ['ProfileCard-bio', 'u-dir']}
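To make the matching behaviour concrete, here is a minimal sketch. It assumes Beautiful Soup 4 (the bs4 package) and uses made-up HTML rather than the real Twitter page; the select() call is an addition for comparison and is not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the first <p> carries both classes.
html = """
<p class="ProfileCard-bio u-dir">both classes</p>
<p class="u-dir">only u-dir</p>
<p class="other">neither</p>
"""
soup = BeautifulSoup(html, "html.parser")

# A list of class names matches a tag that has ANY of them.
any_match = [p.text for p in soup.find_all(
    "p", attrs={"class": ["ProfileCard-bio", "u-dir"]})]

# A CSS selector with chained classes requires ALL of them.
all_match = [p.text for p in soup.select("p.ProfileCard-bio.u-dir")]

print(any_match)
print(all_match)
```

If the p element must carry both classes, the CSS selector form is the stricter choice; if ProfileCard-bio alone identifies the element, searching for that single class is enough.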
Example:
productDivs = soup.findAll('div', attrs={'class' : 'ProfileCard-content'})
for div in productDivs:
    # find() returns None when the div has no matching p element
    elem = div.find('p', attrs={'class' : ['ProfileCard-bio', 'u-dir']})
    if elem is not None:
        print(elem.text)
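For a version that can be run without fetching the page, the sketch below applies the same None check to a small hand-written HTML string. It assumes Beautiful Soup 4 (bs4); the sample markup and the choice of an empty string as the fallback value are illustrative assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second card has no bio paragraph.
html = """
<div class="ProfileCard-content"><p class="ProfileCard-bio u-dir">Data mining blog.</p></div>
<div class="ProfileCard-content"><span>No bio here</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

bios = []
for div in soup.find_all("div", attrs={"class": "ProfileCard-content"}):
    elem = div.find("p", attrs={"class": "ProfileCard-bio"})
    # find() returns None when nothing matches, so guard before reading text.
    bios.append(elem.get_text(strip=True) if elem is not None else "")

print(bios)
```

Collecting an empty string for cards without a bio keeps the result list aligned with the list of cards; skipping those cards instead, as in the example above, works equally well if alignment does not matter.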