AttributeError: 'ResultSet' object has no attribute 'find_all' in BeautifulSoup

Problem description

I don't understand why I get this error: AttributeError: 'ResultSet' object has no attribute 'find_all'.

I have a fairly simple function:

def scrape_a(url):
  r = requests.get(url)
  soup = BeautifulSoup(r.content)
  news =  soup.find_all("div", attrs={"class": "news"})
  for links in news:
    link = news.find_all("href")
    return link

Here is the structure of the webpage I am trying to scrape:

<div class="news">
<a href="www.link.com">
<h2 class="heading">
heading
</h2>
<div class="teaserImg">
<img alt="" border="0" height="124" src="/image">
</div>
<p> text </p>
</a>
</div>

Solution

You are doing two things wrong:

  • You are calling find_all on the news result set; presumably you meant to call it on the links object, one element in that result set.

  • There are no <href ...> tags in your document, so searching with find_all('href') is not going to get you anything. You only have tags with an href attribute.
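Both points can be reproduced without a network request. The following is a minimal sketch, assuming an inline `HTML` string (a hypothetical sample modelled on the markup in the question) and the built-in `html.parser`; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the structure in the question.
HTML = """
<div class="news">
  <a href="www.link.com"><h2 class="heading">heading</h2></a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
news = soup.find_all("div", attrs={"class": "news"})

# find_all() returns a ResultSet, which is a list subclass holding Tag objects.
is_list = isinstance(news, list)

# Calling find_all on the ResultSet itself raises AttributeError.
try:
    news.find_all("a")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Searching for a tag *named* href finds nothing ...
by_tag_name = news[0].find_all("href")
# ... while href=True matches any tag that carries an href attribute.
by_attribute = news[0].find_all(href=True)

print(error_message)
print(by_tag_name, by_attribute[0]["href"])
```

Indexing into the result set (`news[0]`) or iterating over it yields individual `Tag` objects, and those do have `find_all`.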

You could correct your code to:

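import requests
from bs4 import BeautifulSoup

def scrape_a(url):
    r = requests.get(url)
    # Name the parser explicitly so results do not depend on what is installed.
    soup = BeautifulSoup(r.content, "html.parser")
    news = soup.find_all("div", attrs={"class": "news"})
    for links in news:
        # links is a single Tag; href=True matches any tag with an href attribute.
        link = links.find_all(href=True)
        # Returns inside the loop, so only the first news div is searched.
        return link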

to do what I think you tried to do.
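Note that a `return` inside the loop exits on the first `div`, so later news blocks are never searched. If links from every block are wanted, one possible variant accumulates them instead. This is a sketch only: `collect_links` and `SAMPLE` are hypothetical names, and the function takes an HTML string rather than a URL so that it runs without network access.

```python
from bs4 import BeautifulSoup

def collect_links(html):
    """Return every tag with an href attribute inside any <div class="news">."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for div in soup.find_all("div", attrs={"class": "news"}):
        # div is a single Tag, so find_all is available here.
        links.extend(div.find_all(href=True))
    return links

# Hypothetical sample with two news blocks.
SAMPLE = """
<div class="news"><a href="www.link.com"><p>text</p></a></div>
<div class="news"><a href="www.second.com"><p>more</p></a></div>
"""

found = collect_links(SAMPLE)
print([tag["href"] for tag in found])
```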

I'd use a CSS selector:

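import requests
from bs4 import BeautifulSoup

def scrape_a(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    # Any element with an href attribute nested inside <div class="news">.
    news_links = soup.select("div.news [href]")
    if news_links:
        return news_links[0]
    # Falls through and returns None when nothing matches.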

If you wanted to return the value of the href attribute (the link itself), you need to extract that too, of course:

return news_links[0]['href']

If you needed all the link objects, and not the first, simply return news_links for the link objects, or use a list comprehension to extract the URLs:

return [link['href'] for link in news_links]
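Put together, the selector and the list comprehension might look like the sketch below. `extract_news_links` and `SAMPLE` are hypothetical names introduced for illustration, and the function parses an HTML string directly so the example does not depend on a live page.

```python
from bs4 import BeautifulSoup

def extract_news_links(html):
    """Return the href values of all elements inside <div class="news">."""
    soup = BeautifulSoup(html, "html.parser")
    # "div.news [href]" selects descendants of div.news that have an href attribute.
    return [tag["href"] for tag in soup.select("div.news [href]")]

# Hypothetical sample: two news blocks and one unrelated block.
SAMPLE = """
<div class="news"><a href="www.link.com"><p>text</p></a></div>
<div class="other"><a href="www.ignored.com">skip</a></div>
<div class="news"><a href="www.second.com"><p>more</p></a></div>
"""

urls = extract_news_links(SAMPLE)
print(urls)
```

Because `select` returns an empty list when nothing matches, the comprehension yields an empty list in that case rather than raising.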
