How to extract a link from <a> inside <h2 class=section-heading>: BeautifulSoup

Views: 51 Published: 2020/9/20 7:52:11 Tags: python beautifulsoup python-requests bs4

Question

I am trying to extract a link which is written like this:

<h2 class="section-heading">
    <a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a>
</h2>

My code is:

from bs4 import BeautifulSoup
import requests, re

def get_data():
    url='http://www.nytimes.com/'
    s_code=requests.get(url)
    plain_text = s_code.text
    soup = BeautifulSoup(plain_text)
    head_links=soup.findAll('h2', {'class':'section-heading'})

    for n in head_links :
       a = n.find('a')
       print a
       print n.get['href'] 
       #print a['href']
       #print n.get('href')
       #headings=n.text
       #links = n.get('href')
       #print headings, links

get_data()

A line like "print a" simply prints out the whole <a> element inside the <h2 class=section-heading>, i.e.

<a href="http://www.nytimes.com/pages/world/index.html">World »</a>

But when I execute "print n.get['href']", it throws an error:

print n.get['href'] 
TypeError: 'instancemethod' object has no attribute '__getitem__'

Am I doing something wrong here? Please help

I couldn't find a similar question here. My case is a bit specific: I am trying to extract links that sit inside elements with a particular class name, section-heading.

Recommended answer

First of all, you want to fetch the href of the <a> element, so you should be accessing a, not n, on that line. Secondly, it should be either

a.get('href')

or

a['href']

The latter form throws if no such attribute is found, whereas the former would return None, like the usual dictionary/mapping interface. As .get is a method, it should be called (.get(...)); indexing/element access wouldn't work for it (.get[...]), which is what this question is about.
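
The difference between the two access forms can be checked on a small inline snippet. This is only a minimal sketch in Python 3: the HTML string and the variable names are made up for illustration, and html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <a> with an href, one without.
html = (
    '<h2 class="section-heading">'
    '<a href="http://www.nytimes.com/pages/arts/index.html">Arts</a>'
    '<a>No link</a>'
    '</h2>'
)
soup = BeautifulSoup(html, "html.parser")
with_href, without_href = soup.find_all("a")

# Both forms return the attribute value when it exists.
href_via_get = with_href.get("href")
href_via_index = with_href["href"]

# .get() returns None for a missing attribute ...
missing_via_get = without_href.get("href")

# ... while indexing raises KeyError.
try:
    without_href["href"]
    index_raised = False
except KeyError:
    index_raised = True

# Subscripting the method itself (.get[...]) is the mistake from the question.
try:
    with_href.get["href"]
    subscript_raised = False
except TypeError:
    subscript_raised = True
```

In Python 3 the message of that last TypeError is worded differently from the Python 2 one quoted in the question, but the cause is the same: a method is being indexed instead of called.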

Note that find might also fail there and return None; perhaps you wanted to iterate over n.find_all('a', href=True):

for n in head_links:
   for a in n.find_all('a', href=True):
       print(a['href'])
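
To see why the nested find_all loop is the safer choice, here is a small sketch run against inline markup instead of the live page. The three headings are invented for illustration, and html.parser is an assumed parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second heading has no <a>,
# the third has an <a> without an href attribute.
html = """
<h2 class="section-heading"><a href="http://www.nytimes.com/pages/world/index.html">World</a></h2>
<h2 class="section-heading">Plain heading</h2>
<h2 class="section-heading"><a name="anchor">No href</a></h2>
"""
soup = BeautifulSoup(html, "html.parser")
head_links = soup.find_all("h2", {"class": "section-heading"})

# find() returns None when a heading contains no <a> at all.
first_matches = [n.find("a") for n in head_links]

# find_all(..., href=True) yields only <a> tags that carry an href,
# so headings without a usable link simply contribute nothing.
links = [a["href"] for n in head_links for a in n.find_all("a", href=True)]
print(links)
```

With find alone, the second heading would give None, and calling a['href'] on it would fail, while the third would raise KeyError on the missing attribute.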

Even easier than find_all is the select method, which takes a CSS selector. With a single operation we get only those <a> elements with an href attribute that are inside an <h2 class="section-heading">, as easily as with jQuery.

soup = BeautifulSoup(plain_text, 'html.parser')  # name the parser explicitly
for a in soup.select('h2.section-heading a[href]'):
    print(a['href'])
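
The selector approach can be wrapped in a small helper that works on any HTML string. This is a sketch, not part of the original answer: the function name extract_section_links and the sample markup are hypothetical, and html.parser is an assumed parser choice.

```python
from bs4 import BeautifulSoup


def extract_section_links(html):
    """Return (text, href) pairs for <a href> inside <h2 class="section-heading">."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.select("h2.section-heading a[href]")
    ]


# Hypothetical sample: only the first heading matches the selector.
sample = """
<h2 class="section-heading"><a href="http://www.nytimes.com/pages/arts/index.html">Arts »</a></h2>
<h2 class="other"><a href="http://example.com/ignored">Ignored</a></h2>
<h2 class="section-heading"><a>No href</a></h2>
"""
section_links = extract_section_links(sample)
print(section_links)
```

Because the helper takes a string, it could be fed the text of a requests.get(url) response in the same way as the code in the question.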

(Also, please use the lower-case method names, such as find_all rather than findAll, in any new code that you write.)
