Unable to get correct link in BeautifulSoup

Problem description

I'm trying to parse a bit of HTML and I'd like to extract the link that matches a particular pattern. I'm using the find method with a regular expression but it doesn't get me the correct link. Here's my snippet. Could someone tell me what I'm doing wrong?

from BeautifulSoup import BeautifulSoup
import re

html = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash; 
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash; 
</div>
"""

soup = BeautifulSoup(html)
print soup.find('a', href = re.compile(r".*title/tt.*"))['href']

I should be getting the second link but BS always returns the first link. The href of the first link doesn't even match my regex so why does it return it?

Thanks.

Recommended answer

find only returns the first <a> tag. You want findAll.
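
As a rough illustration of the findAll suggestion, here is a minimal sketch. It assumes Python 3 and BeautifulSoup 4 (the bs4 package), where the method is spelled find_all; neither is what the question's snippet uses, and the names pattern and links are only illustrative. It collects every <a> tag whose href matches the pattern instead of stopping at a single tag.

```python
import re

from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed

html = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash;
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash;
</div>
"""

# Use the standard-library parser so no extra parser package is needed.
soup = BeautifulSoup(html, "html.parser")

# The href filter is applied as a regex search, so the leading and
# trailing ".*" from the question's pattern are not required.
pattern = re.compile(r"title/tt")

# find_all returns every matching tag; pull the href out of each one.
links = [a["href"] for a in soup.find_all("a", href=pattern)]

for link in links:
    print(link)
```

If only one link is expected, links[0] gives it, and an empty list signals that nothing matched.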
