python/beautifulsoup to find all <a href> with specific anchor text
Question
I am trying to use Beautiful Soup to parse HTML and find the href of every link that has a specific anchor text:
<a href="http://example.com">TEXT</a>
<a href="http://example.com/link">TEXT</a>
<a href="http://example.com/page">TEXT</a>
All the links I am looking for have exactly the same anchor text, in this case TEXT. I am NOT looking for the word TEXT itself; I want to use the word TEXT to find all the different href values.
To clarify, I am looking for something similar to using a class to select the links:
<a href="http://example.com" class="visible">TEXT</a>
<a href="http://example.com/link" class="visible">TEXT</a>
<a href="http://example.com/page" class="visible">TEXT</a>
and then using
findAll('a', 'visible')
except that the HTML I am parsing doesn't have a class on the links, only the same anchor text every time.
Recommended answer
Would something like this work?
In [39]: from bs4 import BeautifulSoup
In [40]: s = """\
   ....: <a href="http://example.com">TEXT</a>
   ....: <a href="http://example.com/link">TEXT</a>
   ....: <a href="http://example.com/page">TEXT</a>
   ....: <a href="http://dontmatchme.com/page">WRONGTEXT</a>"""
In [41]: soup = BeautifulSoup(s, 'html.parser')
In [42]: for link in soup.find_all('a', href=True, string='TEXT'):
   ....:     print(link['href'])
   ....:
http://example.com
http://example.com/link
http://example.com/page
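Matching on the exact string can miss links whose anchor text has surrounding whitespace. As a rough sketch of a more tolerant variant, the helper below compares the stripped text of each link instead. The function name `hrefs_with_anchor_text` and the extra sample links are made up for illustration and are not part of the original question or answer.

```python
from bs4 import BeautifulSoup


def hrefs_with_anchor_text(html, anchor_text):
    """Return the href of every <a> whose visible text equals anchor_text.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [
        link["href"]
        for link in soup.find_all("a", href=True)
        # get_text(strip=True) trims whitespace and concatenates
        # the text found inside any nested tags
        if link.get_text(strip=True) == anchor_text
    ]


# Sample markup: the second and third links are assumed variations
html = """
<a href="http://example.com">TEXT</a>
<a href="http://example.com/link"> TEXT </a>
<a href="http://example.com/page"><b>TEXT</b></a>
<a href="http://dontmatchme.com/page">WRONGTEXT</a>
"""

for href in hrefs_with_anchor_text(html, "TEXT"):
    print(href)
```

Filtering in a list comprehension keeps the comparison rule in one place, so it can be swapped for a case-insensitive or substring check if the anchor text is less uniform than in the question.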