Find specific link text with bs4

Views: 105 | Published: 2018/6/21 12:37:55 | Tags: python, html, web-scraping, beautifulsoup

Problem description

I am trying to scrape a website and find all the headings of a feed. I am having trouble getting just the text of the a tags that I need. Here is an example of the HTML.

    <td class="m" id="b1"><a href="/QSYcfT" id="c1" target="_blank" onClick="vPI('https://www.youtube.com/watch?v=BFNH-6K10Ic', 'QSYcfT', this.id); this.blur(); return false;">TF4 - Oreos</a> <a href="#" onClick="return lkP('1', 'QSYcfT');" id="x1">(0)</a>
    <td class="m" id="b2"><a href="/zXHNvp" id="c2" target="_blank" onClick="vPI('https://www.youtube.com/watch?v=0vjcGwZGBYI', 'zXHNvp', this.id); this.blur(); return false;">Awesome Game Boy Facts</a> <a href="#" onClick="return lkP('2', 'zXHNvp');" id="x2">(0)</a>

I am trying to get the text of every a tag whose id starts with c (c1, c2, and so on) and print each one on a new line.

My output should look like this:

    TF4 - Oreos
    Awesome Game Boy Facts

So far I have tried:

    import bs4

    soup = bs4.BeautifulSoup(html)
    links = soup.find_all('a', {'id': 'c'})
    for link in links:
        print(link.text)

But it doesn't find or print anything.

Solution

You can pass a regular expression in place of an attribute value:

    links = soup.find_all('a', {'id': re.compile(r'^c\d+')})

^ anchors the match at the beginning of the string, and \d+ matches one or more digits, so the pattern matches ids such as c1 and c2 but not x1.
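
To see the two parts of the pattern in isolation, here is a small sketch that uses only the standard re module. The ids c1, c2, x1 and b1 come from the HTML in the question; 'abc1' and 'c' are made-up values added purely for illustration. It uses search() because that is the method Beautiful Soup applies to a compiled pattern, which is why the ^ anchor matters.

```python
import re

# Pattern from the answer: the letter c at the start, then one or more digits.
pattern = re.compile(r'^c\d+')

# 'c1', 'c2', 'x1' and 'b1' appear in the question's HTML;
# 'abc1' and 'c' are made-up values for illustration only.
sample_ids = ['c1', 'c2', 'x1', 'b1', 'abc1', 'c']

# Keep only the ids the pattern finds a match in.
matching_ids = [value for value in sample_ids if pattern.search(value)]

print(matching_ids)
```

Without the ^ anchor, 'abc1' would also be accepted, because search() looks for a match anywhere in the string.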

Demo:
    >>> import re
    >>> from bs4 import BeautifulSoup
    >>>
    >>> html = """
    ... <tr>
    ... <td class="m" id="b1"><a href="/QSYcfT" id="c1" target="_blank" onClick="vPI('https://www.youtube.com/watch?v=BFNH-6K10Ic', 'QSYcfT', this.id); this.blur(); return false;">TF4 - Oreos</a> <a href="#" onClick="return lkP('1', 'QSYcfT');" id="x1">(0)</a></td>
    ... <td class="m" id="b2"><a href="/zXHNvp" id="c2" target="_blank" onClick="vPI('https://www.youtube.com/watch?v=0vjcGwZGBYI', 'zXHNvp', this.id); this.blur(); return false;">Awesome Game Boy Facts</a> <a href="#" onClick="return lkP('2', 'zXHNvp');" id="x2">(0)</a></td>
    ... </tr>
    ... """
    >>> soup = BeautifulSoup(html, 'html.parser')
    >>> links = soup.find_all('a', {'id': re.compile(r'^c\d+')})
    >>> for link in links:
    ...     print(link.text)
    ...
    TF4 - Oreos
    Awesome Game Boy Facts
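
The same lookup can also be wrapped in a small function for Python 3. This is only a sketch under stated assumptions: the helper name get_link_texts and the shortened HTML sample (onClick attributes left out) are made up for illustration, and passing 'html.parser' explicitly is an assumption about which parser to use. It also shows why the original attempt found nothing: a plain string such as 'c' is compared against the whole id value, so it matches neither c1 nor c2.

```python
import re

from bs4 import BeautifulSoup

# Shortened version of the question's HTML (onClick attributes omitted for brevity).
HTML = """
<tr>
<td class="m" id="b1"><a href="/QSYcfT" id="c1" target="_blank">TF4 - Oreos</a> <a href="#" id="x1">(0)</a></td>
<td class="m" id="b2"><a href="/zXHNvp" id="c2" target="_blank">Awesome Game Boy Facts</a> <a href="#" id="x2">(0)</a></td>
</tr>
"""


def get_link_texts(html, id_filter):
    """Return the text of every <a> tag whose id satisfies id_filter.

    get_link_texts is a hypothetical helper, not part of Beautiful Soup.
    id_filter may be a string (exact match) or a compiled regular expression.
    """
    soup = BeautifulSoup(html, 'html.parser')
    return [link.get_text() for link in soup.find_all('a', {'id': id_filter})]


# A plain string must equal the whole id, so 'c' matches nothing here.
exact_match = get_link_texts(HTML, 'c')

# A compiled pattern matches ids that start with c followed by digits.
regex_match = get_link_texts(HTML, re.compile(r'^c\d+'))

for text in regex_match:
    print(text)
```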
