Get text outside known element beautifulsoup


Question

I want to scrape a webpage, and I don't want to use regex at all. I am using beautifulsoup to handle the scraping. I have this source:

<TD WIDTH="50%" VALIGN="TOP"><span class="sections">Date:</span>
13 August 2014
      <br>&nbsp;<br><span class="sections">Application Deadline:</span>
     <font color="maroon">
      28 August  2014</font>

      <font color="#990066">Application closed / under review</font>

<br>&nbsp;<br><span class="sections">Duty Station:&nbsp;</span>
Multiple duty stations
<br>
&nbsp;

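From this source, I want to scrape 13 August 2014.

I can find the span element by searching on its class with soup.findAll('span', {'class': 'sections'}), take the first element, and check whether its text is "Date:", but that only gives me the element itself. The text I am trying to get comes right after it, and the only other thing I can do is search by the td, which is not what I want, because a single td contains many elements and a lot of text.

I know that I could do it using regex, but I'm really trying to do it with beautifulsoup alone.

Thanks in advance

Solution

Found it.

Once I have the element <span class="sections">Date:</span>, I just need to use element.nextSibling. Easier than I thought.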
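
For reference, here is a minimal sketch of that approach, assuming BeautifulSoup 4 with the built-in "html.parser". In BeautifulSoup 4 the attribute is spelled next_sibling (nextSibling is the older alias). The helper name text_after_label is hypothetical and only used for illustration; the HTML is the fragment from the question.

```python
from bs4 import BeautifulSoup, NavigableString

# The fragment from the question, unchanged.
html = """<TD WIDTH="50%" VALIGN="TOP"><span class="sections">Date:</span>
13 August 2014
      <br>&nbsp;<br><span class="sections">Application Deadline:</span>
     <font color="maroon">
      28 August  2014</font>

      <font color="#990066">Application closed / under review</font>

<br>&nbsp;<br><span class="sections">Duty Station:&nbsp;</span>
Multiple duty stations
<br>
&nbsp;
"""

soup = BeautifulSoup(html, "html.parser")


def text_after_label(soup, label):
    """Hypothetical helper: return the bare text that directly follows
    the <span class="sections"> whose text equals `label`, or None."""
    for span in soup.find_all("span", class_="sections"):
        if span.get_text(strip=True) == label:
            sibling = span.next_sibling
            # Only accept a plain text node; a tag sibling is not bare text.
            if isinstance(sibling, NavigableString):
                return str(sibling).strip()
            return None
    return None


date_text = text_after_label(soup, "Date:")
print(date_text)
```

Note that this only works when the value is a bare text node directly after the span. For "Application Deadline:" the node right after the span is just whitespace and the date sits inside a font tag, so that case would need a different lookup, for example find_next('font').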
