BS4: Getting text in tag

Question

I'm using Beautiful Soup. There is a tag like this:

<li><a href="example"> s.r.o., <small>small</small></a></li>

I want to get only the text directly inside the anchor <a> tag, without any text from the <small> tag in the output; i.e. " s.r.o., "

I tried find('li').text[0], but it does not work.

Is there a command in BS4 that can do this?

Recommended answer

One option would be to get the first element from the contents of the a element:

>>> from bs4 import BeautifulSoup
>>> data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
>>> soup = BeautifulSoup(data, 'html.parser')  # name the parser explicitly
>>> print(soup.find('a').contents[0])
 s.r.o., 

Another one would be to find the small tag and get its previous sibling:

>>> print(soup.find('small').previous_sibling)
 s.r.o., 


Well, there are all sorts of alternative/crazy options as well:

>>> print(next(soup.find('a').descendants))
 s.r.o., 
>>> print(next(iter(soup.find('a'))))
 s.r.o., 
