Get text before <br/> python/bs4
Problem description
I'm trying to scrape some data from a web page. The tag text contains newlines and <br/> tags. I want to get only the telephone number at the beginning of the tag. Could you give me some advice on how to get only the number?
Here is the HTML code:
<td>
+421 48/471 78 14
<br />
<em>(bowling)</em>
</td>
Is there a way in BeautifulSoup to get the text in a tag, but only the text that is not enclosed in other tags? And second: how do I get rid of text newlines and HTML line breaks?
I'm using BS4.
The output should be: '+421 48/471 78 14'
Do you have any ideas? Thank you.
Recommended answer
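from bs4 import BeautifulSoup

html = """
<td>
+421 48/471 78 14
<br />
<em>(bowling)</em>
</td>
"""

# Name the parser explicitly; bs4 warns when no parser is given
soup = BeautifulSoup(html, "html.parser")

# The first child of <td> is the text node before <br />; strip() trims the surrounding newlines
print(soup.find("td").contents[0].strip())
# +421 48/471 78 14

# .next_element is whatever was parsed right after the <td> start tag: the same text node
print(soup.find("td").next_element.strip())
# +421 48/471 78 14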
soup.find("td").contents[0].strip() finds the contents of the <td> tag, takes the first element (the text node before <br />), and removes the leading and trailing \n newline characters with str.strip().
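contents[0] only works while the number is the very first child of the tag. For the "only the text not enclosed in other tags" part of the question, a minimal sketch is to ask for just the strings that are direct children of the tag (find_all with string=True and recursive=False) and then collapse all whitespace, newlines included. The helper name direct_text is hypothetical; get_text is shown alongside for contrast, because it also picks up the text inside <em>.

```python
from bs4 import BeautifulSoup

html = """
<td>
+421 48/471 78 14
<br />
<em>(bowling)</em>
</td>
"""


def direct_text(tag):
    """Hypothetical helper: join only the strings that are direct children of tag."""
    # recursive=False limits the search to tag's own children, so text inside <em> is skipped
    strings = tag.find_all(string=True, recursive=False)
    # Collapse every run of whitespace (including newlines) into a single space
    return " ".join(" ".join(strings).split())


soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")

print(direct_text(td))                # +421 48/471 78 14
print(td.get_text(" ", strip=True))   # +421 48/471 78 14 (bowling)
```

The split/join step also removes newlines inside the text, not only at its edges, which str.strip() alone does not do.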
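From the documentation on next_element:
The .next_element attribute of a string or tag points to whatever was parsed immediately afterwards.
For the <td> above, the first thing parsed after the start tag is the phone-number text node, which is why next_element returns it.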
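Following the same parse order further gives a literal "text before <br/>" approach. The sketch below walks the tag's .descendants (which visits everything inside the tag in the order the .next_element chain follows) and stops at the first <br>. The helper name text_before_br is hypothetical. Unlike contents[0], it still works if part of the number is wrapped in another tag, as the second example assumes.

```python
from bs4 import BeautifulSoup, NavigableString, Tag


def text_before_br(tag):
    """Hypothetical helper: text inside tag that was parsed before its first <br>."""
    parts = []
    # .descendants yields the tag's contents in parse order, staying inside the tag
    for el in tag.descendants:
        if isinstance(el, Tag) and el.name == "br":
            break  # stop at the first line break
        if isinstance(el, NavigableString):
            parts.append(el)
    # Collapse newlines and repeated spaces into single spaces
    return " ".join("".join(parts).split())


html = """
<td>
+421 48/471 78 14
<br />
<em>(bowling)</em>
</td>
"""
soup = BeautifulSoup(html, "html.parser")
print(text_before_br(soup.find("td")))  # +421 48/471 78 14

# Assumed variant markup where the prefix is wrapped in <b>
wrapped = BeautifulSoup(
    "<td><b>+421</b> 48/471 78 14<br/><em>(bowling)</em></td>", "html.parser"
)
print(text_before_br(wrapped.find("td")))  # +421 48/471 78 14
```

If the tag contains no <br>, the loop simply runs to the end and returns all of the tag's text.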