Get text before <br/> with Python/bs4

Views: 61 | Published: 2020/9/20 7:11:42 | Tags: python html beautifulsoup


Question

I'm trying to scrape some data from a web page. The text inside the tag contains newlines and <br/> tags. I only want the telephone number at the beginning of the tag. Could you advise me how to get only the number?

Here is the HTML code:

<td>
    +421 48/471 78 14



    <br />
    <em>(bowling)</em>
</td>

Is there a way in BeautifulSoup to get the text of a tag, but only the text that is not wrapped in other tags? And second: how do I get rid of the newlines in the text and the HTML line breaks?

I'm using BS4.

The output should be: '+421 48/471 78 14'

Do you have any ideas? Thank you.

Recommended answer

html="""
<td>
    +421 48/471 78 14



    <br />
    <em>(bowling)</em>
</td>
"""


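from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, "html.parser")

# The first child of <td> is the text node before <br />; strip() trims the whitespace around it
print(soup.find("td").contents[0].strip())
# +421 48/471 78 14

# .next_element of <td> is that same text node
print(soup.find("td").next_element.strip())
# +421 48/471 78 14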

soup.find("td").contents[0].strip() takes the contents of the td tag, picks the first element (the text node before <br/>), and removes the leading and trailing whitespace, including the \n newline characters, with str.strip().
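The contents[0] approach relies on the number being the very first child of the tag. As a rough sketch for the more general question (only the text that is not wrapped in other tags), one could instead keep every direct child that is a string and collapse the whitespace. The helper name direct_text is hypothetical and not part of bs4, and "html.parser" is an assumed parser choice:

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<td>
    +421 48/471 78 14



    <br />
    <em>(bowling)</em>
</td>
"""


def direct_text(tag):
    """Hypothetical helper: join the strings that are direct children of
    `tag` (skipping nested tags such as <em>) and collapse all whitespace."""
    pieces = [child for child in tag.children if isinstance(child, NavigableString)]
    # split() with no arguments drops newlines and runs of spaces
    return " ".join("".join(pieces).split())


soup = BeautifulSoup(html, "html.parser")
phone = direct_text(soup.find("td"))
print(phone)
```

Because "(bowling)" sits inside <em>, it is not a direct string child of <td> and is left out; the <br /> tag itself carries no text.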

From the documentation on next_element:

The .next_element attribute of a string or tag points to whatever was parsed immediately afterwards.
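To make the parse order concrete, here is a small sketch that walks .next_element from the td tag and, as an alternative, reads the node just before <br/> through .previous_sibling. The variable names are illustrative only, and "html.parser" is again an assumed parser choice:

```python
from bs4 import BeautifulSoup

html = "<td>\n    +421 48/471 78 14\n\n    <br />\n    <em>(bowling)</em>\n</td>"

soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")

first = td.next_element       # text node parsed right after the <td> start tag
second = first.next_element   # the <br/> tag parsed right after that text

# Alternative: start from <br/> and step back one sibling.
# td.find("br") returns None when there is no <br/>, so guard for that case.
br = td.find("br")
before_br = br.previous_sibling if br is not None else None

print(first.strip())
print(second.name)
print(before_br.strip())
```

Both routes reach the same text node here; the .previous_sibling route states the intent "text before <br/>" more directly, while .next_element depends on the text being parsed immediately after the opening tag.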
