Extract text from an HTML file with BeautifulSoup/Python

Published: 2021/4/15 19:13:14 · Tags: python, html, beautifulsoup

Question

I am trying to extract text from an HTML file. The HTML file looks like this:

<li class="toclevel-1 tocsection-1">
    <a href="#Baden-Württemberg"><span class="tocnumber">1</span>
        <span class="toctext">Baden-Württemberg</span>
    </a>
</li>
<li class="toclevel-1 tocsection-2">
    <a href="#Bayern">
        <span class="tocnumber">2</span>
        <span class="toctext">Bayern</span>
    </a>
</li>
<li class="toclevel-1 tocsection-3">
    <a href="#Berlin">
        <span class="tocnumber">3</span>
        <span class="toctext">Berlin</span>
    </a>
</li>

I want to extract the text from the last span tag of each list item, that is, the span with class="toctext". For the first item this would be "Baden-Württemberg". I then want to put these strings into a Python list.

In Python I tried the following:

names = soup.find_all("span",{"class":"toctext"})

My output is this list:

[<span class="toctext">Baden-Württemberg</span>, <span class="toctext">Bayern</span>, <span class="toctext">Berlin</span>]

So how can I extract only the text between the tags?

Thanks, everyone.

Recommended answer

The find_all method returns a list-like collection of tag objects, not strings. Iterate over it and read each tag's text attribute to get the text.

for name in names:
    print(name.text)

Baden-Württemberg
Bayern
Berlin
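
Since the question asks for a Python list rather than printed output, the same loop can be written as a list comprehension. The following is a minimal sketch, assuming the HTML from the question is held in a string; the variable names html and state_names are made up for this illustration, and the built-in "html.parser" backend is an arbitrary choice.

```python
from bs4 import BeautifulSoup

# The HTML snippet from the question, stored in a string for this sketch.
html = """
<li class="toclevel-1 tocsection-1">
    <a href="#Baden-Württemberg"><span class="tocnumber">1</span>
        <span class="toctext">Baden-Württemberg</span>
    </a>
</li>
<li class="toclevel-1 tocsection-2">
    <a href="#Bayern">
        <span class="tocnumber">2</span>
        <span class="toctext">Bayern</span>
    </a>
</li>
<li class="toclevel-1 tocsection-3">
    <a href="#Berlin">
        <span class="tocnumber">3</span>
        <span class="toctext">Berlin</span>
    </a>
</li>
"""

soup = BeautifulSoup(html, "html.parser")

# Same query as in the question: every <span class="toctext">.
names = soup.find_all("span", {"class": "toctext"})

# get_text(strip=True) returns the tag's text with surrounding whitespace removed.
state_names = [name.get_text(strip=True) for name in names]

print(state_names)  # ['Baden-Württemberg', 'Bayern', 'Berlin']
```

Here name.text would work just as well; get_text(strip=True) only guards against stray whitespace inside the tags.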

Python's built-in dir() and type() functions are always handy for inspecting an object.

print(dir(names))

[...,
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort',
 'source']
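
The answer mentions type() but only shows dir(). As a rough sketch of the type() side, the snippet below checks what find_all hands back; the two-span HTML string and the names result_type and item_type are invented for this illustration. With current bs4 releases the result should be a ResultSet (a subclass of list, which would explain the extra 'source' entry above) holding Tag objects.

```python
from bs4 import BeautifulSoup

# Shortened, made-up input: just two of the spans from the question.
html = '<span class="toctext">Bayern</span><span class="toctext">Berlin</span>'
soup = BeautifulSoup(html, "html.parser")
names = soup.find_all("span", {"class": "toctext"})

# type() tells us which class each object is an instance of.
result_type = type(names).__name__    # class of the collection
item_type = type(names[0]).__name__   # class of a single element

print(result_type)
print(item_type)
print(isinstance(names, list))  # the collection behaves like a list
```

Knowing that each element is a tag object rather than a string makes it clear why printing the list shows markup, and why .text is needed to get the plain string.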
