Efficient way to extract text from between tags

Posted: 2021/5/6 19:54:29 | Tags: python, regex, extract


Question

Suppose I have something like this:

```python
var = '''<li> <a href="/...html">Energy</a>
      <ul>
      <li> <a href="/...html">Coal</a> </li>
      <li> <a href="/...html">Oil </a> </li>
      <li> <a href="/...html">Carbon</a> </li>
      <li> <a href="/...html">Oxygen</a> </li'''
```


What is the best (most efficient) way to extract the text between the tags? Should I use a regex for this? My current technique splits the string on the li tags and loops over the pieces with a for loop; I'm wondering whether there is a faster way to do this.
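To make the regex option concrete: for markup as flat and regular as this sample, a single non-greedy pattern can pull out every link text in one call. This is a minimal sketch, assuming every link is a plain <a ...>text</a> element with no nested tags; the pattern name LINK_TEXT is illustrative only. Regular expressions do not parse HTML in general and will break on nested, commented, or otherwise irregular markup.

```python
import re

var = '''<li> <a href="/...html">Energy</a>
      <ul>
      <li> <a href="/...html">Coal</a> </li>
      <li> <a href="/...html">Oil </a> </li>
      <li> <a href="/...html">Carbon</a> </li>
      <li> <a href="/...html">Oxygen</a> </li'''

# Assumption: each link is a flat <a ...>text</a> with no nested tags.
# [^>]* skips the attributes; (.*?) lazily captures up to the first </a>.
LINK_TEXT = re.compile(r'<a\b[^>]*>(.*?)</a>', re.DOTALL)

# findall returns the captured group for every match; strip() removes
# stray whitespace such as the trailing space in "Oil ".
names = [m.strip() for m in LINK_TEXT.findall(var)]
print(names)  # ['Energy', 'Coal', 'Oil', 'Carbon', 'Oxygen']
```

This is fast and dependency-free, but it trades robustness for speed; for real-world pages a proper parser is the safer choice.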

Recommended answer


You can use Beautiful Soup, which is very good at this kind of task. It is straightforward, easy to install, and extensively documented.


Some of the li tags in your example are not closed. I have corrected them, and this is how you would get the link text inside all the li tags:

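```python
from bs4 import BeautifulSoup

var = '''<li> <a href="/...html">Energy</a></li>
    <ul>
    <li><a href="/...html">Coal</a></li>
    <li><a href="/...html">Oil </a></li>
    <li><a href="/...html">Carbon</a></li>
    <li><a href="/...html">Oxygen</a></li>
    </ul>'''

# Name the parser explicitly: html.parser ships with Python and
# avoids the "no parser was explicitly specified" warning.
soup = BeautifulSoup(var, "html.parser")

# Print the text of every <a> element in the document.
for a in soup.find_all('a'):
    print(a.string)
```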

It will print:

Energy
Coal
Oil
Carbon
Oxygen


For documentation and more examples, see the Beautiful Soup documentation.
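If installing a third-party package is not an option, the standard library can do a similar job. The sketch below swaps in html.parser.HTMLParser instead of Beautiful Soup; the class name LinkTextParser is a hypothetical name chosen for illustration. It records text only while it is inside an <a> element, and because it reacts to tags as events rather than building a tree, it copes with the unclosed tags in the original sample.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Hypothetical helper: collects the text of every <a> element."""

    def __init__(self):
        super().__init__()
        self.in_link = False   # True while between <a ...> and </a>
        self.links = []        # one stripped string per <a> element
        self._buf = []         # text chunks of the current link

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.in_link = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == 'a' and self.in_link:
            self.in_link = False
            self.links.append(''.join(self._buf).strip())

    def handle_data(self, data):
        # Ignore text outside links (whitespace between <li> tags, etc.).
        if self.in_link:
            self._buf.append(data)


# The original, partly unclosed markup from the question.
var = '''<li> <a href="/...html">Energy</a>
      <ul>
      <li> <a href="/...html">Coal</a> </li>
      <li> <a href="/...html">Oil </a> </li>
      <li> <a href="/...html">Carbon</a> </li>
      <li> <a href="/...html">Oxygen</a> </li'''

parser = LinkTextParser()
parser.feed(var)
parser.close()  # flush any buffered input at end of data
print(parser.links)  # ['Energy', 'Coal', 'Oil', 'Carbon', 'Oxygen']
```

This keeps everything in the standard library at the cost of writing a small handler class; Beautiful Soup remains more convenient when you need richer queries than "all link text".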
