How to find/replace text in html while preserving html tags/structure

Views: 161  Published: 2018/6/13 16:28:15  Tags: python html html-parsing


Problem description

I use regexps to transform text, but I want to preserve the HTML tags. For example, if I want to replace "stack overflow" with "stack underflow", this should work as expected: if the input is stack <sometag>overflow</sometag>, I must obtain stack <sometag>underflow</sometag> (i.e. the string substitution is done, but the tags are still there).

Solution

Use a DOM library, not regular expressions, when manipulating HTML:

lxml: a parser, document, and HTML serializer. Also can use BeautifulSoup and html5lib for parsing.
BeautifulSoup: a parser, document, and HTML serializer.
html5lib: a parser with its own serializer.
ElementTree: a document object and XML serializer.
cElementTree: a document object implemented as a C extension.
HTMLParser: a parser.
Genshi: includes a parser, document, and HTML serializer.
xml.dom.minidom: a document model built into the standard library, which html5lib can parse to.

Stolen from http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/.

Out of these I would recommend lxml, html5lib, and BeautifulSoup.
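
To make the idea concrete, here is a minimal sketch that leaves the markup alone and applies the substitution only to text. It is not part of the original answer. So that it runs without installing anything, it uses the standard library's html.parser (the HTMLParser entry in the list above) rather than one of the recommended libraries; with lxml or BeautifulSoup the same idea would be applied to the text nodes of the parsed document. The names TextReplacer and replace_text are made up for this example. Note the limitation: each run of text is matched on its own, so a pattern that spans a tag boundary (such as the whole phrase "stack overflow" in the question's input) will not match, and the example therefore replaces only "overflow".

```python
import re
from html.parser import HTMLParser


class TextReplacer(HTMLParser):
    """Re-emit the HTML as parsed, applying a regex substitution to text only.

    Hypothetical helper for illustration; not an API of any library.
    """

    def __init__(self, pattern, repl):
        # convert_charrefs=False reports entities such as &amp; as separate
        # events, so they can be written back out untouched.
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(pattern)
        self.repl = repl
        self.out = []
        self.raw_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.raw_depth += 1
        # get_starttag_text() returns the start tag exactly as written.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied verbatim as well.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.raw_depth:
            self.raw_depth -= 1
        # The parser lowercases tag names, so end tags come out lowercased.
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Only ordinary text is substituted; script and style bodies are kept.
        if not self.raw_depth:
            data = self.pattern.sub(self.repl, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def replace_text(html, pattern, repl):
    """Return html with pattern replaced by repl in text content only."""
    parser = TextReplacer(pattern, repl)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(replace_text("stack <sometag>overflow</sometag>", r"overflow", "underflow"))
```

Because attribute values are part of the start tag, which is copied verbatim, something like href="overflow.html" is left untouched while the visible text is changed. Matching a phrase across tag boundaries would need extra bookkeeping over adjacent text runs, which this sketch does not attempt.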
