Regular expression to match closing HTML tags

Views: 129 Published: 2018/6/15 13:19:11 Tags: python html regex


Problem description

I'm working on a small Python script to clean up HTML documents. It works by accepting a list of tags to KEEP and then parsing through the HTML code, trashing tags that are not in the list. I've been using regular expressions to do it, and I've been able to match opening tags and self-closing tags but not closing tags. The pattern I've been experimenting with to match closing tags is </(?!a)>. This seems logical to me, so why is it not working? The (?!a) should match on anything that is NOT an anchor tag (note that the "a" can be anything; it's just an example).

Edit: AGG! I guess the regex didn't show!
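
For reference, the behaviour described in the question can be reproduced with a short sketch. A negative lookahead such as (?!a) is a zero-width assertion: it checks the next character but consumes nothing, so </(?!a)> demands that ">" come immediately after "</" and can only ever match the literal text "</>". The second pattern below is a hypothetical variant (not from the original post) that adds an actual tag-name match after the lookahead; it is only meant to illustrate the mechanics, and it remains fragile on real-world HTML.

```python
import re

# Pattern from the question: "</", a zero-width lookahead, then ">" right away.
pattern_from_question = re.compile(r"</(?!a)>")

# Hypothetical variant: the lookahead guards a real tag-name match.
# (?!a\b) rejects the name "a" but still allows names such as "abbr".
closing_not_anchor = re.compile(
    r"</(?!a\b)([A-Za-z][A-Za-z0-9]*)\s*>", re.IGNORECASE
)

sample = "<p>text <a href='#'>link</a> <abbr>x</abbr></p>"

print(pattern_from_question.findall(sample))   # [] (no closing tag matches)
print(pattern_from_question.findall("</>"))    # ['</>'] (the only possible match)
print(closing_not_anchor.findall(sample))      # ['abbr', 'p']
```
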
Solution

Read:

RegEx match open tags except XHTML self-contained tags

Can you provide some examples of why it is hard to parse XML and HTML with a regex?

Repent.

Use a real HTML parser, like BeautifulSoup.
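
As a rough illustration of the parser-based approach, here is a minimal sketch of the "keep list" idea from the question. To stay free of third-party dependencies it uses Python's standard-library html.parser module rather than BeautifulSoup, so it is a stand-in for the recommendation above, not the same thing. The names TagFilter and strip_tags are hypothetical, and the sketch assumes that dropping comments and doctype declarations is acceptable.

```python
from html import escape
from html.parser import HTMLParser


class TagFilter(HTMLParser):
    """Re-emit a document, dropping tags whose names are not in `keep`.

    Hypothetical helper for illustration. Text content is always kept;
    comments and doctype declarations are silently dropped.
    """

    def __init__(self, keep):
        # convert_charrefs=False so entities in text are passed through as-is.
        super().__init__(convert_charrefs=False)
        self.keep = {name.lower() for name in keep}
        self.out = []

    def _format_attrs(self, attrs):
        # The parser hands over unescaped attribute values, so re-escape them.
        parts = []
        for name, value in attrs:
            if value is None:
                parts.append(f" {name}")
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        if tag in self.keep:
            self.out.append(f"<{tag}{self._format_attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if tag in self.keep:
            self.out.append(f"<{tag}{self._format_attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag in self.keep:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_tags(html, keep):
    """Return `html` with every tag not listed in `keep` removed."""
    parser = TagFilter(keep)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


doc = '<div><p>Hi <a href="#x">there</a></p><br/></div>'
print(strip_tags(doc, ["p", "a"]))  # <p>Hi <a href="#x">there</a></p>
```

Because the parser reports opening, self-closing, and closing tags through separate callbacks, closing tags need no special pattern of their own. Note that this sketch keeps the text inside dropped elements, including the contents of script or style elements, so a real cleanup script would likely need extra handling there.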
