Regular expression to remove HTML tags from a string


Question

Possible duplicate:
Regular expression to remove HTML tags

Is there an expression which will get the value between two HTML tags?

Given this:

<td class="played">0</td>

I am looking for an expression which will return 0, stripping the <td> tags.

Recommended answer

You should not attempt to parse HTML with regex. HTML is not a regular language, so any regex you come up with will likely fail on some esoteric edge case. Please refer to the seminal answer to this question for specifics. While mostly formatted as a joke, it makes a very good point.

The following examples are in Java, but the regex will be similar (if not identical) in other languages.

String target = someString.replaceAll("<[^>]*>", "");

This assumes that your non-HTML text does not contain any < or > and that your input string is correctly structured.
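As a rough illustration of that caveat, the sketch below wraps the same pattern in a small helper and runs it on one input that meets the assumption and two that do not. The class name `TagStripDemo` and the method `stripTags` are made up for this example and are not part of the original answer.

```java
import java.util.regex.Pattern;

final class TagStripDemo {
    // "<", then any run of characters other than ">", then ">".
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    // Hypothetical helper: removes everything that looks like a tag.
    static String stripTags(String html) {
        return ANY_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Well-formed input, no stray angle brackets: only the text remains.
        System.out.println(stripTags("<td class=\"played\">0</td>"));

        // Literal "<" and ">" in plain text are mistaken for one long tag,
        // so "< 2 and 3 >" is deleted along with the real content.
        System.out.println(stripTags("1 < 2 and 3 > 2"));

        // A ">" inside an attribute value ends the match too early and
        // leaves part of the attribute behind in the output.
        System.out.println(stripTags("<a title=\"a > b\">link</a>"));
    }
}
```

The second and third calls show the kind of input on which a tag-stripping regex quietly produces wrong output rather than failing loudly.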

If you know they're a specific tag, for example you know the text contains only <td> tags, you could do something like this:

// "/?" also matches the closing </td>; "\\b" keeps longer tag names from matching.
String target = someString.replaceAll("(?i)</?td\\b[^>]*>", "");

Ωmega brought up a good point in a comment on another post: if there are multiple tags, this squashes all of the results together.

For example, if the input string were <td>Something</td><td>Another Thing</td>, then the above would result in SomethingAnother Thing.
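The squashing is easy to reproduce. The sketch below uses a variant of the <td> pattern that admits an optional slash, so closing tags are removed as well, and a word boundary after the tag name. The names `TdStripDemo` and `stripTd` are hypothetical and exist only for this illustration.

```java
final class TdStripDemo {
    // Hypothetical helper: removes opening and closing <td> tags only,
    // case-insensitively, and leaves every other tag untouched.
    static String stripTd(String html) {
        return html.replaceAll("(?i)</?td\\b[^>]*>", "");
    }

    public static void main(String[] args) {
        // Adjacent cells run together because nothing replaces the tags.
        System.out.println(stripTd("<td>Something</td><td>Another Thing</td>"));
        // prints: SomethingAnother Thing

        // Upper-case tags match thanks to (?i); the <b> element survives.
        System.out.println(stripTd("<TD class=\"played\">0</TD> <b>bold</b>"));
        // prints: 0 <b>bold</b>
    }
}
```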

In a situation where multiple tags are expected, we could do something like:

String target = someString.replaceAll("(?i)</?td\\b[^>]*>", " ").replaceAll("\\s+", " ").trim();

This replaces each tag with a single space, then collapses runs of whitespace into one space, and then trims any whitespace from the ends.
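The three steps can be traced one at a time in a minimal sketch. The class `TdTextJoinDemo` and the method `tdText` are illustrative names, and the pattern is a variant that also matches closing </td> tags.

```java
final class TdTextJoinDemo {
    // Hypothetical helper: returns the cell text separated by single spaces.
    static String tdText(String html) {
        return html
                .replaceAll("(?i)</?td\\b[^>]*>", " ") // each <td> or </td> becomes one space
                .replaceAll("\\s+", " ")               // runs of whitespace collapse to one space
                .trim();                               // leading and trailing whitespace is dropped
    }

    public static void main(String[] args) {
        // " Something  Another Thing " -> " Something Another Thing " -> trimmed
        System.out.println(tdText("<td>Something</td><td>Another Thing</td>"));
        // prints: Something Another Thing

        System.out.println(tdText("<td class=\"played\">0</td>"));
        // prints: 0
    }
}
```

Note that whitespace inside a cell is collapsed as well, so this approach suits short values better than text whose exact spacing matters.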
