Regular expression to remove HTML tags from a string
Problem Description
Possible Duplicate:
Regular expression to remove HTML tags
Is there an expression which will get the value between two HTML tags?
Given this:
<td class="played">0</td>
I am looking for an expression which will return 0, stripping the <td> tags.
Recommended Answer
You should not attempt to parse HTML with regex. HTML is not a regular language, so any regex you come up with will likely fail on some esoteric edge case. Please refer to the seminal answer to this question for specifics. While mostly formatted as a joke, it makes a very good point.
The following examples are Java, but the regex will be similar, if not identical, for other languages.
String target = someString.replaceAll("<[^>]*>", "");
This assumes your non-HTML text does not contain any < or > characters and that your input string is correctly structured.
If you know they're a specific tag, for example you know the text contains only <td> tags, you could do something like this:
String target = someString.replaceAll("(?i)</?td[^>]*>", "");
The optional / lets the pattern match closing </td> tags as well as opening ones, and (?i) makes the match case-insensitive.
Ωmega brought up a good point in a comment on another post: if there are multiple tags, this approach squishes all the results together.
For example, if the input string were
<td>Something</td><td>Another Thing</td>
then the above would result in SomethingAnother Thing.
In a situation where multiple tags are expected, we could do something like:
String target = someString.replaceAll("(?i)</?td[^>]*>", " ").replaceAll("\\s+", " ").trim();
This replaces each tag with a single space, then collapses runs of whitespace into one space, and then trims any whitespace from the ends.
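As a quick check of the three approaches in the answer, the sketch below runs them on the two inputs from this page. The class and method names (TagStripDemo, stripAllTags, stripTdTags, stripTdTagsSpaced) are hypothetical wrappers added only for illustration. The <td> patterns here use </?td so that closing </td> tags are matched as well as opening ones.

```java
public class TagStripDemo {

    // Removes anything shaped like a tag: '<', then any run of non-'>' characters, then '>'.
    static String stripAllTags(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    // Removes only opening and closing <td> tags; (?i) makes the match case-insensitive.
    static String stripTdTags(String s) {
        return s.replaceAll("(?i)</?td[^>]*>", "");
    }

    // Replaces each <td>/</td> tag with a space, collapses whitespace runs, and trims the ends,
    // so text from adjacent cells does not run together.
    static String stripTdTagsSpaced(String s) {
        return s.replaceAll("(?i)</?td[^>]*>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        String single = "<td class=\"played\">0</td>";
        String multi = "<td>Something</td><td>Another Thing</td>";

        System.out.println(stripAllTags(single));      // 0
        System.out.println(stripTdTags(single));       // 0
        System.out.println(stripTdTags(multi));        // SomethingAnother Thing
        System.out.println(stripTdTagsSpaced(multi));  // Something Another Thing
    }
}
```

The spaced variant trades a little extra work (two more passes over the string) for output that keeps cell boundaries visible.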
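To make the "esoteric edge case" warning concrete, here is one hypothetical input (not from the original thread) that defeats the blanket pattern <[^>]*>. HTML allows a literal > inside a quoted attribute value, but [^>]* stops at the first > it sees, so part of the tag leaks into the result. The class name TagStripEdgeCase is illustrative only.

```java
public class TagStripEdgeCase {

    // Same blanket pattern as in the answer: '<', any non-'>' characters, then '>'.
    static String stripAllTags(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        // Hypothetical input: the title attribute value contains a literal '>'.
        String tricky = "<td title=\"a > b\">0</td>";

        // The first match ends at the '>' inside the attribute value,
        // so the remainder of the opening tag (' b">') survives.
        // Prints: b">0 (with a leading space) instead of 0
        System.out.println(stripAllTags(tricky));
    }
}
```

Nothing signals the failure; the regex simply returns wrong text, which is why the answer recommends against regex for HTML that is not tightly controlled.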