What's the best way to remove HTML from a string?

Views: 162 | Published: 2016/12/15 14:00:52 | Tags: regex, coldfusion


Question

I recently started using the following RegEx in a ReReplace() function to strip HTML tags from a string using ColdFusion. Please note: I am not using this as protection from XSS or SQL injection; this is only to remove existing and safe HTML from a string before it's displayed in an HTML title attribute.

REReplaceNoCase(str,"<[^>]*>","","ALL")

In a semi-related question I asked how to modify my RegEx to include spaces and line breaks. I was told that using RegEx for this purpose is not appropriate and this post was referenced as an explanation.

I strongly suspect though that the regular expressions you have posted don't in fact work correctly. I'd advise you not to use regular expressions to parse HTML as HTML is not a regular language. Use an HTML parser instead. (Mark Byers)

If this is true, what is the appropriate tool for removing HTML from a string before it's displayed? (Bearing in mind the HTML is already safe; it's sanitized before entry to the DB.)

I am aware of HTMLEditFormat() and HTMLCodeFormat(), but those two functions do not provide what I need; the former replaces special characters with their HTML-escaped equivalents, while the latter does exactly the same but also wraps the string in a <pre> tag.

What I would like to do is strip HTML and line breaks from a string before I display it in an HTML title attribute: <a title="My string without HTML goes here">...</a>

There are times when the HTML is not necessary. Say you wanted to display an excerpt from a post without the HTML stored along with it, for instance.

Solution

I disagree with the reasoning you quote. While HTML should not be parsed with regexen, stripping tags is perfect for them.

But you will want to be more careful than just <[^>]*>, since that would turn

<span title=">">...</span>

into the ill-formed

">...</span>

So you need something like <([^">]|"[^"]*"|'[^']*')*> instead. You can strip out line breaks with character replacement instead of a regex, but if you prefer a regex you can use something like \n (or even combine it with the above using alternation, but that's even less efficient).
