How can I clean HTML tags out of a ColdFusion string?

Question

I am looking for a quick way to strip HTML tags out of a ColdFusion string. We are pulling in an RSS feed that could potentially have anything in it. We then do some manipulation of the information and send it on to another place. Currently we strip the tags with a regular expression. Is there a better way to do this?

<cfloop from="1" to="#ArrayLen(myFeed.item)#" index="i">
  <cfset myFeed.item[i].description.value =
    REReplaceNoCase(myFeed.item[i].description.value, '<(.|\n)*?>', '', 'ALL')>
</cfloop>

We are using ColdFusion 8.

Recommended answer

Disclaimer: I am a fierce advocate of using a proper parser (instead of regex) to parse HTML. However, this question isn't about parsing HTML, but about destroying it. For all tasks that go beyond that, use a parser.

I think your regex is good. As long as the task is nothing more than removing all HTML tags from the input, using a regex like yours is safe.

Anything else would probably be more hassle than it's worth, but you could write a small function that loops through the string char-by-char once and removes everything that's within tag brackets, e.g.:

  • switch on an "inTag" flag as soon as you encounter a "<" character,
  • switch it off as soon as you encounter ">"
  • copy characters to the output string as long as the flag is off
  • for performance, use a StringBuilder Java object instead of string concatenation
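
The steps above could be sketched roughly as follows. This is only an illustration, not code from the original answer: the function name stripTags and its single argument are hypothetical, and the sketch makes one assumption the list leaves open, namely that a stray ">" outside any tag is kept in the output.

<cfscript>
// Hypothetical helper; the name and signature are illustrative only.
function stripTags(input) {
    // java.lang.StringBuilder avoids repeated string concatenation
    var out = CreateObject("java", "java.lang.StringBuilder").init();
    var inTag = false;
    var len = Len(input);
    var i = 0;
    var ch = "";

    for (i = 1; i LTE len; i = i + 1) {
        ch = Mid(input, i, 1);
        if (ch EQ "<") {
            // a tag starts: stop copying
            inTag = true;
        } else if (ch EQ ">" AND inTag) {
            // the tag ends: resume copying with the next character
            inTag = false;
        } else if (NOT inTag) {
            // outside a tag: copy the character
            // (assumption: a stray ">" outside a tag also lands here and is kept)
            out.append(JavaCast("string", ch));
        }
    }
    return out.toString();
}
</cfscript>

With this sketch, stripTags("<p>Hello <b>world</b></p>") returns "Hello world". An unclosed "<" near the end of the input swallows everything after it, because the flag is never switched off again.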

For a high-demand part of your app, this may be faster than the regex. But the regex is clean and probably fast enough.

Maybe this modified regex has some advantages for you:

<[^>]*(?:>|$)

  • catches unclosed tags at the end of the string
  • [^>]* is better than (.|\n): the negated character class matches newlines as well and needs no alternation

The use of REReplaceNoCase() is unnecessary when there are no actual letters in the pattern. Case-insensitive regex matching is slower than case-sensitive matching.
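
As a sketch of how both suggestions might be combined, the loop from the question could be rewritten with the modified pattern and the case-sensitive REReplace(). The variable names are taken from the question; nothing else about the feed structure is assumed.

<cfloop from="1" to="#ArrayLen(myFeed.item)#" index="i">
  <!--- case-sensitive REReplace(): the pattern contains no letters --->
  <cfset myFeed.item[i].description.value =
    REReplace(myFeed.item[i].description.value, '<[^>]*(?:>|$)', '', 'ALL')>
</cfloop>

Because of the (?:>|$) alternative, a description ending in an unclosed tag such as "text <a href=" is reduced to "text " rather than left untouched.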
