Ruby regex for stripping BBCode


Question

I'm trying to remove BBCode from a given string, using just gsub with a regex.

Here's an example string:

The [b]quick[/b] brown [url=http://example.com]fox[/url] jumps over the lazy dog [img=http://example.com/lazy_dog.png]

What I need as output is:

The quick brown fox jumps over the lazy dog

So what's a way to do that? I've found various examples of doing this, but none have worked for my use case.

One I've tried: /\[(\w+)[^w]*?](.*?)\[\/\1]/

But that wouldn't catch the ending [img] tag.
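To make the problem concrete, here is a minimal sketch that runs the attempted pattern over the sample string. The variable names are illustrative, and the closing brackets are written as `\]` (an equivalent spelling of the pattern above) to avoid Ruby's warning about an unescaped `]`. Because the pattern requires a matching closing tag via `\1`, the standalone [img=...] tag is left in place.

```ruby
# Sample string and attempted pattern from the question.
# Note: `]` is escaped here; this is equivalent to the pattern as posted.
input = "The [b]quick[/b] brown [url=http://example.com]fox[/url] " \
        "jumps over the lazy dog [img=http://example.com/lazy_dog.png]"
attempt = /\[(\w+)[^w]*?\](.*?)\[\/\1\]/

# Replace every open/close pair with its inner text (capture group 2).
result = input.gsub(attempt, '\2')

puts result
# The paired tags are gone, but the unpaired [img=...] tag survives.
```

Also note that `[^w]` excludes only the literal letter "w", so a tag attribute containing a "w" (for example a URL starting with www) would not match this pattern either.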

Recommended answer

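The purpose of this post is to show how BBCode may be interpreted, which should be taken into account when stripping BBCode tags while keeping the content.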

This will only remove BBCode tags as defined by this page.

It may remove more than what is considered a valid BBCode tag, though. For example, [b ]Bold[/b] is not rendered bold by this BBCode tester, so by rights those tags should be left alone. But [/b] will be removed by the regex below. It will also remove things that are clearly not BBCode, such as [/b=something].

Another example is [url=http://example.com/ ][/url] (note the space). This might be OK or not OK depending on the BBCode parser. The regex below ignores the opening tag, because the space before the closing bracket prevents a match, but removes the closing tag.

/\[\/?(?:b|u|i|s|size|color|center|quote|url|img|ul|ol|list|li|\*|code|table|tr|th|td|youtube|gvideo)(?:=[^\]\s]+)?\]/
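As a minimal sketch of how this pattern might be used with gsub: the constant name `BBCODE_TAG` and the helper `strip_bbcode` are illustrative names, not part of the original answer, and the trailing `strip` is an added assumption to drop the space left behind by the removed [img] tag.

```ruby
# The tag pattern from the answer, stored in a constant (name is illustrative).
BBCODE_TAG = /\[\/?(?:b|u|i|s|size|color|center|quote|url|img|ul|ol|list|li|\*|code|table|tr|th|td|youtube|gvideo)(?:=[^\]\s]+)?\]/

# Hypothetical helper: delete every matching tag, then trim surrounding
# whitespace (the trim is an assumption, not part of the answer).
def strip_bbcode(text)
  text.gsub(BBCODE_TAG, "").strip
end

input = "The [b]quick[/b] brown [url=http://example.com]fox[/url] " \
        "jumps over the lazy dog [img=http://example.com/lazy_dog.png]"

puts strip_bbcode(input)
# => The quick brown fox jumps over the lazy dog

# Edge cases discussed above: an opening tag containing a space is left
# alone, while its closing tag is still removed.
puts strip_bbcode("[b ]Bold[/b]")                     # => [b ]Bold
puts strip_bbcode("[url=http://example.com/ ][/url]") # => [url=http://example.com/ ]
```

Each tag is removed on its own, without pairing it with a partner, which is why the unpaired [img=...] tag is handled here.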

The [code] tag is also not treated correctly by the regex, as seen in this demo. The replacement should leave whatever sits between the [code] tags alone.
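One possible way to work around this, sketched under assumptions: match a whole [code]...[/code] block first and keep its contents verbatim, and only otherwise fall back to removing a single tag. The names `CODE_BLOCK` and `strip_bbcode_keep_code` are hypothetical, and removing the outer [code] tags themselves is an assumption about the desired output. This is not the original answer's method.

```ruby
# The tag pattern from the answer (constant name is illustrative).
BBCODE_TAG = /\[\/?(?:b|u|i|s|size|color|center|quote|url|img|ul|ol|list|li|\*|code|table|tr|th|td|youtube|gvideo)(?:=[^\]\s]+)?\]/

# A whole code block; group 1 captures the raw contents. The /m flag lets
# `.` match newlines so multi-line code blocks are covered.
CODE_BLOCK = /\[code\](.*?)\[\/code\]/m

# Hypothetical helper: try the code block first, then any single tag.
def strip_bbcode_keep_code(text)
  text.gsub(Regexp.union(CODE_BLOCK, BBCODE_TAG)) do
    # Group 1 is set only when the code-block alternative matched;
    # in that case keep the contents untouched, otherwise drop the tag.
    Regexp.last_match(1) || ""
  end
end

sample = "[b]x[/b] [code][b]raw[/b][/code]"

puts sample.gsub(BBCODE_TAG, "")      # => x raw
puts strip_bbcode_keep_code(sample)   # => x [b]raw[/b]
```

An unclosed [code] tag does not match the code-block alternative, so it falls through to the plain tag pattern and is removed like any other tag.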

This BBCode tester allows [b][b][b]Text[/b][/b][/b] to be parsed into bold Text, but the other one interprets it as [b][b]Text[/b][/b], with the part [b][b]Text in bold and the rest not. If you allow nested tags, then regex is not a good choice.
