Ruby regex for stripping BBCode
Question
I'm trying to remove BBCode from a given string (just using gsub with some regex).
Here's an example string:
The [b]quick[/b] brown [url=http://example.com]fox[/url] jumps over the lazy dog [img=http://example.com/lazy_dog.png]
The output I need is:
The quick brown fox jumps over the lazy dog
So what's a way to do that? I've found various examples of doing this, but none have worked for my use case.
One that I tried: /\[(\w+)[^w]*?](.*?)\[\/\1]/
But that wouldn't catch the ending [img] tag.
Recommended answer
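This post is meant to show how BBCode may be interpreted, which should be taken into account when stripping BBCode tags while keeping the content.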
The regex below will only remove BBCode tags as defined by this page.
It may remove more than what is considered a valid BBCode tag, though. For example, [b ]Bold[/b] is not bolded by this BBCode tester, so by rights those tags should be left alone, but [/b] will be removed by the regex below. It will also remove text that is clearly not BBCode, such as [/b=something].
Another example is [url=http://example.com/ ][/url] (note the space). This might be OK or not OK depending on the BBCode parser. The regex below ignores the opening tag, but removes the closing tag.
/\[\/?(?:b|u|i|s|size|color|center|quote|url|img|ul|ol|list|li|\*|code|table|tr|th|td|youtube|gvideo)(?:=[^\]\s]+)?\]/
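As a rough illustration, here is how the regex above could be applied with gsub to the string from the question, along with the edge cases mentioned earlier. The constant name BBCODE_TAG and the method name strip_bbcode are made up for this sketch; the final strip call is an assumption added to drop the space left behind where the trailing [img] tag was.

```ruby
# Pattern copied from the answer; BBCODE_TAG and strip_bbcode are illustrative names.
BBCODE_TAG = /\[\/?(?:b|u|i|s|size|color|center|quote|url|img|ul|ol|list|li|\*|code|table|tr|th|td|youtube|gvideo)(?:=[^\]\s]+)?\]/

# Delete every substring that looks like one of the listed BBCode tags.
def strip_bbcode(text)
  text.gsub(BBCODE_TAG, "")
end

sample = "The [b]quick[/b] brown [url=http://example.com]fox[/url] jumps over the lazy dog [img=http://example.com/lazy_dog.png]"

# Removing the trailing [img] tag leaves a space behind, hence the strip.
puts strip_bbcode(sample).strip

# Edge cases discussed in the answer:
puts strip_bbcode("[b ]Bold[/b]")                     # opening tag stays, closing tag is removed
puts strip_bbcode("[/b=something]").inspect           # removed even though it is not valid BBCode
puts strip_bbcode("[url=http://example.com/ ][/url]") # opening tag stays, closing tag is removed
```

Whether leaving a stray opening tag behind is acceptable depends on how the target BBCode parser treats such input.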
The [code] tag is also not treated correctly by the regex, as seen in this demo. The replacement should leave whatever sits in between the [code] tags alone.
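One possible way around the [code] problem, sketched here under assumptions rather than taken from the answer, is to match whole [code]...[/code] spans first and copy their contents through untouched, stripping tags only elsewhere. The names TAG_PATTERN, CODE_SPAN and strip_bbcode_keep_code are hypothetical, and the sketch assumes [code] blocks are not nested and that the outer [code] tags themselves should be dropped.

```ruby
# Same tag pattern as in the answer; all names here are illustrative.
TAG_PATTERN = /\[\/?(?:b|u|i|s|size|color|center|quote|url|img|ul|ol|list|li|\*|code|table|tr|th|td|youtube|gvideo)(?:=[^\]\s]+)?\]/

# A whole [code]...[/code] span; group 1 captures the content between the tags.
CODE_SPAN = /\[code\](.*?)\[\/code\]/m

# Try the code span first, then fall back to a single tag.
CODE_OR_TAG = Regexp.union(CODE_SPAN, TAG_PATTERN)

def strip_bbcode_keep_code(text)
  # $1 is set only when the code-span alternative matched; keep that
  # content verbatim. For any other tag $1 is nil, so the tag is dropped.
  text.gsub(CODE_OR_TAG) { $1 || "" }
end

input = "[b]See[/b] [code]puts '[b]x[/b]'[/code] [i]end[/i]"
puts strip_bbcode_keep_code(input)
```

An unclosed [code] tag simply falls through to the second alternative and is stripped like any other tag.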
This BBCode tester allows [b][b][b]Text[/b][/b][/b] to be parsed into bolded Text, but the other one interprets it as [b][b]Text[/b][/b], with the part [b][b]Text bolded and the rest not bolded. If you allow nested tags, then regex is not a good choice.