Best way to parse bbcode


Question

I'd like to write a BBCode filter for a PHP website. (I'm using CakePHP, so it would be a BBCode helper.) I have a few requirements.

BBCodes can be nested, so something like this is valid:

[block]  
    [block]  
    [/block]  
    [block]  
        [block]  
        [/block]  
    [/block]  
[/block]  

BBCodes can have 0 or more parameters.

Example:

[video: url="url", width="500", height="500"]Title[/video]
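To illustrate the parameter syntax above, here is a minimal sketch that splits a single opening tag into its name and parameters. The function name `parseOpeningTag` is hypothetical, and the sketch assumes PHP 7.1 or later, that every value is double-quoted, and that values contain neither `"` nor `]`.

```php
<?php
// Hypothetical helper (not part of any library): parse one opening tag such as
// [video: url="url", width="500", height="500"] into a name and its parameters.
function parseOpeningTag(string $tag): ?array
{
    // Tag name, then an optional ":" followed by the raw parameter list.
    if (!preg_match('/^\[(\w+)(?::\s*([^\]]*))?\]$/', $tag, $m)) {
        return null; // not a well-formed opening tag
    }

    $params = [];
    if (isset($m[2]) && $m[2] !== '') {
        // Assumption: each parameter is written as name="value".
        preg_match_all('/(\w+)\s*=\s*"([^"]*)"/', $m[2], $pairs, PREG_SET_ORDER);
        foreach ($pairs as $pair) {
            $params[$pair[1]] = $pair[2];
        }
    }

    return ['name' => $m[1], 'params' => $params];
}

// Usage: a tag with three parameters and a tag with none.
print_r(parseOpeningTag('[video: url="url", width="500", height="500"]'));
print_r(parseOpeningTag('[block]'));
```

Parsing the parameter list in a second step keeps the tag-level pattern simple, which may help avoid the interaction between parameter matching and nesting.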

BBCodes might have multiple behaviours.

Let's say [url]text[/url] would be transformed to [url:url="text"]text[/url], or the video BBCode would be able to choose between YouTube, Dailymotion, and so on.

I think that covers most of my needs. I have already done something with regex, but my biggest problem was matching parameters. In fact, I got nested BBCode and BBCode with 0 parameters to work. But when I added a regex match for parameters, it no longer matched nested BBCode correctly.

"\[($tag)(=.*)\"\](.*)\[\/\1\]" // It wasn't .* but the non-greedy matcher

I don't have the complete regex with me right now, but I had something that looked like the one above.

So is there a way to match BBCode efficiently, with regex or something else? The only thing I can think of is to use the visitor pattern and split my text on each possible tag. This way I would have a bit more control over the parsing, and I could probably validate the document: if the input text doesn't contain valid BBCode, I could notify the user with an error before saving anything.
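As a rough illustration of the validation idea, the nesting check can be done with a stack rather than a single regex. This is only a sketch under stated assumptions: the function name `findNestingError` is hypothetical, every bracketed word is treated as a tag, and every tag is assumed to need a closing tag.

```php
<?php
// Hypothetical helper: returns null when all tags are properly nested,
// otherwise a message describing the first problem found.
function findNestingError(string $text): ?string
{
    $stack = [];

    // Match opening tags (with optional ":params") and closing tags.
    preg_match_all('/\[(\/?)(\w+)(?::[^\]]*)?\]/', $text, $tokens, PREG_SET_ORDER);

    foreach ($tokens as $token) {
        $name = strtolower($token[2]);

        if ($token[1] !== '/') {
            $stack[] = $name;           // opening tag: remember it
            continue;
        }

        $expected = array_pop($stack);  // null when nothing is open
        if ($expected !== $name) {
            return $expected === null
                ? "Unexpected [/$name] with no open tag"
                : "Expected [/$expected] but found [/$name]";
        }
    }

    return $stack === [] ? null : 'Unclosed tag [' . end($stack) . ']';
}

// Usage: report the problem before saving anything.
$error = findNestingError('[b][i]text[/b][/i]');
if ($error !== null) {
    echo $error, "\n"; // prints: Expected [/i] but found [/b]
}
```

The regex here only finds individual tags; the stack tracks nesting, so the pattern never has to match a whole opening and closing pair at once.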

I would use SableCC to create my text parser: http://sablecc.org/

Any better ideas? Or anything that could lead to an efficient, flexible BBCode parser?

Thank you, and sorry for my bad English...

Solution

There are both PECL and PEAR BBCode parsing libraries. Software's hard enough without reinventing years of work on your own.

If neither of those is an option, I'd concentrate on turning the BBCode into a valid XML string, and then using your favorite XML parsing routine on that. Very, very rough idea here, but:

  1. Run the code through htmlspecialchars to escape any entities that need escaping

  2. Transform all [ and ] characters into < and > respectively

  3. Don't forget to account for the colon in cases like [tagname:

If the BBCode was nested properly, you should be all set to pass this string into an XML parsing object (SimpleXML, DOMDocument, etc.)
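The three steps above can be sketched roughly as follows. This is one possible reading of the idea, not a definitive implementation: the function name `bbcodeToXml` and the `<bbcode>` wrapper element are assumptions, tags are rewritten with a regex rather than by replacing every bracket blindly (so that the parameters become valid XML attributes), and parameter values are assumed not to contain `]`.

```php
<?php
// Hypothetical helper: convert BBCode into an XML string.
function bbcodeToXml(string $bbcode): string
{
    // Step 1: escape &, <, > and quotes in the user's text.
    $escaped = htmlspecialchars($bbcode, ENT_QUOTES | ENT_XML1, 'UTF-8');

    // Steps 2 and 3: rewrite [tag], [tag: a="b", ...] and [/tag] as XML tags.
    $xml = preg_replace_callback(
        '/\[(\/?)([A-Za-z]\w*)(?::\s*([^\]]*))?\]/',
        function (array $m): string {
            if ($m[1] === '/') {
                return '</' . $m[2] . '>';
            }
            $attrs = '';
            // Quotes were escaped in step 1, so parameters now look like
            // name=&quot;value&quot; and the values are already XML-safe.
            preg_match_all('/(\w+)\s*=\s*&quot;(.*?)&quot;/', $m[3] ?? '', $pairs, PREG_SET_ORDER);
            foreach ($pairs as $p) {
                $attrs .= ' ' . $p[1] . '="' . $p[2] . '"';
            }
            return '<' . $m[2] . $attrs . '>';
        },
        $escaped
    );

    // Assumption: wrap everything in one root element so it forms a single document.
    return '<bbcode>' . $xml . '</bbcode>';
}

// Usage: improperly nested BBCode makes loadXML() return false.
libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
$doc = new DOMDocument();
$ok = $doc->loadXML(bbcodeToXml('[block][video: url="u", width="500"]Title[/video][/block]'));
var_dump($ok);
```

Once the string loads, the nesting has been validated by the XML parser itself, and the tree can be walked with DOMDocument or SimpleXML to render each tag.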
