phpbb BBCode to HTML (regex or otherwise)

Views: 216 Published: 2018/6/25 16:59:12 Tags: html regex wordpress bbcode phpbb


Question

I'm in the process of migrating content from phpBB to WordPress. I have succeeded up to the point of translating the BBCode into HTML.



The BBCode is complicated by an alphanumeric string that is injected into each tag.

A common post will contain text like so...
[url=url] Click here [/url:583ow9wo]

[b:583ow9wo] BOLD [/b:583ow9wo]

[img:583ow9wo] jpg [/img:583ow9wo]
I am inexperienced with regular expressions but believe they may be a way out. I found some help in the following post, https://stackoverflow.com/a/5505874/4356865 (which uses the regex [/?b:\d{5}]), but the regex in that instance would only remove the numeric characters from this example.

Any help appreciated.
Solution
Something like this will work for tags that have no attributes:
\[(b|i|u)(:[a-z0-9]+)?\](.*?)\[\/\1(?:\2)?\]

\[               -- matches literal "[" 
  (b|i|u)        -- matches b, i, or u, captures as backreference 1
  (:[a-z0-9]+)?  -- matches colon and then alphanumeric string, captures as backreference 2
                 -- the question mark allows the :string not to be present.
\]               -- matches literal "]"
(.*?)            -- matches anything*, as few times as required to finish the match, creates backreference 3.
\[               -- matches literal "["
  \/             -- matches literal "/"
  \1             -- invokes backreference 1 to make sure the opening/closing tags match
  (?:\2)?        -- invokes backreference 2 to further make sure it's the same tag
\]               -- matches literal "]"
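As a quick check of the breakdown above, here is a minimal sketch that runs the same pattern through Python's `re` module. The choice of Python is an assumption made for illustration; the answer itself does not name a host language. The sample string is the bold tag from the question.

```python
import re

# The pattern from the answer, unchanged.
TAG = re.compile(r"\[(b|i|u)(:[a-z0-9]+)?\](.*?)\[\/\1(?:\2)?\]")

# Sample taken from the question: a bold tag with the injected suffix.
m = TAG.search("[b:583ow9wo] BOLD [/b:583ow9wo]")
tag_name, suffix, contents = m.group(1), m.group(2), m.group(3)
print(tag_name, suffix, repr(contents))

# The suffix group is optional, so a plain [i]...[/i] matches as well.
plain = TAG.search("[i]text[/i]")
print(plain.group(1), plain.group(2), plain.group(3))
```

Group 2 comes back as `None` when the tag has no suffix, which is why the closing-tag check wraps `\2` in an optional non-capturing group.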
Matching a tag like url is easy enough.

Tags with attributes each do different things with their attributes, so it's probably easier to handle a tag like URL separately from a tag like IMG.
\[(url)(?:\s*=\s*(.*?))?(:[a-z0-9]+)\](.*?)\[\/\1(?:\3)?\]

\[                    -- matches literal "["
  (url)               -- matches literal "url", in parentheses so we can invoke backreference 1 later, easier for you to modify
  (?:                 -- ?: signifies a non-capturing group, so it creates a group without creating a backreference, or altering the backreference count.
    \s*=\s*           -- matches literal "=", padded by any amount of whitespace on either side
    (.*?)             -- matches any character, as few times as possible, to complete the match, creates backreference 2
  )                   -- closes the noncapturing group
  (:[a-z0-9]+)        -- matches the alphanumeric string as backreference 3.
\]                    -- matches literal "]"
(.*?)                 -- matches any character as few times as possible to complete the match, backreference 4
\[                    -- matches literal "["
  \/                  -- matches literal "/"
  \1                  -- invokes backreference 1
  (?:\3)?             -- invokes backreference 3
\]                    -- matches literal "]"
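The url pattern above can be exercised the same way. This is a sketch in Python (an assumed host language), and the sample string is hypothetical: it assumes the opening tag carries the same suffix as the closing tag, because the pattern as written requires group 3 to be present before the first "]".

```python
import re

# The url pattern from the answer, unchanged.
URL = re.compile(r"\[(url)(?:\s*=\s*(.*?))?(:[a-z0-9]+)\](.*?)\[\/\1(?:\3)?\]")

# Hypothetical sample: suffix present on both the opening and closing tag.
sample = "[url=http://example.com:583ow9wo]Click here[/url:583ow9wo]"
m = URL.search(sample)
href, suffix, label = m.group(2), m.group(3), m.group(4)
print(href, suffix, label)

# The sample exactly as written in the question has no suffix on the
# opening tag, so this pattern does not match it.
unmatched = URL.search("[url=url] Click here [/url:583ow9wo]")
print(unmatched)
```

If the real posts do omit the suffix on opening url tags, the pattern would need adjusting; that depends on the actual data and is not something the answer covers.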
For the replacement, the contents of the tags are themselves in backreferences, so you can do something like this for the b/i/u tags:
<\1>\3</\1>
For the url tag, it's something like this
<A href="\2">\4</A>
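Putting the two patterns and their replacement strings together might look like the sketch below. It uses Python's `re.sub`, which accepts `\1`-style references in the replacement; the function name `bbcode_to_html` and the url sample are hypothetical, and the url sample again assumes the suffix appears on the opening tag.

```python
import re

# Patterns and replacement strings from the answer, unchanged.
BIU = re.compile(r"\[(b|i|u)(:[a-z0-9]+)?\](.*?)\[\/\1(?:\2)?\]")
URL = re.compile(r"\[(url)(?:\s*=\s*(.*?))?(:[a-z0-9]+)\](.*?)\[\/\1(?:\3)?\]")

def bbcode_to_html(text: str) -> str:
    """Hypothetical helper: apply the b/i/u rule, then the url rule."""
    text = BIU.sub(r"<\1>\3</\1>", text)
    text = URL.sub(r'<A href="\2">\4</A>', text)
    return text

bold_html = bbcode_to_html("[b:583ow9wo] BOLD [/b:583ow9wo]")
link_html = bbcode_to_html(
    "[url=http://example.com:583ow9wo]Click here[/url:583ow9wo]"
)
print(bold_html)
print(link_html)
```

Each `sub` call makes a single pass, so a tag nested inside another b/i/u tag is left as BBCode after one call. The sketch also does no HTML escaping of the tag contents.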
I say in multiple places that the dot/period matches any character. It actually matches any character except a newline. You can make the dot match newlines as well by using the "dotall" modifier s, like this:
/(.*)<foo>/s
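In Python (assumed here only for illustration), the equivalent of the trailing `s` modifier is the `re.DOTALL` flag. A small sketch with a hypothetical two-line post body shows the difference:

```python
import re

pattern = r"\[(b|i|u)(:[a-z0-9]+)?\](.*?)\[\/\1(?:\2)?\]"

# Hypothetical post body whose bold text spans two lines.
text = "[b:583ow9wo]line one\nline two[/b:583ow9wo]"

# Without DOTALL the dot stops at the newline, so nothing matches.
without_flag = re.search(pattern, text)

# With DOTALL the dot also matches "\n" and the tag is found.
with_flag = re.search(pattern, text, re.DOTALL)

print(without_flag)
print(repr(with_flag.group(3)))
```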


                        