mIRC control codes to HTML, through PHP


Problem description


I realize this has been asked before, on this very forum no less, but the proposed solution was not reliable for me.

I have been working on this for a week or more by now, and I stayed up 'till 3am yesterday working on it... But I digress, let me get to the issue at hand:

For those unaware, mIRC uses ASCII control codes to control character color, underline, weight, and italics. The codes, in hexadecimal, are 03 for color, 02 for bold, 1F for underline, 1D for italic, and 16 for reverse (white text on black background).

As an example of the form this data is going to come in, we have (written with regex-style escapes, because those characters will not print):

\x034this text is red\x033this text is green\x03 \x02bold text\x02
\x034,3this text is red with a green background\x03

Et-cetera.
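
Because these bytes are invisible, it can help to make them visible while debugging leftover codes. The following is a minimal sketch, not part of the original post: `showControlCodes()` is a hypothetical helper name, and the byte list is the one given above plus 0x0F, which the question's commented-out code treats as a reset.

```php
<?php
// Hypothetical debugging helper (an assumption, not from the original post):
// replace each IRC control byte with a printable \xNN escape.
function showControlCodes(string $text): string
{
    return preg_replace_callback(
        '/[\x02\x03\x0F\x16\x1D\x1F]/',
        function (array $m): string {
            // ord() gives the byte value; %02X prints it as two hex digits.
            return sprintf('\x%02X', ord($m[0]));
        },
        $text
    );
}

echo showControlCodes("\x034,3red on green\x03 \x02bold\x02"), "\n";
// prints: \x034,3red on green\x03 \x02bold\x02
```

Running converted output through such a helper shows at a glance whether any control bytes survived the conversion.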

Below are the two functions I have attempted to modify for my own use, but have returned unreliable results. Before I get into that code, to be specific on 'unreliable', sometimes the code would parse, other times there would still be control codes left in the text, and I can't figure out why. Anyway;

function mirc2html($x) {
    $c = array("FFF","000","00007F","009000","FF0000","7F0000","9F009F","FF7F00","FFFF00","00F800","00908F","00FFFF","0000FF","FF00FF","7F7F7F","CFD0CF");
    $x = preg_replace("/\x02(.*?)((?=\x02)\x02|$)/", "<b>$1</b>", $x);
    $x = preg_replace("/\x1F(.*?)((?=\x1F)\x1F|$)/", "<u>$1</u>", $x);
    $x = preg_replace("/\x1D(.*?)((?=\x1D)\x1D|$)/", "<i>$1</i>", $x);
    $x = preg_replace("/\x03(\d\d?),(\d\d?)(.*?)(?(?=\x03)|$)/e", "'</span><span style=\"color: #'.\$c[$1].'; background-color: #'.\$c[$2].';\">$3</span>'", $x);
    $x = preg_replace("/\x03(\d\d?)(.*?)(?(?=\x03)|$)/e", "'</span><span style=\"color: #'.\$c[$1].';\">$2</span>'", $x);
    //$x = preg_replace("/(\x0F|\x03)(.*?)/", "<span style=\"color: #000; background-color: #FFF;\">$2</span>", $x);
    //$x = preg_replace("/\x16(.*?)/", "<span style=\"color: #FFF; background-color: #000;\">$1</span>", $x);
    //$x = preg_replace("/\<\/span\>/","",$x,1);
    //$x = preg_replace("/(\<\/span\>){2}/","</span>",$x);
    return $x;
}

function color_rep($matches) {
    $matches[2] = ltrim($matches[2], "0");
    $bindings = array(0=>'white',1=>'black',2=>'blue',3=>'green',4=>'red',5=>'brown',6=>'purple',7=>'orange',8=>'yellow',9=>'lightgreen',10=>'#00908F',
        11=>'lightblue',12=>'blue',13=>'pink',14=>'grey',15=>'lightgrey');
    $preg = preg_match_all('/(\d\d?),(\d\d?)/',$matches[2], $col_arr);
    //print_r($col_arr);
    $fg = isset($bindings[$matches[2]]) ? $bindings[$matches[2]] : 'transparent';
    if ($preg == 1) {
        $fg = $bindings[$col_arr[1][0]];
        $bg = $bindings[$col_arr[2][0]];
    }
    else {
        $bg = 'transparent';
    }


    return '<span style="color: '.$fg.'; background: '.$bg.';">'.$matches[3].'</span>';
}

And, in case it is relevant, where the code is called:

$logln = preg_replace_callback("/(\x03)(\d\d?,\d\d?|\d\d?)(\s?.*?)(?(?=\x03)|$)/","color_rep",$logln);

Sources: First, Second

I've of course also attempted to look at the methods done by various php/ajax based irc clients, and there hasn't been any success there. As to doing this mirc-side, I've looked there as well, and although the results have been more reliable than php, the data sent to the server increases exponentially to the point that the socket times out on upload, so it isn't a viable option.

As always, any help in this matter would be appreciated.

Solution

You should divide the problem, for example with a tokenizer. A tokenizer will scan the input string and turn the special parts into named tokens, so the rest of your script can identify them. Usage example:

$mirc = "\x034this text is red\x033this text is green\x03 \x02bold text\x02
\x034,3this text is red with a green background\x03";

$tokenizer = new Tokenizer($mirc);

while(list($token, $data) = $tokenizer->getNext())
{
    switch($token)
    {
        case 'color-fgbg':
            printf('<%s:%d,%d>', $token, $data[1], $data[2]);
            break;

        case 'color-fg':
            printf('<%s:%d>', $token, $data[1]);
            break;

        case 'color-reset':
        case 'style-bold':
            printf('<%s>', $token);
            break;

        case 'catch-all':
            echo $data[0];
            break;

        default:
            throw new Exception(sprintf('Unknown token <%s>.', $token));
    }
}

This does not do much yet beyond identifying the interesting parts and their (sub-)values, as the output demonstrates:

<color-fg:4>this text is red<color-fg:3>this text is green<color-reset> <style-bold>bold text<style-bold>
<color-fgbg:4,3>this text is red with a green background<color-reset>

It should be relatively easy for you to modify the loop above and handle the states like opening/closing color and font-variant tags like bold.
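
One possible way to handle that state is sketched below. It is an illustration under stated assumptions, not the answer's own code: `mircTokens()` and `mircToHtml()` are hypothetical names, the compact tokenizer uses PCRE's `A` (anchored) modifier instead of comparing offsets, and the palette is copied from the `$c` array in the question. Bold is emitted as `font-weight` on the span rather than as `<b>`, so that overlapping color and bold ranges cannot produce misnested tags.

```php
<?php
/** Split $text into [name, groups] pairs; the first matching pattern wins. */
function mircTokens(string $text): array
{
    $patterns = [
        'color-fgbg'  => '\x03(\d{1,2}),(\d{1,2})',
        'color-fg'    => '\x03(\d{1,2})',
        'color-reset' => '\x03',
        'style-bold'  => '\x02',
        'catch-all'   => '.|\n',
    ];
    $tokens = [];
    $offset = 0;
    while ($offset < strlen($text)) {
        foreach ($patterns as $name => $pattern) {
            // The A modifier anchors the match at $offset.
            if (preg_match("~$pattern~A", $text, $m, 0, $offset) === 1) {
                $tokens[] = [$name, $m];
                $offset += strlen($m[0]);
                continue 2;
            }
        }
        break; // defensive: stop if no pattern matched at this offset
    }
    return $tokens;
}

/** Render tokens as HTML, one span per run of equally styled text. */
function mircToHtml(string $text): string
{
    // Palette from the question's $c array, indexed by mIRC color number.
    $palette = ['FFF', '000', '00007F', '009000', 'FF0000', '7F0000', '9F009F', 'FF7F00',
                'FFFF00', '00F800', '00908F', '00FFFF', '0000FF', 'FF00FF', '7F7F7F', 'CFD0CF'];
    $fg = $bg = null;
    $bold = false;
    $run = '';
    $html = '';
    // Emit the pending text run wrapped in a span that carries the current state.
    $flush = function () use (&$html, &$run, &$fg, &$bg, &$bold, $palette) {
        if ($run === '') {
            return;
        }
        $css = [];
        if ($fg !== null && isset($palette[$fg])) {
            $css[] = 'color: #' . $palette[$fg] . ';';
        }
        if ($bg !== null && isset($palette[$bg])) {
            $css[] = 'background-color: #' . $palette[$bg] . ';';
        }
        if ($bold) {
            $css[] = 'font-weight: bold;';
        }
        $escaped = htmlspecialchars($run, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
        $html .= $css ? '<span style="' . implode(' ', $css) . '">' . $escaped . '</span>' : $escaped;
        $run = '';
    };
    foreach (mircTokens($text) as [$name, $data]) {
        if ($name === 'catch-all') {
            $run .= $data[0];
            continue;
        }
        $flush(); // the state is about to change: emit the text styled so far
        switch ($name) {
            case 'color-fgbg':
                $bg = (int) $data[2];
                // fall through to set the foreground as well
            case 'color-fg':
                $fg = (int) $data[1];
                break;
            case 'color-reset':
                $fg = $bg = null;
                break;
            case 'style-bold':
                $bold = !$bold;
                break;
        }
    }
    $flush();
    return $html;
}

echo mircToHtml("\x034red\x03 \x02bold\x02"), "\n";
// prints: <span style="color: #FF0000;">red</span> <span style="font-weight: bold;">bold</span>
```

Color numbers outside the 16-entry palette are simply ignored in this sketch; a real converter would need to decide how to treat them.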

The tokenizer itself defines a set of tokens and tries to find them, one after the other, at a certain offset (starting at the beginning of the string). The tokens are defined by regular expressions:

/**
 * regular expression based tokenizer,
 * first token wins.
 */
class Tokenizer
{
    private $subject;
    private $offset = 0;
    private $tokens = array(
        'color-fgbg'  => '\x03(\d{1,2}),(\d{1,2})',
        'color-fg'    => '\x03(\d{1,2})',
        'color-reset' => '\x03',
        'style-bold'  => '\x02',
        'catch-all' => '.|\n',
    );
    public function __construct($subject)
    {
        $this->subject = (string) $subject;
    }
    ...

As this private array shows, the tokens are simple regular expressions, and each one gets its name from its array key. That's the name used in the switch statement above.

The getNext() function will look for a token at the current offset and, if one is found, will advance the offset and return the token including all subgroup matches. Because PREG_OFFSET_CAPTURE is used, the $matches array also contains offsets; it is simplified (offsets removed) before being returned, as the main routine normally does not need to know about offsets.

The principle is simple: the first pattern wins. So you need to place the pattern that matches the most (in terms of string length) on top for this to work. In your case, the longest one is the token for the foreground and background color, <color-fgbg>.

If no token can be found, NULL is returned. Here is the getNext() function:
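
The effect of the ordering can be checked in isolation. The snippet below is a small sketch with a hypothetical helper, `firstToken()`, which returns the name of the first pattern that matches at the start of a string; it uses PCRE's `A` modifier for anchoring, which is an assumption of this sketch rather than what the Tokenizer class does.

```php
<?php
// Hypothetical helper: name of the first pattern matching at the start of $text.
function firstToken(array $patterns, string $text): ?string
{
    foreach ($patterns as $name => $pattern) {
        // The A modifier anchors the match at the beginning of $text.
        if (preg_match("~$pattern~A", $text) === 1) {
            return $name;
        }
    }
    return null;
}

$longestFirst = [
    'color-fgbg' => '\x03(\d{1,2}),(\d{1,2})',
    'color-fg'   => '\x03(\d{1,2})',
];
$shortestFirst = array_reverse($longestFirst); // color-fg is now tried first

$input = "\x034,3red on green";
echo firstToken($longestFirst, $input), "\n";  // prints: color-fgbg
echo firstToken($shortestFirst, $input), "\n"; // prints: color-fg
```

With the shorter pattern first, the `,3` would be left behind and end up in the output as literal text.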

...
/**
 * @return array|null
 */
public function getNext()
{
    if ($this->offset >= strlen($this->subject))
        return NULL;

    foreach($this->tokens as $name => $token)
    {
        // Search from the current offset; PREG_OFFSET_CAPTURE records where each group matched.
        if (FALSE === $r = preg_match("~$token~", $this->subject, $matches, PREG_OFFSET_CAPTURE, $this->offset))
            throw new RuntimeException(sprintf('Pattern for token %s failed (regex error).', $name));
        if ($r === 0)
            continue;
        // Only accept a match that starts exactly at the current offset.
        if ($matches[0][1] !== $this->offset)
            continue;
        $data = array();
        foreach($matches as $match)
        {
            list($data[]) = $match;
        }

        $this->offset += strlen($data[0]);
        return array($name, $data);
    }
    return NULL;
}
...
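
The offset handling inside getNext() can also be looked at on its own. The following sketch only uses `preg_match()` with `PREG_OFFSET_CAPTURE` and `array_column()`; the sample string is made up for illustration.

```php
<?php
// "ab" precedes the color code, so the code starts at byte offset 2.
$subject = "ab\x034,3text";

preg_match('~\x03(\d{1,2}),(\d{1,2})~', $subject, $matches, PREG_OFFSET_CAPTURE, 0);
// Each entry is a [text, offset] pair:
// $matches === [["\x034,3", 2], ["4", 3], ["3", 5]]

// The match began at offset 2, not at the requested offset 0, so a
// tokenizer positioned at 0 has to skip this pattern for now.
$startsHere = ($matches[0][1] === 0); // false

// Dropping the offsets leaves only the matched strings.
$data = array_column($matches, 0);
// $data === ["\x034,3", "4", "3"]
```

This is why getNext() compares `$matches[0][1]` with the current offset before accepting a match.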

So the tokenization of the string is now encapsulated in the Tokenizer class, and the handling of the tokens is something you can do on your own in some other part of your application. That should make it easier for you to change the way of styling (HTML output, CSS-based HTML output, or something different like bbcode or markdown) and also to support new codes in the future. And in case something is missing, you can fix it more easily, because the cause is either an unrecognized code or something missing in the transformation.

The full example as gist: Tokenizer Example of Mirc Color and Style (bold) Codes.
