Regex to remove single-line SQL comments (--)



Question:

Can anybody give me a working regex (C#/VB.NET) that removes single-line comments from a SQL statement?

I mean these comments:

-- This is a comment

not those

/* this is a comment */

because I already can handle the star comments.

I have written a little parser that removes those comments when they are at the start of a line, but they can also appear somewhere after code or, worse, inside a SQL string such as 'hello --Test -- World'. Comments after code should also be removed (except those inside a SQL string, of course, if possible).

Surprisingly, I didn't get the regex working. I would have assumed the star comments to be more difficult, but actually they aren't.

As requested, here is my code to remove /**/-style comments. (To have it ignore SQL-style strings, you have to substitute the strings with a unique identifier (I used four concatenated ones), then apply the comment removal, then substitute the strings back.)

    static string RemoveCstyleComments(string strInput) 
    { 
        // Earlier attempts, kept for reference:
        //strPattern = @"/[*][\w\d\s]+[*]/"; // Only matches comments made of word characters and whitespace
        //strPattern = @"/\*.*?\*/"; // Doesn't work
        //strPattern = "/\\*.*?\\*/"; // Doesn't work
        //strPattern = @"/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/ "; // Doesn't work

        // http://stackoverflow.com/questions/462843/improving-fixing-a-regex-for-c-style-block-comments
        string strPattern = @"/\*(?>(?:(?>[^*]+)|\*(?!/))*)\*/";  // Works!

        // No RegexOptions needed: the pattern contains no ^, $ or . whose meaning an option would change
        string strOutput = System.Text.RegularExpressions.Regex.Replace(strInput, strPattern, string.Empty);
        Console.WriteLine(strOutput); 
        return strOutput; 
    } // End Function RemoveCstyleComments 
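
The string substitution step mentioned above is not shown in the question. A minimal sketch of how it might look follows. The class and method names (`SqlStringMasking`, `MaskStrings`, `UnmaskStrings`) are hypothetical, a single GUID per literal stands in for the concatenated unique identifiers, and the sketch assumes that string literals are single-quoted with `''` as the escaped quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SqlStringMasking
{
    // Hypothetical helper: replaces every single-quoted SQL string literal with a
    // unique placeholder and records the original text in 'saved'.
    public static string MaskStrings(string sql, Dictionary<string, string> saved)
    {
        // Assumption: literals are delimited by ' and '' is an escaped quote.
        return Regex.Replace(sql, @"'(?:[^']|'')*'", m =>
        {
            string key = "__SQLSTR_" + Guid.NewGuid().ToString("N") + "__";
            saved[key] = m.Value;
            return key;
        });
    }

    // Hypothetical helper: puts the recorded literals back in place of their placeholders.
    public static string UnmaskStrings(string sql, Dictionary<string, string> saved)
    {
        foreach (KeyValuePair<string, string> pair in saved)
            sql = sql.Replace(pair.Key, pair.Value);
        return sql;
    }
}
```

The intended order is `MaskStrings`, then the comment removal, then `UnmaskStrings`. One limitation of this sketch: an unbalanced quote inside a comment (for example `-- don't`) is mistaken for the start of a string literal, so it only works when comments contain no stray quotes.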

Answer:

I will disappoint all of you: this can't be done with regular expressions. Sure, it's easy to find comments that are not in a string (even the OP could do that); the real problem is comment markers inside a string. There is a little hope in lookarounds, but that's still not enough. Knowing that there is a preceding quote on the line doesn't guarantee anything. The only thing that guarantees anything is the parity of the quotes, and that is something you can't determine with a regular expression. So simply go with a non-regex approach.

EDIT: Here's the C# code:

        String sql = "--this is a test\r\nselect stuff where substaff like '--this comment should stay' --this should be removed\r\n";
        char[] quotes = { '\'', '"'};
        int newCommentLiteral, lastCommentLiteral = 0;
        while ((newCommentLiteral = sql.IndexOf("--", lastCommentLiteral)) != -1)
        {
            int countQuotes = sql.Substring(lastCommentLiteral, newCommentLiteral - lastCommentLiteral).Split(quotes).Length - 1;
            if (countQuotes % 2 == 0) //this is a comment, since there's an even number of quotes preceding
            {
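                int eol = sql.IndexOf("\r\n", newCommentLiteral); //search from the comment start, not from the start of the string
                if (eol == -1)
                    eol = sql.Length; //no more newline, meaning end of the string
                sql = sql.Remove(newCommentLiteral, eol - newCommentLiteral); //the line break is kept so adjacent tokens don't merge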
                lastCommentLiteral = newCommentLiteral;
            }
            else //this is within a string, find string ending and moving to it
            {
                int singleQuote = sql.IndexOf("'", newCommentLiteral);
                if (singleQuote == -1)
                    singleQuote = sql.Length;
                int doubleQuote = sql.IndexOf('"', newCommentLiteral);
                if (doubleQuote == -1)
                    doubleQuote = sql.Length;

                lastCommentLiteral = Math.Min(singleQuote, doubleQuote) + 1;

                //instead of finding the end of the string you could simply do += 2 but the program will become slightly slower
            }
        }

        Console.WriteLine(sql);

What this does: find every comment literal (--). For each one, check whether it is within a string or not by counting the number of quotes between the current match and the last one. If this number is even, it's a comment, so remove it (find the next end of line and remove what's in between). If it's odd, the match is within a string, so find the end of the string and move past it. This snippet is based on a weird SQL trick: 'this" is a valid string, even though the two quotes differ. If that's not true for your SQL dialect, you should try a completely different approach. I'll write a program for that too if that's the case, but this one is faster and more straightforward.
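
For dialects where a quoted section must end with the same quote character that opened it, one possible "different approach" is a character-by-character scan. The following is only a sketch under that assumption, not the answerer's program. The names `SqlLineComments` and `Remove` are made up for illustration, and block comments (`/* ... */`) are not handled.

```csharp
using System.Text;

static class SqlLineComments
{
    // Hypothetical helper: removes "--" comments that are outside quoted text.
    // Assumption: a quoted section ends at the next occurrence of the SAME quote
    // character. A doubled quote ('') closes and immediately reopens the section,
    // which gives the right result for comment detection.
    public static string Remove(string sql)
    {
        var result = new StringBuilder(sql.Length);
        char openQuote = '\0'; // '\0' means "not inside quoted text"
        int i = 0;
        while (i < sql.Length)
        {
            char c = sql[i];
            if (openQuote != '\0')
            {
                // Inside quoted text: copy everything, watch for the closing quote.
                result.Append(c);
                if (c == openQuote) openQuote = '\0';
                i++;
            }
            else if (c == '\'' || c == '"')
            {
                // Quoted text starts here.
                openQuote = c;
                result.Append(c);
                i++;
            }
            else if (c == '-' && i + 1 < sql.Length && sql[i + 1] == '-')
            {
                // Comment: skip to the end of the line, keeping the line break itself.
                while (i < sql.Length && sql[i] != '\r' && sql[i] != '\n') i++;
            }
            else
            {
                result.Append(c);
                i++;
            }
        }
        return result.ToString();
    }
}
```

Calling `SqlLineComments.Remove(sql)` on the sample string from the snippet above keeps `'--this comment should stay'` and drops the two real comments. Because the scan tracks which quote character opened the section, it does not rely on mismatched quotes being valid.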
