Regex to remove single-line SQL comments (--)
Question:
Can anybody give me a working regex (C#/VB.NET) that removes single-line comments from a SQL statement?
I mean these comments:
-- This is a comment
not those
/* this is a comment */
because I already can handle the star comments.
I have written a little parser that removes those comments when they are at the start of a line, but they can also appear after code or, worse, inside a SQL string such as 'hello --Test -- World'.
Those comments should also be removed (except the ones inside a SQL string, of course, if possible).
Surprisingly, I couldn't get the regex working. I would have assumed the star comments to be more difficult, but actually they aren't.
As requested, here is my code to remove /**/-style comments. (To make it ignore SQL-style strings, you have to substitute each string with a unique identifier (I used four concatenated ones), then apply the comment removal, then substitute the strings back.)
static string RemoveCstyleComments(string strInput)
{
    string strPattern = @"/[*][\w\d\s]+[*]/";
    //strPattern = @"/\*.*?\*/"; // Doesn't work
    //strPattern = "/\\*.*?\\*/"; // Doesn't work
    //strPattern = @"/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/ "; // Doesn't work

    // http://stackoverflow.com/questions/462843/improving-fixing-a-regex-for-c-style-block-comments
    strPattern = @"/\*(?>(?:(?>[^*]+)|\*(?!/))*)\*/"; // Works !

    string strOutput = System.Text.RegularExpressions.Regex.Replace(strInput, strPattern, string.Empty, System.Text.RegularExpressions.RegexOptions.Multiline);
    Console.WriteLine(strOutput);
    return strOutput;
} // End Function RemoveCstyleComments
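The substitute, strip, restore idea described above can be sketched for -- comments as follows. This is only an illustration, not the poster's actual code: the class and method names are hypothetical, a single GUID stands in for the "unique identifier", and only single-quoted literals (with '' as an escaped quote) are recognised. It also assumes that comments contain no quote characters; an apostrophe inside a comment would be mistaken for the start of a string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SqlCommentSketch
{
    // Hypothetical helper: hide string literals, strip "--" comments, restore the literals.
    public static string RemoveLineComments(string sql)
    {
        var literals = new List<string>();
        // Marker assumed not to occur anywhere in the input (32 hex digits).
        string marker = Guid.NewGuid().ToString("N");

        // 1. Swap each single-quoted literal ('' is an escaped quote) for a placeholder.
        string hidden = Regex.Replace(sql, "'(?:[^']|'')*'", m =>
        {
            literals.Add(m.Value);
            return marker + "_" + (literals.Count - 1) + "_";
        });

        // 2. No literals are left, so every "--" starts a comment.
        //    The line break itself is kept so adjacent lines do not run together.
        string stripped = Regex.Replace(hidden, "--[^\r\n]*", string.Empty);

        // 3. Put the original literals back in place of their placeholders.
        return Regex.Replace(stripped, marker + @"_(\d+)_",
            m => literals[int.Parse(m.Groups[1].Value)]);
    }
}
```

Calling SqlCommentSketch.RemoveLineComments on a statement such as `select a --c` followed by `from t where x = '--keep' --drop` should leave the '--keep' literal intact while dropping both trailing comments.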
Answer:
I will disappoint all of you: this can't be done with regular expressions. Sure, it's easy to find comments that are not in a string (even the OP could do that); the real problem is comment markers inside a string. Lookarounds offer a little hope, but they are still not enough. Knowing that a quote precedes the marker on the same line guarantees nothing. The only thing that guarantees anything is the parity of the quotes before the marker, and that is something you can't determine with a regular expression. So simply go with a non-regex approach.
EDIT: Here's the C# code:
String sql = "--this is a test\r\nselect stuff where substaff like '--this comment should stay' --this should be removed\r\n";
char[] quotes = { '\'', '"' };
int newCommentLiteral, lastCommentLiteral = 0;
while ((newCommentLiteral = sql.IndexOf("--", lastCommentLiteral)) != -1)
{
    // count the quotes between the previous position and this "--"
    int countQuotes = sql.Substring(lastCommentLiteral, newCommentLiteral - lastCommentLiteral).Split(quotes).Length - 1;
    if (countQuotes % 2 == 0) // this is a comment, since an even number of quotes precedes it
    {
        // look for the line end after the comment marker, not from the start of the text
        int eol = sql.IndexOf("\r\n", newCommentLiteral);
        if (eol == -1)
            eol = sql.Length; // no more newline, meaning end of the string
        else
            eol += 2; // also remove the "\r\n" itself
        sql = sql.Remove(newCommentLiteral, eol - newCommentLiteral);
        lastCommentLiteral = newCommentLiteral;
    }
    else // this is within a string: find where the string ends and move past it
    {
        int singleQuote = sql.IndexOf('\'', newCommentLiteral);
        if (singleQuote == -1)
            singleQuote = sql.Length;
        int doubleQuote = sql.IndexOf('"', newCommentLiteral);
        if (doubleQuote == -1)
            doubleQuote = sql.Length;
        lastCommentLiteral = Math.Min(singleQuote, doubleQuote) + 1;
        // instead of finding the end of the string you could simply do += 2, but the program will become slightly slower
    }
}
Console.WriteLine(sql);
What this does: find every comment marker (--). For each one, decide whether it starts a comment or sits inside a string by counting the quotes between the current match and the last one. If the count is even, it is a comment, so remove it (find the next end of line and remove everything in between). If it is odd, the marker is inside a string, so find the end of that string and move past it. This snippet relies on a weird SQL trick: 'this" is a valid string, even though the two quotes differ. If that is not true for your SQL dialect, you should try a completely different approach. I'll write a program for that too if that's the case, but this one is faster and more straightforward.
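For dialects where a string must be closed by the same quote character that opened it, one possible "different approach" is a single pass that tracks whether the scanner is currently inside a literal. The sketch below is a swapped-in technique, not the answerer's code; the class and method names are hypothetical. It keeps the line break after each removed comment, and it does not understand /* */ block comments or dialect-specific quoting such as [brackets] or backticks.

```csharp
using System;
using System.Text;

static class SqlScannerSketch
{
    // Hypothetical helper: one pass over the text, remembering which quote (if any) is open.
    public static string StripLineComments(string sql)
    {
        var output = new StringBuilder(sql.Length);
        char openQuote = '\0'; // '\0' means "not inside a string"
        int i = 0;
        while (i < sql.Length)
        {
            char c = sql[i];
            if (openQuote != '\0')
            {
                // Inside a literal: copy everything; the matching quote closes it.
                // A doubled quote ('') closes and immediately reopens the literal.
                if (c == openQuote) openQuote = '\0';
                output.Append(c);
                i++;
            }
            else if (c == '\'' || c == '"')
            {
                // A quote outside a literal opens one.
                openQuote = c;
                output.Append(c);
                i++;
            }
            else if (c == '-' && i + 1 < sql.Length && sql[i + 1] == '-')
            {
                // Comment: skip up to, but not including, the line break.
                while (i < sql.Length && sql[i] != '\r' && sql[i] != '\n') i++;
            }
            else
            {
                output.Append(c);
                i++;
            }
        }
        return output.ToString();
    }
}
```

Run on the sample statement from the answer, this should drop both real comments and leave '--this comment should stay' untouched. Because the scanner reads each character exactly once, there is no need to recount quotes for every -- it meets.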