RegEx: split a string on a delimiter (semicolon ;) except where it appears inside a string
Question
I have a Java String that is actually an SQL script:
CREATE OR REPLACE PROCEDURE Proc
AS
b NUMBER:=3;
c VARCHAR2(2000);
begin
c := 'BEGIN ' || ' :1 := :1 + :2; ' || 'END;';
end Proc;
I want to split the script on semicolons, except those that appear inside a string literal. The desired output is four separate strings, as shown below:
1- CREATE OR REPLACE PROCEDURE Proc AS b NUMBER:=3
2- c VARCHAR2(2000)
3- begin c := 'BEGIN ' || ' :1 := :1 + :2; ' || 'END;';
4- end Proc
Java's String.split() method also splits the string below into tokens. I want to keep this string intact, because its semicolons are inside quotes.
c := 'BEGIN ' || ' :1 := :1 + :2; ' || 'END;';
Output of Java's split() method:
1- c := 'BEGIN ' || ' :1 := :1 + :2
2- ' || 'END
3- '
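For reference, a minimal sketch that reproduces this output. The class name NaiveSplitDemo and the helper naiveSplit are made up for illustration; the only real API used is String.split.

```java
class NaiveSplitDemo {

    // Hypothetical helper: splits on every semicolon, quoted or not.
    static String[] naiveSplit(String sql) {
        return sql.split(";");
    }

    public static void main(String[] args) {
        String line = "c := 'BEGIN ' || ' :1 := :1 + :2; ' || 'END;';";
        String[] parts = naiveSplit(line);
        // The quoted semicolons are treated as delimiters too,
        // so the string literals are cut apart.
        for (int i = 0; i < parts.length; i++) {
            System.out.println((i + 1) + "- " + parts[i]);
        }
    }
}
```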
Please suggest a RegEx that splits the string on semicolons, except those that occur inside a string literal.
===================== CASE-2 ========================
The part above has been answered, and the answer works.
Here is another, more complex case.
======================================================
I have an SQL script and I want to tokenize each SQL query. Each SQL query is terminated by either a semicolon (;) or a forward slash (/).
1- I want to skip a semicolon or / sign if it appears inside a string, like
...WHERE col1 = 'some ; name/' ..
2- The expression must also skip multi-line comment syntax, i.e. /*
Here is the input:
/*Query 1*/
SELECT
*
FROM tab t
WHERE (t.col1 in (1, 3)
and t.col2 IN (1,5,8,9,10,11,20,21,
22,23,24,/*Reaffirmed*/
25,26,27,28,29,30,
35,/*carnival*/
75,76,77,78,79,
80,81,82, /*Damark accounts*/
84,85,87,88,90))
;
/*Query 2*/
select * from table
/
/*Query 3*/
select col form tab2
;
/*Query 4*/
select col2 from tab3 /*this is a multi line comment*/
/
Expected result:
[1]: /*Query 1*/
SELECT
*
FROM tab t
WHERE (t.col1 in (1, 3)
and t.col2 IN (1,5,8,9,10,11,20,21,
22,23,24,/*Reaffirmed*/
25,26,27,28,29,30,
35,/*carnival*/
75,76,77,78,79,
80,81,82, /*Damark accounts*/
84,85,87,88,90))
[2]:/*Query 2*/
select * from table
[3]: /*Query 3*/
select col form tab2
[4]:/*Query 4*/
select col2 from tab3 /*this is a multi line comment*/
Half of this can already be achieved with what was suggested to me in the previous post (linked at the start), but once comment syntax (/*) is introduced into the queries and each query can also be terminated by a forward slash (/), the expression no longer works.
Recommended answer
The regular expression pattern ((?:(?:'[^']*')|[^;])*); should give you what you need. Use a while loop with Matcher.find() to extract all the SQL statements. Something like:
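import java.util.regex.Matcher;
import java.util.regex.Pattern;

// s holds the SQL script to split
Pattern p = Pattern.compile("((?:(?:'[^']*')|[^;])*);");
Matcher m = p.matcher(s);
int cnt = 0;
while (m.find()) {
    // group(1) is the statement without its terminating semicolon
    System.out.println(++cnt + ": " + m.group(1));
}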
Using the sample SQL you provided, this will output:
1: CREATE OR REPLACE PROCEDURE Proc
AS
b NUMBER:=3
2:
c VARCHAR2(2000)
3:
begin
c := 'BEGIN ' || ' :1 := :1 + :2; ' || 'END;'
4:
end Proc
If you want to keep the terminating ; as well, use m.group(0) instead of m.group(1).
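If you need the statements as a list rather than printed, a minimal sketch along these lines may help. The class name StatementSplitter, the method split and the keepTerminator flag are made up for illustration; only the pattern itself comes from above. Note that a trailing statement without a terminating ; is not returned by this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatementSplitter {

    // Same pattern as above: quoted literals or non-semicolon characters,
    // followed by the terminating semicolon.
    private static final Pattern STATEMENT =
            Pattern.compile("((?:(?:'[^']*')|[^;])*);");

    // keepTerminator = true returns group(0) (with the trailing ';'),
    // false returns group(1) (without it).
    static List<String> split(String sql, boolean keepTerminator) {
        List<String> statements = new ArrayList<>();
        Matcher m = STATEMENT.matcher(sql);
        while (m.find()) {
            String stmt = keepTerminator ? m.group(0) : m.group(1);
            statements.add(stmt.trim()); // drop surrounding line breaks
        }
        return statements;
    }

    public static void main(String[] args) {
        String sql = "b NUMBER:=3;\nc := 'a;b' || 'END;';\nend Proc;";
        for (String stmt : split(sql, false)) {
            System.out.println("[" + stmt + "]");
        }
    }
}
```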
For more information on regular expressions, see the Pattern JavaDoc and this great reference. Here's a synopsis of the pattern:
(              Start capturing group
  (?:          Start non-capturing group
    (?:        Start non-capturing group
      '        Match the literal character '
      [^']     Match a single character that is not '
      *        Greedily match the previous atom zero or more times
      '        Match the literal character '
    )          End non-capturing group
    |          Match either the previous or the next atom
    [^;]       Match a single character that is not ;
  )            End non-capturing group
  *            Greedily match the previous atom zero or more times
)              End capturing group
;              Match the literal character ;
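For CASE-2 in the question, the same idea can probably be extended with one more alternative for /* ... */ comments and a second terminator character. The following is only a sketch under several assumptions, not a full SQL parser: the names SqlScriptSplitter and split are hypothetical, `--` line comments are not handled, and any / outside a string or comment is treated as a terminator (so a division operator would wrongly split a statement). The \G anchor and the possessive *+ are there so that the matcher cannot restart in the middle of a quoted string or comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SqlScriptSplitter {

    private static final Pattern STATEMENT = Pattern.compile(
            "\\G((?:'[^']*'"           // a quoted string literal
            + "|/\\*[\\s\\S]*?\\*/"    // a /* ... */ block comment
            + "|[^;/'])*+)"            // any other character except ; / '
            + "[;/]");                 // the terminator itself

    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        Matcher m = STATEMENT.matcher(script);
        int end = 0;
        while (m.find()) {
            String stmt = m.group(1).trim();
            if (!stmt.isEmpty()) {
                statements.add(stmt);
            }
            end = m.end();
        }
        // Whatever follows the last terminator is kept as a final statement.
        String rest = script.substring(end).trim();
        if (!rest.isEmpty()) {
            statements.add(rest);
        }
        return statements;
    }

    public static void main(String[] args) {
        String script = "/*Query 1*/\nSELECT * FROM tab t WHERE t.col2 IN (1,/*Reaffirmed*/ 2)\n;\n"
                + "/*Query 2*/\nselect * from table\n/\n"
                + "select col from tab2 where c = 'some ; name/'\n;\n";
        int n = 0;
        for (String stmt : split(script)) {
            System.out.println("[" + (++n) + "]: " + stmt);
        }
    }
}
```

Comments stay inside the statement they belong to, which matches the expected result shown in the question.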