Regular expression troubles, escaped quotes
Problem description
Basically, I'm being passed a string and I need to tokenise it in much the same way as a *nix shell tokenises command-line options.
I have the following string:
"Hello\" World" "Hello Universe" Hi
How could I turn it into the following three-element list?
- Hello" World
- Hello Universe
- Hi
Below is my first attempt, but it has a number of problems:
- It leaves the quote characters in the tokens
- It doesn't handle the escaped quote
Code:
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public void test() {
    String str = "\"Hello\\\" World\" \"Hello Universe\" Hi";
    List<String> list = split(str);
}

public static List<String> split(String str) {
    Pattern pattern = Pattern.compile(
        "\"[^\"]*\"" +   /* double quoted token */
        "|'[^']*'" +     /* single quoted token */
        "|[A-Za-z']+"    /* everything else */
    );
    List<String> opts = new ArrayList<String>();
    // findInLine() ignores delimiters, so this useDelimiter() call has no effect here
    Scanner scanner = new Scanner(str).useDelimiter(pattern);
    String token;
    // collect each successive match of the pattern, in order
    while ((token = scanner.findInLine(pattern)) != null) {
        opts.add(token);
    }
    return opts;
}
So the incorrect output of the code above is:
- "Hello\"
- World
- " "
- Hello
- Universe
- Hi
EDIT: I'm totally open to a non-regex solution. It's just the first one that came to mind.
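The wrong tokens in the output above come from the pattern "[^"]*": [^"]* cannot step over the backslash-escaped quote, so the first match stops at "Hello\", and the leftover closing quote of that string then pairs with the opening quote of the next one to form " ". One way to keep a regex (a technique swapped in here, not taken from the answer below) is to let a quoted body be a run of either non-quote, non-backslash characters or a backslash followed by any character, and then strip the backslashes from the captured body. This is a minimal sketch: the class name RegexSplitter is hypothetical, and it assumes an unquoted token is simply any run of non-whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSplitter {
    // Alternative 1: "..." whose body may contain backslash escapes such as \"
    // Alternative 2: '...' with the same escape rule
    // Alternative 3: any other run of non-whitespace characters
    private static final Pattern TOKEN = Pattern.compile(
            "\"((?:[^\"\\\\]|\\\\.)*)\""
            + "|'((?:[^'\\\\]|\\\\.)*)'"
            + "|(\\S+)");

    // A backslash followed by one character; used to drop the backslash.
    private static final Pattern ESCAPE = Pattern.compile("\\\\(.)");

    public static List<String> split(String str) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(str);
        while (m.find()) {
            if (m.group(1) != null) {          // double-quoted token (may be empty)
                tokens.add(unescape(m.group(1)));
            } else if (m.group(2) != null) {   // single-quoted token (may be empty)
                tokens.add(unescape(m.group(2)));
            } else {                           // bare word, kept as-is
                tokens.add(m.group(3));
            }
        }
        return tokens;
    }

    // Turns \" into ", \\ into \, and so on; captured text is inserted literally.
    private static String unescape(String body) {
        return ESCAPE.matcher(body).replaceAll("$1");
    }
}
```

Because the two branches inside each quoted group start with different characters, the regex engine never has to backtrack between them, which keeps the pattern cheap on long inputs. It stays naive in other ways: a quote in the middle of a bare word is not treated specially.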
Recommended answer
If you decide to forgo regex and do the parsing yourself instead, there are a couple of options. If you are willing to accept just one quote character, either the double quote or the single quote (but not both), then you can use StreamTokenizer to solve this easily:
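import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public static List<String> tokenize(String s) throws IOException {
    List<String> opts = new ArrayList<String>();
    StreamTokenizer st = new StreamTokenizer(new StringReader(s));
    // The default syntax table parses numbers (leaving sval null) and treats '/'
    // as a comment start, so replace it with plain shell-like word rules first.
    st.resetSyntax();
    st.wordChars(33, 255);      // every non-space character belongs to a word...
    st.whitespaceChars(0, ' '); // ...and spaces/control characters separate words
    st.quoteChar('"');          // "..." is one token; \" inside it is unescaped
    while (st.nextToken() != StreamTokenizer.TT_EOF) {
        opts.add(st.sval);
    }
    return opts;
}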
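StreamTokenizer was designed for Java-like source text, so the syntax table you get from new StreamTokenizer(Reader) parses numbers, treats '/' as the start of a comment, and uses both '"' and '\'' as quote characters. The following sketch, with a hypothetical class name StreamTokenizerDefaults, labels each token the default configuration produces so these effects are visible before the tokenizer is used on shell-style input.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class StreamTokenizerDefaults {
    // Runs a StreamTokenizer with its default syntax table and labels every token.
    public static List<String> describe(String s) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(s));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD:" + st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER:" + st.nval);       // sval is null for numbers
                        break;
                    case '"':
                    case '\'':
                        out.add("QUOTED:" + st.sval);       // body of the quoted string
                        break;
                    default:
                        out.add("CHAR:" + (char) st.ttype); // "ordinary" char, sval is null
                }
            }
        } catch (IOException e) {
            // StringReader does not fail in practice, but nextToken() declares IOException.
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

For the input head -n 5 "a b" x/y z this should return [WORD:head, CHAR:-, WORD:n, NUMBER:5.0, QUOTED:a b, WORD:x]: the '-' comes back as an ordinary character, 5 comes back as a number whose sval is null, and everything from '/' to the end of the line is discarded as a comment. Code that only reads sval therefore needs to reconfigure the tokenizer (for example with resetSyntax(), wordChars(), whitespaceChars() and quoteChar()) before relying on it for command lines.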
If you must support both quote characters, here is a naive implementation that should work. One caveat: a string like '"blah \" blah"blah' will yield a single token like 'blah " blahblah'; if that isn't OK, you will need to make some changes:
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public static List<String> splitSSV(String in) throws IOException {
    List<String> out = new ArrayList<String>();
    StringReader r = new StringReader(in);
    StringBuilder b = new StringBuilder();
    int inQuote = -1;        // the active quote char, or -1 when outside quotes
    boolean escape = false;
    int c;
    // read each character
    while ((c = r.read()) != -1) {
        if (escape) { // if the previous char is escape, add the current char
            b.append((char) c);
            escape = false;
            continue;
        }
        switch (c) {
            case '\\': // deal with escape char
                escape = true;
                break;
            case '\"':
            case '\'': // deal with quote chars
                if (inQuote == -1) {        // not in a quote
                    inQuote = c;            // now we are
                } else if (inQuote == c) {
                    inQuote = -1;           // matching close quote: now we aren't
                } else {
                    b.append((char) c);     // the other quote type is literal inside quotes
                }
                break;
            case ' ':
                if (inQuote == -1) {        // outside quotes a space ends the token
                    if (b.length() > 0) {   // skip empty tokens from repeated spaces
                        out.add(b.toString());
                        b.setLength(0);
                    }
                } else {
                    b.append((char) c);     // else append space to current token
                }
                break;
            default:
                b.append((char) c);         // append all other chars to current token
        }
    }
    if (b.length() > 0) {
        out.add(b.toString()); // add final token to list
    }
    return out;
}
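The character-by-character state machine above can be pushed a little closer to shell behaviour. The sketch below is an illustrative variant, not part of the original answer, with a hypothetical class name ShellWords. It assumes POSIX-sh-like rules as the target: any whitespace separates words, a quote is closed only by the same quote character, single quotes make everything literal (including backslashes), a backslash outside single quotes escapes the next character, and an empty quoted string such as "" is kept as an empty argument. It does not attempt variable expansion, globbing, or errors for unterminated quotes.

```java
import java.util.ArrayList;
import java.util.List;

public class ShellWords {
    // Simplification: inside double quotes a backslash escapes any character,
    // whereas POSIX sh only treats it specially before $ ` " \ and newline.
    public static List<String> split(String in) {
        List<String> out = new ArrayList<>();
        StringBuilder b = new StringBuilder();
        boolean inToken = false; // true once the current word has started, even if still empty
        char quote = 0;          // 0 outside quotes, otherwise the active quote character
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (quote == '\'') {
                // Inside single quotes everything is literal until the closing quote.
                if (c == '\'') quote = 0; else b.append(c);
            } else if (c == '\\' && i + 1 < in.length()) {
                // Escape (outside quotes or inside double quotes): take the next char literally.
                b.append(in.charAt(++i));
                inToken = true;
            } else if (quote == '"') {
                if (c == '"') quote = 0; else b.append(c);
            } else if (c == '"' || c == '\'') {
                quote = c;       // an opening quote starts (or continues) a word
                inToken = true;
            } else if (Character.isWhitespace(c)) {
                if (inToken) {   // unquoted whitespace ends the current word
                    out.add(b.toString());
                    b.setLength(0);
                    inToken = false;
                }
            } else {
                b.append(c);
                inToken = true;
            }
        }
        if (inToken) out.add(b.toString()); // flush the last word
        return out;
    }
}
```

It walks the String with charAt() rather than a StringReader, so no IOException is involved. The separate inToken flag is what lets "" produce an empty token; a b.length() > 0 check on its own would silently drop it.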