Java regex to extract fields with or without quotes
Question
I am trying to extract key-value pairs from a long string. The values come in two basic forms, quoted and unquoted, for example
... a="First Field" b=SecondField ...
using the Java regular expression
\b(a|b)\s*(?:=)\s*("[^"]*"|[^ ]*)\b
However, running the following test code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        String input = "a=\"First Field\" b=SecondField";
        String regex = "\\b(a|b)\\s*(?:=)\\s*(\"[^\"]*\"|[^ ]*)\\b";
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            System.out.println(matcher.group(1) + " = " + matcher.group(2));
        }
    }
}
the output is
a = "First
b = SecondField
instead of the desired (without quotes)
a = First Field
b = SecondField
For a more general input, like
a ="First Field" b=SecondField c3= "Third field value" delta = "" e_value = five!
the output should be (again without quotes, and allowing varying amounts of whitespace before and after the = sign)
a = First Field
b = SecondField
c3 = Third field value
delta =
e_value = five!
Is there a regular expression that covers the above use case (at least the version with the two keys), or should one resort to string processing?
An even trickier question: if such a regex exists, is there a way to keep the index of the matcher group that holds the value constant, so that quoted and unquoted field values end up in the same group?
Recommended answer
Get the matched groups from indexes 1 (the key) and 2 (the value):
(\w+)=(?:")?(.*?(?="?\s+\w+=|(?:"?)$))
Sample code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Demo {
    public static void main(String[] args) {
        String str = "a=\"First Field\" b=SecondField c=\"ThirdField\" d=\"FourthField\"";
        Pattern p = Pattern.compile("(\\w+)=(?:\")?(.*?(?=\"?\\s+\\w+=|(?:\"?)$))");
        Matcher m = p.matcher(str);
        while (m.find()) {
            System.out.println("key : " + m.group(1) + "\tValue : " + m.group(2));
        }
    }
}
Output:
key : a Value : First Field
key : b Value : SecondField
key : c Value : ThirdField
key : d Value : FourthField
If you are looking for just the a and b keys, make a slight change to the regex pattern.
Replace the first \w+ with a|b:
(a|b)=(?:")?(.*?(?="?\s+\w+=|(?:"?)$))
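As a minimal sketch of how the restricted pattern might be used, the class below looks up the value of a single key. The class name TwoKeyDemo and the helper valueOf are hypothetical names chosen for illustration. The leading \b is also an assumption that is not part of the answer: without it, the a in a longer key such as delta would be matched as the key a.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoKeyDemo {
    // The answer's a|b pattern, with a leading \b added (an assumption, not in
    // the answer) so the final "a" of a longer key like "delta" is not matched.
    private static final Pattern AB = Pattern.compile(
            "\\b(a|b)=(?:\")?(.*?(?=\"?\\s+\\w+=|(?:\"?)$))");

    // Hypothetical helper: returns the value of the first pair with the given
    // key, or null if the key does not occur in the input.
    static String valueOf(String input, String key) {
        Matcher m = AB.matcher(input);
        while (m.find()) {
            if (m.group(1).equals(key)) {
                return m.group(2); // the value is always in group 2
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String input = "a=\"First Field\" b=SecondField";
        System.out.println("a = " + valueOf(input, "a"));
        System.out.println("b = " + valueOf(input, "b"));
    }
}
```

Because the optional opening quote sits in a non-capturing group and the closing quote is only checked by the lookahead, quoted and unquoted values both land in group 2.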
Edit (following the update to the question)
Simply add \s* around each = sign to allow for whitespace:
(\w+)\s*=\s*(?:")?(.*?(?="?\s+\w+\s*=|(?:"?)$))
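To check the whitespace-tolerant pattern against the generalized input from the question, a small test harness along the following lines could be used. This is a sketch, not part of the original answer: the class name KeyValueExtractor and the helper extract are hypothetical, and collecting the pairs into a LinkedHashMap is just one convenient way to keep them in order of appearance.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueExtractor {
    // Whitespace-tolerant pattern from the answer:
    // group 1 is the key, group 2 is the value without surrounding quotes.
    private static final Pattern PAIR = Pattern.compile(
            "(\\w+)\\s*=\\s*(?:\")?(.*?(?=\"?\\s+\\w+\\s*=|(?:\"?)$))");

    // Hypothetical helper: collects all pairs in order of appearance.
    static Map<String, String> extract(String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "a =\"First Field\" b=SecondField c3= \"Third field value\""
                + " delta = \"\" e_value = five!";
        extract(input).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

One limitation worth keeping in mind: the value ends at the first place where whitespace followed by word characters and an = sign appears, so a quoted value that itself contains text such as x y=z is cut short after x. For input like that, a small hand-written parser is likely the safer choice.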