Java regex to extract fields with or without quotes

Question

I am trying to extract key-value pairs from a long string. The values come in two basic forms, quoted and unquoted, for example:

... a="First Field" b=SecondField ...

I am using the Java regex

\b(a|b)\s*(?:=)\s*("[^"]*"|[^ ]*)\b

However, running the following test code

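import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
  public static void main(String[] args) {
    String input = "a=\"First Field\" b=SecondField";
    String regex = "\\b(a|b)\\s*(?:=)\\s*(\"[^\"]*\"|[^ ]*)\\b";
    Matcher matcher = Pattern.compile(regex).matcher(input);
    // Print each key (group 1) with its captured value (group 2)
    while (matcher.find()) {
      System.out.println(matcher.group(1) + " = " + matcher.group(2));
    }
  }
}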

produces the output

a = "First
b = SecondField

instead of the desired output (without the quotes):

a = First Field
b = SecondField

For a more general input, such as

a ="First Field" b=SecondField c3= "Third field value" delta = "" e_value  = five!

the output should be (again without quotes, and tolerating varying amounts of whitespace before and after the = sign):

a = First Field
b = SecondField
c3 = Third field value
delta = 
e_value = five!

Is there a regular expression that covers the use case above (at least the version with the two keys), or should one resort to string processing?

An even trickier question: if such a regex exists, is there a way to keep the index of the matcher group that holds the value constant, so that quoted and unquoted field values both end up in the same group?

Recommended answer

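The original pattern fails on quoted values because the trailing \b cannot match between a closing quote and a space, so the engine falls back to the [^ ]* alternative and stops at the first space. With the pattern below, the key is always in group 1 and the value in group 2, whether or not the value is quoted: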

(\w+)=(?:")?(.*?(?="?\s+\w+=|(?:"?)$))

Sample code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "a=\"First Field\" b=SecondField c=\"ThirdField\" d=\"FourthField\"";
Pattern p = Pattern.compile("(\\w+)=(?:\")?(.*?(?=\"?\\s+\\w+=|(?:\"?)$))");
Matcher m = p.matcher(str);
// group(1) is the key, group(2) is the value without surrounding quotes
while (m.find()) {
    System.out.println("key : " + m.group(1) + "\tValue : " + m.group(2));
}

Output:

key : a Value : First Field
key : b Value : SecondField
key : c Value : ThirdField
key : d Value : FourthField


If you only need the a and b keys, make a slight change to the regex pattern.

Use a|b in the first group:

(a|b)=(?:")?(.*?(?="?\s+\w+=|(?:"?)$))
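
As a minimal sketch of how the restricted pattern behaves, the following wraps it in a small helper. The class name TwoKeyExtractor and the method extract are hypothetical names chosen for this illustration and are not part of the original answer; only the pattern itself comes from the text above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoKeyExtractor {
    // Pattern from the answer, with the key group restricted to a or b.
    private static final Pattern AB_PATTERN =
            Pattern.compile("(a|b)=(?:\")?(.*?(?=\"?\\s+\\w+=|(?:\"?)$))");

    // Hypothetical helper: collects key -> value pairs in order of appearance.
    static Map<String, String> extract(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = AB_PATTERN.matcher(input);
        while (m.find()) {
            // group(1) is the key, group(2) the value without surrounding quotes
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("a=\"First Field\" b=SecondField"));
    }
}
```

For the two-key input from the question this should print {a=First Field, b=SecondField}. Note that the pattern has no leading \b, so a longer key that merely ends in a or b (for example xa=) would also be picked up as a; prefixing the pattern with \\b is one possible way to avoid that.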

EDIT (following the update to the question)

Just add \s* around the = sign to allow for optional whitespace.

(\w+)\s*=\s*(?:")?(.*?(?="?\s+\w+\s*=|(?:"?)$))
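
To check the whitespace-tolerant pattern against the generalized input from the question, a short sketch follows. The class name KeyValueExtractor and the method extract are hypothetical names used only for this illustration; the pattern and the input string are taken from the text above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueExtractor {
    // Whitespace-tolerant pattern from the answer.
    private static final Pattern KV_PATTERN = Pattern.compile(
            "(\\w+)\\s*=\\s*(?:\")?(.*?(?=\"?\\s+\\w+\\s*=|(?:\"?)$))");

    // Hypothetical helper: returns one "key = value" line per match.
    static List<String> extract(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher m = KV_PATTERN.matcher(input);
        while (m.find()) {
            // The value is in group 2 for both quoted and unquoted fields.
            pairs.add(m.group(1) + " = " + m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input = "a =\"First Field\" b=SecondField c3= \"Third field value\" "
                + "delta = \"\" e_value  = five!";
        extract(input).forEach(System.out::println);
    }
}
```

On this input the sketch should print the five lines listed in the question, including the empty value for delta. One limitation worth keeping in mind: the value ends at the first position followed by whitespace, a word and an = sign, so a quoted value that itself contains text such as q=r after a space would be cut short. If such values can occur, a pattern with separate alternatives for quoted and unquoted values may be a better fit.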
