Parsing CSV input with a RegEx in Java


Question

I know, now I have two problems. But I'm having fun!

I started with this advice not to try and split, but instead to match on what is an acceptable field, and expanded from there to this expression.

final Pattern pattern = Pattern.compile("\"([^\"]*)\"|(?<=,|^)([^,]*)(?=,|$)");

The expression looks like this without the annoying escaped quotes:

"([^"]*)"|(?<=,|^)([^,]*)(?=,|$)

This is working well for me - either it matches on "two quotes and whatever is between them", or "something between the start of the line or a comma and the end of the line or a comma". Iterating through the matches gets me all the fields, even if they are empty. For instance,

the quick, "brown, fox jumps", over, "the",,"lazy dog"

breaks down into

the quick
"brown, fox jumps"
over
"the"

"lazy dog"

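The iteration described above can be sketched as follows. This is a minimal, hedged example rather than the asker's actual code: the class name `CsvFieldMatcher` and the helper `fields` are made up for illustration, and the input is written with no whitespace around the separating commas, since a space after a comma would stop the quoted branch from starting there and let the unquoted branch match instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvFieldMatcher {
    // The pattern from the question: a quoted field, or an unquoted field
    // bounded by a comma or the start/end of the line.
    static final Pattern FIELD =
            Pattern.compile("\"([^\"]*)\"|(?<=,|^)([^,]*)(?=,|$)");

    // Collects every match of FIELD in the line; quotes are still included.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed input: the example line without spaces after the separators.
        String line = "the quick,\"brown, fox jumps\",over,\"the\",,\"lazy dog\"";
        for (String field : fields(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

With that input, `fields` should return six entries, the fifth being the empty string.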

Great! Now I want to drop the quotes, so I added the lookahead and lookbehind non-capturing groups like I was doing for the commas.

final Pattern pattern = Pattern.compile("(?<=\")([^\"]*)(?=\")|(?<=,|^)([^,]*)(?=,|$)");



Again, the expression is:

(?<=")([^"]*)(?=")|(?<=,|^)([^,]*)(?=,|$)

Instead of the desired result:

the quick
brown, fox jumps
over
the

lazy dog

I now get this breakdown:

the quick
"brown
 fox jumps"
,over,
"the"
,,
"lazy dog"

What am I missing?

Recommended answer

Operator precedence. Basically there is none. It's all left to right. So the or (|) is applying to the closing quote lookahead and the comma lookahead

Try:

(?:(?<=")([^"]*)(?="))|(?<=,|^)([^,]*)(?=,|$)
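
Lookarounds are zero-width, so (?<=") and (?=") never consume the quotes. The unquoted branch can therefore still begin at an opening quote, and wrapping the first branch in a non-capturing group does not change which text each branch matches. One possible alternative, sketched below under the assumption that fields never contain escaped quotes and that no whitespace surrounds the separating commas, is to keep the original pattern from the question and read the capturing groups instead of the whole match. The names `CsvUnquote` and `fields` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvUnquote {
    // Original pattern from the question: the quotes are consumed by the
    // first branch, and group 1 captures only the text between them.
    static final Pattern FIELD =
            Pattern.compile("\"([^\"]*)\"|(?<=,|^)([^,]*)(?=,|$)");

    // Returns the fields with surrounding quotes dropped.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            // group(1) is non-null only when the quoted branch matched;
            // otherwise the unquoted branch matched and group(2) holds the field.
            String quoted = m.group(1);
            out.add(quoted != null ? quoted : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed input: the example line without spaces after the separators.
        String line = "the quick,\"brown, fox jumps\",over,\"the\",,\"lazy dog\"";
        for (String field : fields(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

The design choice here is to let the match itself consume the quotes and rely on the capture group to exclude them, rather than trying to exclude them from the match with lookbehind and lookahead.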
