Extract data from specific-format CSV files


Question

Given the following data format for one row of a CSV file:

'data(g1),data(g1)','data(g2),data(g2),data(g2),,,',,,'data(g5),,,data(g5)',

The row is in CSV format, but each separate group of data is wrapped in single quotes ('') to keep its fields together, for example:

.....'data(g2),data(g2),data(g2),,,'....

There are some awkward cases, though: a row can be missing some groups, and a group can be missing some fields. Every missing part is still delimited by a comma, so each row always has 6 groups of data. How can I get all 6 groups properly, even when a group contains nothing?

I tried splitting on a regular expression like this:

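String row = "'data(g1),data(g1)','data(g2),data(g2),data(g2),,,',,,'data(g5),,,data(g5)',"; // the data row above
String[] dataGroups = row.split(","); // splits on every comma, including the ones inside a quoted group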

But this definitely will not work, because commas also separate the fields inside each group. Is there an effective way to do this in Java? It would be great to have all 6 groups of data stored in a

String[] dataGroups

of length 6. The rest would then be easy.

Recommended answer

Hm. What about a regex like this?

('.*')?,('.*')?,('.*')?,('.*')?,('.*')?,('.*')?

It's ugly, but it may work correctly.

Following http://www.regular-expressions.info/java.html, you can do this:

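import java.util.regex.Matcher;
import java.util.regex.Pattern;

Matcher m = Pattern.compile("('.*')?,('.*')?,('.*')?,('.*')?,('.*')?,('.*')?").matcher(row);
if (m.find()) {
    String first = m.group(1); // first group on the line, quotes included; null if that group is missing
    // change the index (1 to 6) to get the other groups
}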
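As a hedged variation on the snippet above, the greedy .* can be swapped for [^']* so that a group can never run past its own closing quote, and the six captures can be copied into the String[] of length 6 the question asks for. This is only a sketch: the class name QuotedGroupExtractor and the method extractGroups are made up for illustration, and it assumes that no group ever contains a single quote and that a row may or may not end with an extra trailing comma.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedGroupExtractor {
    // One optional quoted group; [^']* stops at the first closing quote.
    private static final String GROUP = "('[^']*')?";

    // Six optional groups separated by five commas.
    // Assumption: one extra trailing comma is tolerated.
    private static final Pattern ROW = Pattern.compile(
            "^" + GROUP + "," + GROUP + "," + GROUP + ","
                + GROUP + "," + GROUP + "," + GROUP + ",?$");

    /** Returns six entries with the quotes stripped; a missing group becomes "". */
    static String[] extractGroups(String row) {
        Matcher m = ROW.matcher(row);
        if (!m.matches()) {
            throw new IllegalArgumentException("Row does not contain 6 groups: " + row);
        }
        String[] dataGroups = new String[6];
        for (int i = 0; i < 6; i++) {
            String g = m.group(i + 1); // null when the optional group is absent
            dataGroups[i] = (g == null) ? "" : g.substring(1, g.length() - 1);
        }
        return dataGroups;
    }

    public static void main(String[] args) {
        String row = "'data(g1),data(g1)','data(g2),data(g2),data(g2),,,',,,'data(g5),,,data(g5)',";
        for (String group : extractGroups(row)) {
            System.out.println("[" + group + "]");
        }
    }
}
```

For the sample row this yields the g1, g2 and g5 groups in positions 0, 1 and 4 and empty strings elsewhere. Using matches() with anchors instead of find() makes a malformed row fail loudly rather than match partially.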

.split(',')
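Once a group has been isolated and its quotes removed, a plain split on the comma is enough to reach the individual fields. The sketch below is illustrative only (GroupFields and splitFields are hypothetical names); the one detail worth noting is the negative limit, because String.split with no limit silently drops trailing empty strings, which would lose the missing fields at the end of a group.

```java
import java.util.Arrays;

final class GroupFields {
    /** Splits one unquoted group into its fields, keeping empty ones. */
    static String[] splitFields(String group) {
        // A negative limit stops String.split from discarding trailing empty strings.
        return group.split(",", -1);
    }

    public static void main(String[] args) {
        String group = "data(g2),data(g2),data(g2),,,";
        System.out.println(Arrays.toString(splitFields(group)));
        System.out.println(splitFields(group).length); // 6 fields: 3 filled, 3 empty
    }
}
```

Note that an entirely empty group still produces an array holding a single empty string, so callers may want to treat that case separately.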

Here is a live version: http://regex101.com/r/jR0iM4/1
