Using CsvBeanReader to read a CSV file with a variable number of columns


Question

So I'm working on parsing a .csv file. I took the advice of another thread somewhere on StackOverflow and downloaded SuperCSV. I finally got pretty much everything working, but now I've run into a bug that seems difficult to fix.

The problem occurs because the last two columns of data may or may not be populated. Here is an example of a .csv file with the first row missing the last column, and the second row entirely complete:

2012:07:25,11:48:20,922,"uLog.exe","",Key pressed,1246,341,-1.00,-1.00,1.00,Shift
2012:07:25,11:48:21,094,"uLog.exe","",Key pressed,1246,341,-1.00,-1.00,1.00,b,Shift

From my understanding of the Super CSV Javadoc, there is no way to populate a Java Bean with CsvBeanReader if the number of columns varies. This seems really dumb, because I feel like the missing columns should be allowed to be null (or some other default value) when the Bean is initialized.

For reference, here is my complete code for the parser:

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

import org.supercsv.cellprocessor.ParseDate;
import org.supercsv.cellprocessor.ParseDouble;
import org.supercsv.cellprocessor.ParseInt;
import org.supercsv.cellprocessor.constraint.StrMinMax;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

public class ULogParser {

    String uLogFileLocation;
    String screenRecorderFileLocation;

    private static final CellProcessor[] cellProcessor = new CellProcessor[] {
        new ParseDate("yyyy:MM:dd"),
        new ParseDate("HH:mm:ss"),
        new ParseDate("SSS"),
        new StrMinMax(0, 100),
        new StrMinMax(0, 100),
        new StrMinMax(0, 100),
        new ParseInt(),
        new ParseInt(),
        new ParseDouble(),
        new ParseDouble(),
        new ParseDouble(),
        new StrMinMax(0, 100),
        new StrMinMax(0, 100),
    };

    public String[] header = {"Date", "Time", "Msec", "Application", "Window", "Message", "X", "Y", "RelDist", "TotalDist", "Rate", "Extra1", "Extra2"};

    public ULogParser(String uLogFileLocation, String screenRecorderFileLocation)
    {
        this.uLogFileLocation = uLogFileLocation;
        this.screenRecorderFileLocation = screenRecorderFileLocation;
    }

    public void parse()
    {
        try {
            ICsvBeanReader reader = new CsvBeanReader(new BufferedReader(new FileReader(uLogFileLocation)), CsvPreference.STANDARD_PREFERENCE);
            reader.getCSVHeader(false); // parse past the header
            Entry entry = reader.read(Entry.class, header, cellProcessor);
            System.out.println(entry.getApplication()); // Entry's fields are private, so use the getter
        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    public void sendToDB()
    {
        Query query = new Query(); // Query is a project-specific class (not shown)
    }
}

And the code for the Entry class:

public class Entry
{
    private Date Date;
    private Date Time;
    private Date Msec;
    private String Application;
    private String Window;
    private String Message;
    private int X;
    private int Y;
    private double RelDist;
    private double TotalDist;
    private double Rate;
    private String Extra1;
    private String Extra2;

    public Date getDate() { return Date; }
    public Date getTime() { return Time; }
    public Date getMsec() { return Msec; }
    public String getApplication() { return Application; }
    public String getWindow() { return Window; }
    public String getMessage() { return Message; }
    public int getX() { return X; }
    public int getY() { return Y; }
    public double getRelDist() { return RelDist; }
    public double getTotalDist() { return TotalDist; }
    public double getRate() { return Rate; }
    public String getExtra1() { return Extra1; }
    public String getExtra2() { return Extra2; }

    public void setDate(Date Date) { this.Date = Date; }
    public void setTime(Date Time) { this.Time = Time; }
    public void setMsec(Date Msec) { this.Msec = Msec; }
    public void setApplication(String Application) { this.Application = Application; }
    public void setWindow(String Window) { this.Window = Window; }
    public void setMessage(String Message) { this.Message = Message; }
    public void setX(int X) { this.X = X; }
    public void setY(int Y) { this.Y = Y; }
    public void setRelDist(double RelDist) { this.RelDist = RelDist; }
    public void setTotalDist(double TotalDist) { this.TotalDist = TotalDist; }
    public void setRate(double Rate) { this.Rate = Rate; }
    public void setExtra1(String Extra1) { this.Extra1 = Extra1; }
    public void setExtra2(String Extra2) { this.Extra2 = Extra2; }

    public Entry() {}
}

And the exception I'm receiving (note this is a different line than my above example, missing both of the last two columns):

Exception in thread "main" The value array (size 12)  must match the processors array (size 13): You are probably reading a CSV line with a different number of columns than the number of cellprocessors specified context: Line: 2 Column: 0 Raw line:
[2012:07:25, 11:48:05, 740, uLog.exe,  , Logging started, -1, -1, -1.00, -1.00, -1.00, ]
 offending processor: null
    at org.supercsv.util.Util.processStringList(Unknown Source)
    at org.supercsv.io.CsvBeanReader.read(Unknown Source)
    at processing.ULogParser.parse(ULogParser.java:59)
    at ui.ParseImplicitData.main(ParseImplicitData.java:15)

Yes, writing all those getters and setters was a pain in the ass. Also, I apologize: I probably haven't used SuperCSV in the most conventional way (for example, which CellProcessor to use if you just want the unmodified String), but you get the idea. This code is obviously not complete either; for now, I'm just trying to successfully retrieve a single line of data.

At this point, I'm wondering whether CsvBeanReader can be used for my purposes at all. If not, I'm a little disappointed, since CsvListReader (I would post a hyperlink, but StackOverflow isn't allowing me to, also dumb) is about as easy as not using the API at all and just using Scanner.next().

Any help would be appreciated. Thanks in advance!

Solution

Edit: Update for Super CSV 2.0.0-beta-1

Please note the API has changed in Super CSV 2.0.0-beta-1 (the code example is based on 1.52). The getCSVHeader() method on all readers is now getHeader() (to be in line with writeHeader on the writers).

Also, SuperCSVException has been renamed to SuperCsvException.


Edit: Update for Super CSV 2.1.0

Since version 2.1.0 it's possible to execute the cell processors after reading a line of CSV by using the new executeProcessors() method. For more information see this example on the project website. Please note this is only relevant for CsvListReader, as it's the only reader that allows for variable column length.


You're correct - CsvBeanReader doesn't support CSV files with a variable number of columns. According to most CSV specifications (including RFC 4180), the number of columns must be the same on every row.
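As a quick illustration of that rule, here is a stdlib-only sketch (the class and method names are hypothetical, and it splits naively on commas, so quoted fields containing commas are not handled) that reports rows whose field count differs from the header's. Note the `-1` limit passed to `String.split`: without it, trailing empty fields are dropped, so a row ending in a comma would be miscounted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports rows that break the "same number of columns" rule.
// Naive: splits on every comma, so quoted fields containing commas are NOT handled.
class ColumnCountCheck {

    // Counts fields in one line; the -1 limit keeps trailing empty fields,
    // so "a,b," counts as 3 columns rather than 2.
    static int countFields(String line) {
        return line.split(",", -1).length;
    }

    // Returns the 1-based line numbers whose field count differs from line 1 (the header).
    static List<Integer> irregularLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        if (lines.isEmpty()) {
            return bad;
        }
        int expected = countFields(lines.get(0));
        for (int i = 1; i < lines.size(); i++) {
            if (countFields(lines.get(i)) != expected) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "name,birthDate,city",
                "John,New York",           // only 2 columns: irregular
                "Sally,22/03/1974,London",
                "Jim,,Sydney");            // empty birthDate, but still 3 columns
        System.out.println(irregularLines(lines)); // prints [2]
    }
}
```

An explicitly empty field (`Jim,,Sydney`) is perfectly valid; it is only a row with fewer separators that breaks the rule.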

For this reason (as a Super CSV developer) I'm reluctant to add this functionality to Super CSV. If you can think of an elegant way to add it then feel free to make suggestions on the project's SourceForge site. It would probably mean a new reader that extends upon CsvBeanReader: it would have to split the reading and mapping/processing into two separate methods (you can't do any processing or mapping to fields of the bean unless you know how many columns there are).
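The "split reading from mapping" idea can be sketched without Super CSV. This is a hypothetical design (the names are made up and are not Super CSV API): phase one splits a line into raw fields, and phase two maps them onto column names, leaving any absent trailing columns as null, which is the behaviour the question was hoping for. It assumes missing columns are only ever at the end, and it splits naively on commas (no quoted fields).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-phase reader: phase 1 reads raw fields, phase 2 maps them to names.
// Only after phase 1 do we know how many columns the row actually has.
class TwoPhaseRow {

    // Phase 1: split a line into raw string fields (naive: no quote handling).
    static List<String> readRaw(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    // Phase 2: map raw fields onto column names; ASSUMES only trailing columns can be
    // missing, and fills them with null.
    static Map<String, String> map(List<String> raw, String[] names) {
        if (raw.size() > names.length) {
            throw new IllegalArgumentException(
                    "row has " + raw.size() + " columns, expected at most " + names.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            row.put(names[i], i < raw.size() ? raw.get(i) : null);
        }
        return row;
    }

    public static void main(String[] args) {
        String[] names = {"Application", "Message", "Extra1", "Extra2"};
        Map<String, String> row = map(readRaw("uLog.exe,Logging started"), names);
        System.out.println(row);
        // prints {Application=uLog.exe, Message=Logging started, Extra1=null, Extra2=null}
    }
}
```

A real implementation would also run the cell processors in phase two, picking them to match however many columns phase one found.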

Simple solution

The simple solution (if you control the CSV file you're working with) is to add a blank column when writing the file (the first line in your example would then end with a comma, indicating that the last column is empty). That way your CSV file will be valid (every row will have the same number of columns), and you can use CsvBeanReader as you're already doing.
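If you can't change the program that writes the file but can preprocess it before handing it to CsvBeanReader, the same fix amounts to appending empty columns until each row has the expected count. A minimal sketch, assuming plain comma-separated lines with no quoted commas and that only trailing columns are ever missing; `padLine` is a hypothetical helper name:

```java
// Hypothetical preprocessing step: pads a CSV line with trailing empty columns.
// ASSUMES no quoted field contains a comma, and that only trailing columns can be missing.
class PadColumns {

    static String padLine(String line, int expectedColumns) {
        int actual = line.split(",", -1).length; // -1 keeps trailing empty fields
        if (actual > expectedColumns) {
            throw new IllegalArgumentException(
                    "line has " + actual + " columns, expected " + expectedColumns);
        }
        StringBuilder sb = new StringBuilder(line);
        for (int i = actual; i < expectedColumns; i++) {
            sb.append(','); // each extra comma adds one empty column
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(padLine("uLog.exe,Logging started", 4)); // prints uLog.exe,Logging started,,
        System.out.println(padLine("uLog.exe,Key pressed,b,Shift", 4)); // already complete, unchanged
    }
}
```

Padding only helps when the missing columns are at the end; if a column in the middle can be absent, no amount of trailing commas will line the values up correctly.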

If that's not possible, then all is not lost!

Fancy solution

As you probably realize, CsvBeanReader uses the name mapping to associate each column in the CSV file with a field in your bean, and the CellProcessor array to process each column. In other words, you have to know how many columns there are (and what they represent) if you want to use it.
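To see why the column count matters, here is a stdlib-only analogue of positional cell processing. The names are hypothetical, `Function<String, Object>` stands in for `CellProcessor`, and `null` means "leave the string as-is", as in the example further down. It is not Super CSV's implementation, but it fails the same way the question's 12-value row does against a 13-entry processor array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Analogue of positional cell processing: processor i is applied to column i,
// so the value list and processor list must have the same length.
class PositionalProcessing {

    static List<Object> process(List<String> values, List<Function<String, Object>> processors) {
        if (values.size() != processors.size()) {
            throw new IllegalStateException("The value array (size " + values.size()
                    + ") must match the processors array (size " + processors.size() + ")");
        }
        List<Object> result = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            Function<String, Object> p = processors.get(i);
            // a null processor passes the raw string through unchanged
            result.add(p == null ? values.get(i) : p.apply(values.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Function<String, Object>> processors = new ArrayList<>();
        processors.add(null);              // first column: raw string
        processors.add(Integer::parseInt); // second column: parsed to an Integer
        System.out.println(process(List.of("Sally", "42"), processors)); // prints [Sally, 42]
        try {
            process(List.of("Jim"), processors); // one value, two processors
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```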

CsvListReader, on the other hand, is very primitive and can read rows of varying length (because it doesn't need to process or map them).

So you can combine all the features of CsvBeanReader with CsvListReader (as done in the following example) by reading the file with both readers in parallel: using CsvListReader to figure out how many columns there are, and CsvBeanReader to do the processing/mapping.

Note that this makes the assumption that it's only ever the birthDate column that may not be present (i.e. it wouldn't work if you can't tell which column is missing).

package example;

import java.io.StringReader;
import java.util.Date;

import org.supercsv.cellprocessor.ParseDate;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCSVException;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.CsvListReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.io.ICsvListReader;
import org.supercsv.prefs.CsvPreference;

public class VariableColumns {

    private static final String INPUT = "name,birthDate,city\n"
        + "John,New York\n" 
        + "Sally,22/03/1974,London\n" 
        + "Jim,Sydney";

    // cell processors
    private static final CellProcessor[] NORMAL_PROCESSORS = 
    new CellProcessor[] {null, new ParseDate("dd/MM/yyyy"), null };
    private static final CellProcessor[] NO_BIRTHDATE_PROCESSORS = 
    new CellProcessor[] {null, null };

    // name mappings
    private static final String[] NORMAL_HEADER = 
    new String[] { "name", "birthDate", "city" };
    private static final String[] NO_BIRTHDATE_HEADER = 
    new String[] { "name", "city" };

    public static void main(String[] args) {

        // using bean reader and list reader together (to read the same file)
        final ICsvBeanReader beanReader = new CsvBeanReader(new StringReader(
                INPUT), CsvPreference.STANDARD_PREFERENCE);
        final ICsvListReader listReader = new CsvListReader(new StringReader(
                INPUT), CsvPreference.STANDARD_PREFERENCE);

        try {
            // skip over header
            beanReader.getCSVHeader(true);
            listReader.getCSVHeader(true);

            while (listReader.read() != null) {

                final String[] nameMapping;
                final CellProcessor[] processors;

                if (listReader.length() == NORMAL_HEADER.length) {
                    // all columns present - use normal header/processors
                    nameMapping = NORMAL_HEADER;
                    processors = NORMAL_PROCESSORS;

                } else if (listReader.length() == NO_BIRTHDATE_HEADER.length) {
                    // one less column - birth date must be missing
                    nameMapping = NO_BIRTHDATE_HEADER;
                    processors = NO_BIRTHDATE_PROCESSORS;

                } else {
                    throw new SuperCSVException(
                            "unexpected number of columns: "
                                    + listReader.length());
                }

                // can now use CsvBeanReader safely 
                // (we know how many columns there are)
                Person person = beanReader.read(Person.class, nameMapping,
                        processors);

                System.out.println(String.format(
                        "Person: name=%s, birthDate=%s, city=%s",
                        person.getName(), person.getBirthDate(),
                        person.getCity()));

            }
        } catch (Exception e) {
            // handle exceptions here
            e.printStackTrace();
        } finally {
            // close readers here
        }
    }

    public static class Person {

        private String name;
        private Date birthDate;
        private String city;

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public Date getBirthDate() {
            return birthDate;
        }

        public void setBirthDate(Date birthDate) {
            this.birthDate = birthDate;
        }

        public String getCity() {
            return city;
        }

        public void setCity(String city) {
            this.city = city;
        }
    }

}
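Applied to the question's data, the same approach would select the name mapping (and, in real code, a CellProcessor array of the same length) from the number of columns in the row. Below is a stdlib-only sketch with hypothetical names. It assumes that missing values are always the trailing Extra2 and then Extra1. The question's first sample row ends in `Shift` where the complete row ends in `b,Shift`, which hints that the lone value might instead belong in Extra2; in that case the 12-column mapping would need to drop Extra1 rather than Extra2.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch: pick a header (name mapping) based on how many columns a row has,
// in the spirit of the NORMAL_HEADER / NO_BIRTHDATE_HEADER choice above.
// ASSUMPTION: missing columns are always the trailing Extra2, then Extra1.
class HeaderByLength {

    static final String[] FULL_HEADER = {"Date", "Time", "Msec", "Application", "Window",
            "Message", "X", "Y", "RelDist", "TotalDist", "Rate", "Extra1", "Extra2"};

    // Column count -> header of that length (11, 12 and 13 columns).
    static final Map<Integer, String[]> HEADERS = new HashMap<>();
    static {
        for (int n = FULL_HEADER.length - 2; n <= FULL_HEADER.length; n++) {
            HEADERS.put(n, Arrays.copyOf(FULL_HEADER, n));
        }
    }

    static String[] headerFor(int columnCount) {
        String[] header = HEADERS.get(columnCount);
        if (header == null) {
            throw new IllegalArgumentException("unexpected number of columns: " + columnCount);
        }
        return header;
    }

    public static void main(String[] args) {
        // a line missing both optional columns (naive comma count, no quoted fields)
        String line = "2012:07:25,11:48:05,740,uLog.exe,,Logging started,-1,-1,-1.00,-1.00,-1.00";
        String[] header = headerFor(line.split(",", -1).length);
        System.out.println(header.length + " columns, last = " + header[header.length - 1]);
        // prints: 11 columns, last = Rate
    }
}
```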

I hope this helps.

Oh, and is there any reason why the fields in your Entry class don't follow normal naming conventions (camelCase)? If you update your header array to use camelCase, then your fields can be camelCase as well.
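On that naming point, this sketch assumes that each name-mapping entry is resolved to a setter by the usual JavaBeans convention (`set` plus the name with its first letter upper-cased), so what matters is the setter, not the private field's name. It is a simplified, hypothetical illustration using plain reflection, not Super CSV's code, showing a camelCase header entry `relDist` reaching `setRelDist`.

```java
import java.lang.reflect.Method;

// Simplified illustration (NOT Super CSV's source) of resolving a name-mapping
// entry to a JavaBeans setter: "set" + the name with its first letter upper-cased.
class SetterLookup {

    static String setterName(String property) {
        return "set" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }

    // Invokes the single-argument public setter for the given property on the bean.
    static void setProperty(Object bean, String property, Object value) throws Exception {
        String name = setterName(property);
        for (Method m : bean.getClass().getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == 1) {
                m.invoke(bean, value); // reflection unboxes Double -> double here
                return;
            }
        }
        throw new NoSuchMethodException(name);
    }

    // A bean using conventional camelCase field names.
    public static class Entry {
        private double relDist;
        public double getRelDist() { return relDist; }
        public void setRelDist(double relDist) { this.relDist = relDist; }
    }

    public static void main(String[] args) throws Exception {
        Entry entry = new Entry();
        setProperty(entry, "relDist", -1.0); // camelCase header entry -> setRelDist
        System.out.println(entry.getRelDist()); // prints -1.0
    }
}
```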
