How do I skip white-space only lines and lines having variable columns using supercsv

Problem description

I am working on a CSV parser requirement and I am using the Super CSV parser library. My CSV file can have 25 columns (separated by tab(|)) and up to 100k rows, plus a header row.

I would like to ignore white-space only lines.

I am using ICsvBeanReader with name mappings (to set CSV values on a POJO) and field processors (to handle validation) to read the file.

I am assuming that Super CSV's ICsvBeanReader will skip white-space only lines by default. But how do I handle a row that contains fewer than 25 columns?

Recommended answer

You can easily do this by writing your own Tokenizer.

For example, the following Tokenizer will have the same behaviour as the default one, but will skip over any lines that don't have the correct number of columns.

import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import org.supercsv.io.Tokenizer;
import org.supercsv.prefs.CsvPreference;

public class SkipBadColumnCountTokenizer extends Tokenizer {

    private final int expectedColumns;

    // Line numbers of the rows that were skipped
    private final List<Integer> ignoredLines = new ArrayList<>();

    public SkipBadColumnCountTokenizer(Reader reader, 
            CsvPreference preferences, int expectedColumns) {
        super(reader, preferences);
        this.expectedColumns = expectedColumns;
    }

    @Override
    public boolean readColumns(List<String> columns) throws IOException {
        boolean moreInputExists;
        // Keep reading until a row has the expected column count or the input ends
        while ((moreInputExists = super.readColumns(columns)) && 
            columns.size() != this.expectedColumns){
            System.out.println(String.format("Ignoring line %s with %d columns: %s", getLineNumber(), columns.size(), getUntokenizedRow()));
            ignoredLines.add(getLineNumber());
        }

        return moreInputExists;
    }

    public List<Integer> getIgnoredLines(){
        return this.ignoredLines;
    }
}
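
The skip rule itself can also be illustrated without the library. The sketch below is not Super CSV code: RowFilterSketch, Result and filter are hypothetical names chosen for illustration, and it assumes that fields never contain quotes, embedded delimiters or line breaks, so a plain split is enough. It applies the same column-count check and additionally treats white-space only lines as skippable, which is what the question asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Library-free sketch of the skip rule; every name here is hypothetical. */
final class RowFilterSketch {

    /** Rows that were kept, plus the 1-based numbers of the lines that were skipped. */
    static final class Result {
        final List<List<String>> rows = new ArrayList<>();
        final List<Integer> ignoredLines = new ArrayList<>();
    }

    /**
     * Keeps only lines that have exactly expectedColumns fields.
     * Assumption: no quoting or embedded delimiters, so String.split suffices.
     * Real CSV input needs a proper tokenizer instead.
     */
    static Result filter(String input, char delimiter, int expectedColumns) {
        Result result = new Result();
        // Default split drops trailing empty strings, so a final newline adds no phantom line
        String[] lines = input.split("\r?\n");
        String delimiterRegex = Pattern.quote(String.valueOf(delimiter));

        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int lineNumber = i + 1;

            // Skip empty and white-space only lines
            if (line.trim().isEmpty()) {
                result.ignoredLines.add(lineNumber);
                continue;
            }

            // Limit -1 keeps trailing empty fields, so "a|b|" counts as three columns
            String[] fields = line.split(delimiterRegex, -1);
            if (fields.length != expectedColumns) {
                result.ignoredLines.add(lineNumber);
                continue;
            }

            result.rows.add(Arrays.asList(fields));
        }
        return result;
    }
}
```

Calling RowFilterSketch.filter(text, '|', 25) would mirror the 25-column case from the question. Recording the ignored line numbers, as the Tokenizer above does, makes it possible to report the skipped rows after the whole file has been read.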

And a simple test using this Tokenizer...

@Test
public void testInvalidRows() throws IOException {

    String input = "column1,column2,column3\n" +
            "has,three,columns\n" +
            "only,two\n" +
            "one\n" +
            "three,columns,again\n" +
            "one,too,many,columns";

    CsvPreference preference = CsvPreference.EXCEL_PREFERENCE;
    int expectedColumns = 3;
    SkipBadColumnCountTokenizer tokenizer = new SkipBadColumnCountTokenizer(
        new StringReader(input), preference, expectedColumns);

    try (ICsvBeanReader beanReader = new CsvBeanReader(tokenizer, preference)) {
        String[] header = beanReader.getHeader(true);
        TestBean bean;
        while ((bean = beanReader.read(TestBean.class, header)) != null){
            System.out.println(bean);
        }
        System.out.println(String.format("Ignored lines: %s", tokenizer.getIgnoredLines()));
    }

}

Prints the following output (notice how it's skipped all of the invalid rows):

TestBean{column1='has', column2='three', column3='columns'}
Ignoring line 3 with 2 columns: only,two
Ignoring line 4 with 1 columns: one
TestBean{column1='three', column2='columns', column3='again'}
Ignoring line 6 with 4 columns: one,too,many,columns
Ignored lines: [3, 4, 6]
