Univocity - How to return one bean per row using iterator style?
Problem description
I am building a process to merge a few big sorted CSV files, and I am currently looking into using Univocity to do this. The way I set up the merge is to use beans that implement the Comparable interface.
The simplified file looks like this:
id,data
1,aa
2,bb
3,cc
The bean looks like this (getters and setters omitted):
public class Address implements Comparable<Address> {
    @Parsed
    private int id;
    @Parsed
    private String data;

    @Override
    public int compareTo(Address o) {
        return Integer.compare(this.getId(), o.getId());
    }
}
The comparator looks like this:
public class AddressComparator implements Comparator<Address> {
    @Override
    public int compare(Address a, Address b) {
        if (a == null)
            throw new IllegalArgumentException("argument object a cannot be null");
        if (b == null)
            throw new IllegalArgumentException("argument object b cannot be null");
        return Integer.compare(a.getId(), b.getId());
    }
}
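As a side note, the same ordering can be written with the JDK's `Comparator.comparingInt`. The sketch below is only an illustration: `AddressStub` and `AddressOrdering` are hypothetical names, and the stub stands in for the bean above, reduced to the one getter the comparator needs.

```java
import java.util.Comparator;

// Hypothetical stand-in for the Address bean, keeping only the id getter.
final class AddressStub {
    private final int id;

    AddressStub(int id) {
        this.id = id;
    }

    int getId() {
        return id;
    }
}

// Hypothetical holder for the comparator; orders beans by ascending id.
final class AddressOrdering {
    static final Comparator<AddressStub> BY_ID =
            Comparator.comparingInt(AddressStub::getId);
}
```

One difference from the hand-written comparator: a null argument here surfaces as a NullPointerException rather than an IllegalArgumentException.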
As I do not want to read all the data into memory, I want to read the top record of each file and execute some compare logic. Here is my simplified example:
public class App {
    private static final String INPUT_1 = "src/test/input/address1.csv";
    private static final String INPUT_2 = "src/test/input/address2.csv";
    private static final String INPUT_3 = "src/test/input/address3.csv";

    public static void main(String[] args) throws FileNotFoundException {
        BeanListProcessor<Address> rowProcessor = new BeanListProcessor<Address>(Address.class);
        CsvParserSettings parserSettings = new CsvParserSettings();
        parserSettings.setRowProcessor(rowProcessor);
        parserSettings.setHeaderExtractionEnabled(true);
        CsvParser parser = new CsvParser(parserSettings);

        List<FileReader> readers = new ArrayList<>();
        readers.add(new FileReader(new File(INPUT_1)));
        readers.add(new FileReader(new File(INPUT_2)));
        readers.add(new FileReader(new File(INPUT_3)));

        // This parses all rows, but I am only interested in getting 1 row as a bean.
        for (FileReader fileReader : readers) {
            parser.parse(fileReader);
            List<Address> beans = rowProcessor.getBeans();
            for (Address address : beans) {
                System.out.println(address.toString());
            }
        }

        // want to have a map with the reader and the first bean object
        // Map<FileReader, Address> topRecordofReader = new HashMap<>();
        Map<FileReader, String[]> topRecordofReader = new HashMap<>();
        for (FileReader reader : readers) {
            parser.beginParsing(reader);
            String[] row;
            while ((row = parser.parseNext()) != null) {
                System.out.println(row[0]);
                System.out.println(row[1]);
                topRecordofReader.put(reader, row);
                // all done, only want to get first row
                break;
            }
        }
    }
}
Question
Given the above example, how do I parse in such a way that it iterates over each row and returns one bean per row, instead of parsing the whole file?
I am looking for something like this (this non-working code is just to indicate the kind of solution I am looking for):
for (FileReader fileReader : readers) {
    parser.beginParsing(fileReader);
    Address bean = null;
    while ((bean = parser.parseNextRecord()) != null) {
        topRecordofReader.put(fileReader, bean);
    }
}
Recommended answer
There are two approaches to reading iteratively instead of loading everything into memory. The first is to use a BeanProcessor instead of a BeanListProcessor:
settings.setRowProcessor(new BeanProcessor<Address>(Address.class) {
    @Override
    public void beanProcessed(Address address, ParsingContext context) {
        // your code to process each parsed object here!
    }
});
The second approach reads beans iteratively without a callback. For this (and for some other common processes) we created a CsvRoutines class, which extends AbstractRoutines:
File input = new File("/path/to/your.csv");
CsvParserSettings parserSettings = new CsvParserSettings();
//...configure the parser

// You can also use TSV and Fixed-width routines
CsvRoutines routines = new CsvRoutines(parserSettings);
for (Address address : routines.iterate(Address.class, input, "UTF-8")) {
    //process your bean
}
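To connect this back to the merge described in the question: once each file can be read one bean at a time, the sorted files can be combined by keeping only the current top record of each source in a priority queue. The following is a minimal sketch using only the JDK, not part of Univocity. `SortedMerge`, `Head` and `merge` are hypothetical names, and plain integer lists stand in for the CSV files. It assumes each source is already sorted, and that in real use each iterator would be obtained from the loop source above, for example via `routines.iterate(Address.class, file, "UTF-8").iterator()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical helper: lazily merges sources that are each already sorted.
final class SortedMerge {

    // Pairs the current top record of a source with the iterator it came from.
    private static final class Head<T> {
        final T value;
        final Iterator<T> source;

        Head(T value, Iterator<T> source) {
            this.value = value;
            this.source = source;
        }
    }

    // Emits every element of every source to the sink in sorted order,
    // holding at most one element per source in memory at a time.
    static <T> void merge(List<Iterator<T>> sources,
                          Comparator<? super T> order,
                          Consumer<? super T> sink) {
        Comparator<Head<T>> byValue = (a, b) -> order.compare(a.value, b.value);
        PriorityQueue<Head<T>> heads = new PriorityQueue<>(byValue);

        // Seed the queue with the first record of each non-empty source.
        for (Iterator<T> source : sources) {
            if (source.hasNext()) {
                heads.add(new Head<>(source.next(), source));
            }
        }

        // Repeatedly emit the smallest head, then refill from the same source.
        while (!heads.isEmpty()) {
            Head<T> smallest = heads.poll();
            sink.accept(smallest.value);
            if (smallest.source.hasNext()) {
                heads.add(new Head<>(smallest.source.next(), smallest.source));
            }
        }
    }

    public static void main(String[] args) {
        // Integer lists stand in for three sorted CSV files.
        List<Iterator<Integer>> sources = new ArrayList<>();
        sources.add(Arrays.asList(1, 4, 7).iterator());
        sources.add(Arrays.asList(2, 5).iterator());
        sources.add(Arrays.asList(3).iterator());

        merge(sources, Comparator.<Integer>naturalOrder(), id -> System.out.println(id));
    }
}
```

For the beans in the question, the comparator argument could be the AddressComparator shown earlier, and the sink could write each merged bean to the output file.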
Hope this helps!