Spring Batch: how to filter duplicated items before sending them to the ItemWriter
Question
I read a flat file (for example, a .csv file with one line per user, e.g. UserId;Data1;Date2).
But how can I handle duplicate User items in the reader, where there is no list of previously read users?
stepBuilderFactory.get("createUserStep1")
    .<User, User>chunk(1000)
    .reader(flatFileItemReader) // FlatFileItemReader
    .writer(itemWriter) // For example JDBC Writer
    .build();
Recommended answer
Filtering is typically done with an ItemProcessor. If the ItemProcessor returns null, the item is filtered and not passed to the ItemWriter. Otherwise, it is. In your case, you could keep a list of previously seen users in the ItemProcessor. If the user hasn't been seen before, pass it on. If it has been seen before, return null. You can read more about filtering with an ItemProcessor in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#filiteringRecords
import java.util.HashSet;
import java.util.Set;

import org.springframework.batch.item.ItemProcessor;

/**
 * This implementation assumes that there is enough room in memory to store every
 * distinct User seen so far. Otherwise, you'd want to store them somewhere you can
 * do a look-up on.
 */
public class UserFilterItemProcessor implements ItemProcessor<User, User> {

    // This assumes that User.equals() (and hashCode()) identifies the duplicates
    private final Set<User> seenUsers = new HashSet<User>();

    @Override
    public User process(User user) {
        if (seenUsers.contains(user)) {
            // Returning null filters the item, so it never reaches the ItemWriter
            return null;
        }
        seenUsers.add(user);
        return user;
    }
}
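If User does not override equals() and hashCode(), one alternative is to remember only the key column. The following is a minimal sketch of that idea, not a definitive implementation. It is written without the Spring dependency so it can be run on its own: the nested User class, its getUserId() accessor, and the UserIdFilter name are all assumptions made for illustration. In a real job the class would implement ItemProcessor<User, User> as in the answer above.

```java
import java.util.HashSet;
import java.util.Set;

public class UserIdFilter {

    // Hypothetical stand-in for the question's User type (UserId;Data1;Date2).
    public static class User {
        private final String userId;
        private final String data1;
        private final String date2;

        public User(String userId, String data1, String date2) {
            this.userId = userId;
            this.data1 = data1;
            this.date2 = date2;
        }

        public String getUserId() {
            return userId;
        }
    }

    // Only the ids are kept, which needs less memory than storing whole User objects.
    private final Set<String> seenUserIds = new HashSet<>();

    // Mirrors the ItemProcessor contract: return the item to keep it, null to filter it.
    public User process(User user) {
        // Set.add returns false when the id is already present, i.e. a duplicate.
        return seenUserIds.add(user.getUserId()) ? user : null;
    }
}
```

Either processor is registered on the step by adding .processor(...) between .reader(...) and .writer(...). Note that a HashSet is not thread-safe, so this sketch assumes the step processes items on a single thread.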