Spring Batch: how to filter duplicate items before sending them to the ItemWriter


Question

I read a flat file (for example a .csv file with one line per user, e.g. UserId;Data1;Date2).

But how do I handle duplicate User items in the reader (where there is no list of previously read users)?

stepBuilderFactory.get("createUserStep1")
        .<User, User>chunk(1000)
        .reader(flatFileItemReader) // FlatFileItemReader
        .writer(itemWriter) // For example, a JDBC writer
        .build();

Recommended answer

Filtering is typically done with an ItemProcessor. If the ItemProcessor returns null, the item is filtered out and not passed to the ItemWriter; otherwise it is passed on. In your case, you could keep a set of previously seen users in the ItemProcessor. If the user hasn't been seen before, pass it on; if it has, return null. You can read more about filtering with an ItemProcessor in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#filiteringRecords

import java.util.HashSet;
import java.util.Set;

import org.springframework.batch.item.ItemProcessor;

/**
 * This implementation assumes that there is enough room in memory to store every
 * User seen so far. Otherwise, you'd want to store them somewhere you can do a look-up on.
 */
public class UserFilterItemProcessor implements ItemProcessor<User, User> {

    // This assumes that User.equals() and User.hashCode() identify the duplicates
    private final Set<User> seenUsers = new HashSet<>();

    @Override
    public User process(User user) {
        // Returning null tells Spring Batch to filter the item out
        if (seenUsers.contains(user)) {
            return null;
        }
        seenUsers.add(user);
        return user;
    }
}
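
If User does not override equals() and hashCode(), or if holding whole User objects in memory is a concern, one possible variation is to remember only the identifying key (the UserId column here). The following is a minimal sketch in plain Java, not an official Spring Batch class: SeenKeyFilter is a hypothetical helper, and the key extractor passed to it is an assumption about how the key is obtained from an item.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Spring Batch): remembers the keys it has
 * seen and returns null for any item whose key has appeared before.
 */
final class SeenKeyFilter<T, K> {

    private final Function<T, K> keyExtractor;
    private final Set<K> seenKeys = new HashSet<>();

    SeenKeyFilter(Function<T, K> keyExtractor) {
        this.keyExtractor = keyExtractor;
    }

    /** Returns the item the first time its key is seen, otherwise null. */
    T filter(T item) {
        K key = keyExtractor.apply(item);
        // Set.add returns false when the key was already present.
        return seenKeys.add(key) ? item : null;
    }
}
```

Inside an ItemProcessor, process(user) could then simply return filter.filter(user), with the filter built from something like User::getUserId (assuming User exposes such a getter, which the question does not show). Like the HashSet-based processor above, this sketch is not thread-safe, so it only suits a single-threaded step.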
