Java: check CSV file on duplicate lines using ArrayList


Problem description

I have a CSV file with this content:

2017-10-29 00:00:00.0,"1005",-10227,0,0,0,332894,0,0,222,332894,222,332894 2017-10-29 00:00:00.0,"1010",-125529,0,0,0,420743,0,0,256,420743,256,420743 2017-10-29 00:00:00.0,"1005",-10227,0,0,0,332894,0,0,222,332894,222,332894 2017-10-29 00:00:00.0,"1013",-10625,0,0,-687,599098,0,0,379,599098,379,599098 2017-10-29 00:00:00.0,"1604",-1794.9,0,0,-3.99,4081.07,0,0,361,4081.07,361,4081.07

So lines 1 and 3 are duplicates. Now I want to read the file in and print out duplicate lines in the console.

I wrote this Java code, which reads the file line by line into an ArrayList. Then I create an immutable copy, loop through the ArrayList, and pass the immutable copy to binarySearch:

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReadValidationFile {

public static void main(String[] args) {

    List<String> validationFile = new ArrayList<>();

    try(BufferedReader br = new BufferedReader(new FileReader("validation_small.csv"));){

        String line;
        while((line = br.readLine())!= null){
            validationFile.add(line);
        }

    } catch (FileNotFoundException e) {
        //e.printStackTrace();
        System.out.println("file not found " + e.getMessage());
    } catch (IOException e) {
        e.printStackTrace();
    }

    List<String> validationFileCopy = Collections.unmodifiableList(validationFile);

    for(String line : validationFile){
        int comp = Collections.binarySearch(validationFileCopy,line,new ComparatorLine());
        if (comp <= 0){
            System.out.println(line);
        }

    }
}
}

Comparator Class:

import java.util.Comparator;

public class ComparatorLine implements Comparator<String> {
@Override
public int compare(String s1, String s2) {
    return s1.compareToIgnoreCase(s2);
}
}

I expect this line to be printed:

2017-10-29 00:00:00.0,"1005",-10227,0,0,0,332894,0,0,222,332894,222,332894

But the output I get is this:

2017-10-29 00:00:00.0,"1010",-125529,0,0,0,420743,0,0,256,420743,256,420743

Can you please help me see what I am doing wrong? I think my comparator is okay. What is wrong with my ArrayLists?

Solution

The other answer(s) correctly state that you should be using Set instead of List. But for the sake of learning, let's have a look at your code and see where you went wrong.

public class ReadValidationFile {

public static void main(String[] args) {

    List<String> validationFile = new ArrayList<>();

    try(BufferedReader br = new BufferedReader(new FileReader("validation_small.csv"));){

The semicolon after the resource declaration is unnecessary.

        String line;
        while((line = br.readLine())!= null){
            validationFile.add(line);
        }

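This can all be achieved in just one line (using java.nio.file.Files, java.nio.file.Paths and java.nio.charset.StandardCharsets):
List<String> validationFile = Files.readAllLines(Paths.get("validation_small.csv"), StandardCharsets.UTF_8);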

    } catch (FileNotFoundException e) {
        //e.printStackTrace();
        System.out.println("file not found " + e.getMessage());
    } catch (IOException e) {
        e.printStackTrace();
    }

    List<String> validationFileCopy = Collections.unmodifiableList(validationFile);

Actually, this is not a copy. It is just an unmodifiable view of the same list.
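To make the difference between a view and a copy concrete, here is a minimal sketch. The class name ViewVersusCopy and the method sizesAfterAdd are made up for illustration; only Collections.unmodifiableList and the ArrayList copy constructor come from the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ViewVersusCopy {

    // Returns the sizes of {view, copy} after one more element is added to the source list.
    static int[] sizesAfterAdd() {
        List<String> source = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> view = Collections.unmodifiableList(source); // read-only wrapper around source
        List<String> copy = new ArrayList<>(source);              // independent snapshot
        source.add("c"); // visible through the view, not in the copy
        return new int[] { view.size(), copy.size() };
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterAdd();
        System.out.println("view: " + sizes[0] + ", copy: " + sizes[1]); // view: 3, copy: 2
    }
}
```

If a real snapshot is wanted, the copy constructor (optionally wrapped in unmodifiableList) is one way to get it.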

    for(String line : validationFile){
        int comp = Collections.binarySearch(validationFileCopy,line,new ComparatorLine());

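You might as well just search validationFile itself. However, you are calling binarySearch, which only works on a list that is sorted according to the comparator you pass in, and your list is not sorted; on an unsorted list the result is undefined. See the Collections.binarySearch documentation.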

        if (comp <= 0){
            System.out.println(line);
        }

You are printing when comp <= 0, which is essentially the "not found" case: binarySearch returns a negative number when the key is missing and a non-negative index (comp >= 0) when the search succeeds. But another problem is that you are searching the whole list for each of its own elements, so even with a sorted list the search would always succeed.
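For reference, this is roughly how binarySearch behaves once the list is sorted with the same ordering used for the search. It is only a sketch: the class SortedSearchSketch, the method indexInSorted and the short sample lines are invented here, and String.CASE_INSENSITIVE_ORDER stands in for the ComparatorLine class from the question (both order strings via compareToIgnoreCase).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortedSearchSketch {

    // Sorts a copy of the lines with the ordering that the search uses, then searches that copy.
    static int indexInSorted(List<String> lines, String key) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(String.CASE_INSENSITIVE_ORDER);
        return Collections.binarySearch(sorted, key, String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("b,2", "a,1", "c,3");
        System.out.println(indexInSorted(lines, "c,3")); // 2: found at index 2 of the sorted copy
        System.out.println(indexInSorted(lines, "d,4")); // -4: not found, -(insertion point) - 1
    }
}
```

Note that a successful search only says the line exists somewhere in the list; it does not say whether it occurs more than once, which is why this approach is a poor fit for finding duplicates.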

Save yourself all the trouble and use a Set instead. And, using Java 8 streams, the whole program can be reduced to the following:

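public static void main(String[] args) throws Exception {
    Set<String> uniqueLines = new HashSet<>();
    // Set.add returns false for a line that is already in the set, so only repeats pass the filter.
    Files.lines(Paths.get("validation_small.csv"), StandardCharsets.UTF_8)
            .filter(line -> !uniqueLines.add(line))
            .forEach(System.out::println);
}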
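The Set idea can also be tried without touching a file. The following sketch works on an in-memory list; the class DuplicateLinesSketch, the method findDuplicates and the shortened sample lines are hypothetical and only illustrate that Set.add returns false for an element the set already contains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateLinesSketch {

    // Returns every line that already appeared earlier in the input, in input order.
    static List<String> findDuplicates(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String line : lines) {
            if (!seen.add(line)) { // add returns false when the set already contains the line
                duplicates.add(line);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the CSV lines in the question; lines 1 and 3 are equal.
        List<String> lines = Arrays.asList(
                "2017-10-29 00:00:00.0,\"1005\",-10227",
                "2017-10-29 00:00:00.0,\"1010\",-125529",
                "2017-10-29 00:00:00.0,\"1005\",-10227");
        System.out.println(findDuplicates(lines));
    }
}
```

A line that occurs three times is reported twice with this approach; collecting the result into a second Set would report it once.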

If you really need to ignore case when comparing strings (from your given data, it looks like it doesn't make any difference since it's just numbers), then store each unique line by first uppercasing and then lowercasing it. This apparently cumbersome technique is necessary because just lowercasing is not enough if dealing with non-English language text. The equalsIgnoreCase method also does this.

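public static void main(String[] args) throws Exception {
    Set<String> uniqueLines = new HashSet<>();
    // Store a case-normalized key (upper-cased, then lower-cased) but print the original line.
    Files.lines(Paths.get("validation_small.csv"), StandardCharsets.UTF_8)
            .filter(line -> !uniqueLines.add(line.toUpperCase().toLowerCase()))
            .forEach(System.out::println);
}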
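The same case-insensitive check, again as an in-memory sketch. The names IgnoreCaseDuplicatesSketch, caseKey and findDuplicatesIgnoringCase are invented, and passing Locale.ROOT is an assumption added here so the result does not depend on the machine's default locale; the answer's code uses the no-argument overloads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class IgnoreCaseDuplicatesSketch {

    // Builds the key stored in the set: upper-case first, then lower-case.
    static String caseKey(String line) {
        return line.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    // Returns lines whose case-normalized key was already seen, keeping their original spelling.
    static List<String> findDuplicatesIgnoringCase(List<String> lines) {
        Set<String> seenKeys = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String line : lines) {
            if (!seenKeys.add(caseKey(line))) {
                duplicates.add(line);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicatesIgnoringCase(Arrays.asList("id,Name", "ID,NAME", "id,other")));
        // prints [ID,NAME]
    }
}
```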
