How to read multiple CSV files and merge them


Question

I have 39 CSV files, each of them large. I want to load these files in Java and hold them in a single variable. The code below works for small files but not for large ones. The files are usually around 100 MB to 800 MB. I want to load all 39 files in a directory and put them into one 2D array.

public static String readCSV(File csvFile) {
    BufferedReader bufferedReader = null;
    StringBuffer stringBuffer = new StringBuffer();

    try {
        bufferedReader = new BufferedReader(new FileReader(csvFile));
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }

    try {
        String temp = null;
        while((temp = bufferedReader.readLine()) != null) {
            stringBuffer.append(temp+","); // append the line currently held in temp
        }

        System.out.println(stringBuffer);
    } catch (IOException e) {
        e.printStackTrace();
    }

    // returns -10,-9,-8,-7,-6,-5,-4,-3,-2,-1,0,,,,,,,,,,1,2,3,4,5,6,7,8,9,10,
    return stringBuffer.toString();
}

public static String[] parse(String str) {
    String[] strArr = str.split(","); // split on single commas and store the pieces in an array

    return strArr; 
}

public static void main(String[] args) throws IOException {

    //mergeCsvFiles("sample", 4, "D:\\sample_folder\\" + "merge_file" + ".csv");


    String str = readCSV(new File("D:/sample_folder/sample1.csv"));
    String[] strArr = parse(str); // the values come back packed in order into a String array
    int varNumber = 45;
    int rowNumber = strArr.length/varNumber;

    String[][] Array2D = new String[varNumber][rowNumber];
    for(int j=0;j<varNumber;j++)
    {
        for(int i=0; i<rowNumber;i++)
        {
            String k = strArr[i*varNumber+j];
            Array2D[j][i]= k;
        }
    }   // build the 2D array

    //String[][] naArray2D=removeNA(Array2D,rowNumber,varNumber); // drop the rows that contain NA





//      /*  code to check that the removal worked
    for(int i=0;i<varNumber;i++){
        for(int j=0;j<16;j++){
            System.out.println(Array2D[i][j]);
        }
        System.out.println("**********************NA removed & 2D array**********************");
    }
//      */

    }
}


Recommended answer

With the file sizes you mention, you are most likely running out of memory in the JVM.

This is probably why your largest file, at around 800 MB, fails to load. You are not only loading that 800 MB into memory, you are also paying for the arrays built from it, since the contents are held both as one large string and as the array of split values. In other words, you are using roughly 1600 MB plus the overhead of each array, which becomes sizeable.
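One way to avoid the second copy is to parse each line as it is read instead of joining the whole file into one string first. The sketch below is only an illustration under assumptions: the class name `CsvRows` and the method `readRows` are hypothetical, the files are taken to be UTF-8, and fields are assumed to contain no quoted commas. The parsed rows still have to fit in the heap, so this reduces the duplication rather than removing the memory limit.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CsvRows {
    // Reads a CSV file one line at a time and keeps only the parsed rows,
    // so no single string holding the entire file is ever built.
    // Assumption: plain comma-separated fields, no quoted commas.
    public static List<String[]> readRows(Path csvFile) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(csvFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // A limit of -1 keeps trailing empty fields such as "a,b,,"
                rows.add(line.split(",", -1));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java CsvRows <file.csv>");
            return;
        }
        List<String[]> rows = readRows(Paths.get(args[0]));
        System.out.println("rows read: " + rows.size());
    }
}
```

Each element of the returned list is one row, so `rows.get(i)[j]` is the value in row `i`, column `j`.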

My bet is that you are exceeding the memory limit, assuming the file format is well-formed in both the small and the large files. I cannot confirm this, since I don't know your JVM settings or your memory consumption and don't have the files to test with, so it is up to you to check whether that is the case.
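A quick way to check this on your own machine is to print the heap limits the JVM is running with. This is a minimal sketch using the standard `Runtime` API; the class name `HeapInfo` and the helper `maxHeapMegabytes` are hypothetical names chosen for illustration.

```java
public class HeapInfo {
    private static final long MB = 1024L * 1024L;

    // Maximum heap the JVM will attempt to use, in megabytes.
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Heap currently in use = allocated heap minus the free part of it.
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / MB;
        System.out.println("max heap (MB):  " + maxHeapMegabytes());
        System.out.println("used heap (MB): " + usedMb);
    }
}
```

If the maximum is well below what an 800 MB file needs once it is duplicated in memory, the limit can be raised with the `-Xmx` option when starting the JVM, for example `java -Xmx4g HeapInfo`.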

Also, maybe I'm misreading your code, but it doesn't look like it will do what I think you want it to do. I may be wrong, since I don't know exactly what you are trying to do.
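If the goal is the one in the title, combining all the files in a directory, one option that keeps memory use flat is to stream every file line by line into a single merged file instead of building one 2D array. The following is a sketch under stated assumptions, not a drop-in solution: `CsvMerge` and `mergeDirectory` are hypothetical names, the files are assumed to be UTF-8, to share the same columns, and to have no header row.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CsvMerge {
    // Copies every line of each *.csv file in inputDir into outputFile,
    // holding only one line in memory at a time.
    // Returns the number of lines written.
    public static long mergeDirectory(Path inputDir, Path outputFile) throws IOException {
        Path out = outputFile.toAbsolutePath().normalize();
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir, "*.csv")) {
            for (Path p : stream) {
                // Skip the output file itself if it already sits in inputDir.
                if (!p.toAbsolutePath().normalize().equals(out)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files); // process files in a predictable order

        long lines = 0;
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (Path file : files) {
                try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        writer.write(line);
                        writer.newLine();
                        lines++;
                    }
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java CsvMerge <inputDir> <outputFile>");
            return;
        }
        long n = mergeDirectory(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("lines written: " + n);
    }
}
```

The merged file can then be processed in a single pass, row by row, without ever holding all 39 files in memory at once.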
