读取CSV文件Java的最快方法 [英] Fastest way to read a CSV file java

查看:2013
本文介绍了读取CSV文件Java的最快方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在尝试使用openCSV读取多个csv文件(大约20 MB),但是到目前为止,它的速度很慢.我试图读取我正在设计的4个要加载到堆中的csv文件.我想知道,是否还有其他方法可以在更短的时间内完成.

I´ve been trying to read several csv files (arround 20 MB) using openCSV, but so far it has been slow. Im trying to read 4 csv files which I´m loading into a heap, which I have designed. I was wondering, if there is any other way this could be done in a lot less time.

private Heap<VOMovingViolations> datosHeap; 

public void loadMovingViolations()
{
    Runtime garbage = Runtime.getRuntime(); 
    garbage.gc();
    try 
    {
        FileReader fileReaderMes1 = new FileReader(FECHAS[0]);
        FileReader fileReaderMes2 = new FileReader(FECHAS[1]); 
        FileReader fileReaderMes3 = new FileReader(FECHAS[2]); 
        FileReader fileReaderMes4 = new FileReader(FECHAS[3]); 
        CSVReader enero = new CSVReaderBuilder(fileReaderMes1).withSkipLines(1).build();
        CSVReader febrero = new CSVReaderBuilder(fileReaderMes2).withSkipLines(1).build();
        CSVReader marzo = new CSVReaderBuilder(fileReaderMes3).withSkipLines(1).build();
        CSVReader abril = new CSVReaderBuilder(fileReaderMes4).withSkipLines(1).build();

        String[] row; 


        while((row = enero.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

        while((row = febrero.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

        while((row = marzo.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

        while((row = abril.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

    } 
    catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } 


}

如果有人可以给我任何帮助或想法,我将不胜感激.

I would really appreciate any help or any idea someone could please give me.

推荐答案

tl; dr

读取20 MB的CSV文件并实例化每行一个对象,所花费的总时间少于 1秒.

您没有定义术语慢".所以我做了一个实验,一个随意的基准测试.

You did not define the term "slow". So I did an experiment, a casual benchmark test.

首先,我们创建一个20 MB的文件,其中包含40,000个Person记录.每个Person都具有第一个&法国的姓氏, UUID ,以及一些任意文字作为说明.数据在 CSV 文件中分为四列. ://en.wikipedia.org/wiki/UTF-8"rel =" nofollow noreferrer> UTF-8 .我使用了 Apache Commons CSV 库来编写和阅读.

First we create a 20 MB file of 40,000 Person records. Each Person holds a first & last name in French, a UUID, and some arbitrary text as a description. The data is written as four columns in a CSV file in UTF-8. I used the Apache Commons CSV library to write and read.

第二,读取该书面文件.每行数据都被读取到内存中,然后用于实例化和收集Person对象.

Secondly, this written file is read. Each row of data is read into memory, then used to instantiate and collect a Person object.

读取此文件并为每行实例化Person对象花费的总时间不到一秒钟.每行大约需要20K 纳秒.实际上,这包括读取文件两次,因为我们进行了一次扫描以计算数据的行数以设置所收集实例的初始容量.另外,我们正在将十六进制字符串输入解析为 UUID 的128位值,因此我们花了一些时间在数据处理上(不仅仅是阅读).

Reading this file, and instantiating Person object for each row took less than one second in total elapsed time. Each row takes about 20K nanoseconds. Actually, this includes reading the file twice, as we do a scan to count the number of rows of data to set initial capacity of the collected instances. Also, we are parsing a hex string input into the 128-bit value of a UUID, so we have some time spent on data-processing (not just reading).

这是Person类.

package work.basil.example;

import java.util.UUID;

public class Person
{
    // Static
   static public  String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

    // Member variables.
    public String givenName, surname, description;
    public UUID id;

    public Person ( String givenName , String surname , UUID id , String description)
    {
        this.givenName = givenName;
        this.surname = surname;
        this.id = id;
        this.description = description ;
    }

    @Override
    public String toString ()
    {
        return "Person{ " +
                "givenName='" + givenName + '\'' +
                " | surname='" + surname + '\'' +
                " | id='" + id + '\'' +
                " }";
    }
}

这是编写并读取20 MB文件的完整应用程序.请研究和批评,因为我对此轻描淡写.我没有仔细检查我的工作.

And here is the complete app that writes and then reads the 20 MB file. Please study and critique, as I whipped this up in a jiffy. I’ve not double-checked my work.

您将找到一个write方法和一个read方法. main方法同时调用和跟踪时间.

You will find a write method, and a read method. The main method calls both, and tracks time.

package work.basil.example;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class CsvSpeed
{
    public List < Person > read ( Path path )
    {
        // TODO: Add a check for valid file existing.

        List < Person > list = List.of();  // Default to empty list.
        try
        {
            // Prepare list.
            int initialCapacity = ( int ) Files.lines( path ).count();
            list = new ArrayList <>( initialCapacity );

            // Read CSV file. For each row, instantiate and collect `DailyProduct`.
            BufferedReader reader = Files.newBufferedReader( path );
            Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );
            for ( CSVRecord record : records )
            {
                String givenName = record.get( "givenName" );
                String surname = record.get( "surname" );
                UUID id = UUID.fromString( record.get( "id" ) );
                String description = record.get( "description" );
                // Instantiate `Person` object, and collect it.
                Person person = new Person( givenName , surname , id , description );
                list.add( person );
            }
        } catch ( IOException e )
        {
            e.printStackTrace();
        }
        return list;
    }

    public void write ( final Path path )
    {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        try ( final CSVPrinter printer = CSVFormat.RFC4180.withHeader( "givenName" , "surname" , "id" , "description" ).print( path , StandardCharsets.UTF_8 ) ; )
        {
            int limit = 40_000;  // 40_000 yields about 20 MB of data.
            List < String > givenNames = List.of( "Adrien" , "Aimon" , "Alerion" , "Alexis" , "Alezan" , "Ancil" , "Andre" , "Antoine" , "Archard" , "Aurélien" , "Averill" , "Baptiste" , "Barnard" , "Bartelemy" , "Bastien" , "Baylee" , "Beale" , "Beau" , "Beaumont" , "Beauregard" , "Bellamy" , "Berger" , "Blaize" , "Blondel" , "Boyce" , "Bruce" , "Brunelle" , "Brys" , "Burcet" , "Burnell" , "Burrell" , "Byron" , "Canaan" , "Carden" , "Carolas" , "Cavell" , "Chace" , "Chanler" , "Chante" , "Chappel" , "Charles" , "Chasen" , "Chason" , "Chemin" , "Chene" , "Cher" , "Chevalier" , "Cheyne" , "Clément" , "Clemence" , "Corbin" , "Coty" , "Cygne" , "Damien" , "Dandre" , "Dariel" , "Darl" , "Dauphine" , "Davet" , "Dax" , "Dean" , "Delice" , "Delmon" , "Destin" , "Dominique" , "Donatien" , "Duke" , "Eliott" , "Elroy" , "Enzo" , "Erwan" , "Etalon" , "Ethan" , "Fabron" , "Ferrand" , "Filberte" , "Florent" , "Florian" , "Fontaine" , "Forest" , "Fortune" , "Franchot" , "Francois" , "Fraser" , "Frayne" , "Gaëtan" , "Gabin" , "Gage" , "Gaige" , "Garland" , "Garner" , "Gaston" , "Gauge" , "Gaylord" , "Germain" , "Germaine" , "German" , "Gervaise" , "Giles" , "Gilles" , "Gitan" , "Grosvener" , "Guifford" , "Guion" , "Guy" , "Guzman" , "Henri" , "Holland" , "Hugo" , "Hugues" , "Hyacinthe" , "Jérémy" , "Jacquan" , "Jacques" , "Jacquez" , "Janvier" , "Jardan" , "Jay" , "Jaye" , "Jehan" , "Jemond" , "Jocquez" , "Jonathan" , "Jules" , "Julien" , "Justus" , "Karoly" , "Lado" , "Lafayette" , "Lamond" , "Lancelin" , "Landis" , "Landry" , "Laron" , "Larrimore" , "Laurent" , "LaValle" , "Leandre" , "Leggett" , "Leonce" , "Leron" , "Leverett" , "Lilian" , "Loïc" , "Lorenzo" , "Louis" , "Lowell" , "Luc" , "Lucien" , "Lukas" , "Macaire" , "Mace" , "Mahieu" , "Maison" , "Malleville" , "Manneville" , "Mantel" , "Marc" , "Marcel" , "Marion" , "Marius" , "Markez" , "Markis" , "Marmion" , "Marquis" , "Marquise" , "Marshall" , "Martial" , "Maslin" , "Mason" , "Matheo" , "Mathias" , "Mathys" , "Matthieu" , "Maxence" , "Mayson" , "Mehdi" , "Merle" , "Merville" , "Montague" , "Montaigu" , "Monte" , "Montgomery" , "Montreal" , "Montrel" , "Moore" , "Morel" , "Mortimer" , "Nerville" , "Neuveville" , "Nicolas" , "Noë" , "Noah" , "Noe" , "Norman" , "Norville" , "Nouel" , "Olivier" , "Onfroi" , "Paien" , "Parfait" , "Parnell" , "Pascal" , "Patrice" , "Paul" , "Peppin" , "Percival" , "Percy" , "Pernell" , "Peverell" , "Philipe" , "Pierpont" , "Pierre" , "Pomeroy" , "Prewitt" , "Purvis" , "Quennell" , "Quentin" , "Quincey" , "Quincy" , "Quintin" , "Rémi" , "Rafaelle" , "Ranger" , "Raoul" , "Raphaël" , "Rapier" , "Rawlins" , "Ray" , "Raynard" , "Remi" , "René" , "Renard" , "Rene" , "Reule" , "Reynard" , "Robin" , "Romain" , "Rondel" , "Roy" , "Royal" , "Ruff" , "Rush" , "Russel" , "Rustin" , "Sabastien" , "Sacha" , "Salomon" , "Samuel" , "Satordi" , "Saville" , "Scoville" , "Sebastien" , "Sennett" , "Severin" , "Shant" , "Shantae" , "Sidney" , "Siffre" , "Simeon" , "Simon" , "Sinclair" , "Sofiane" , "Somer" , "Stephane" , "Sully" , "Sydney" , "Sylvain" , "Talbot" , "Talon" , "Telford" , "Tempest" , "Teppo" , "Théo" , "Thayer" , "Thibault" , "Thibaut" , "Thiery" , "Tiennan" , "Tiennot" , "Titouan" , "Toussaint" , "Travaris" , "Tyson" , "Urson" , "Vachel" , "Valentin" , "Valere" , "Vallis" , "Verdun" , "Victoir" , "Victor" , "Waltier" , "William" , "Wyatt" , "Yanis" , "Yann" , "Yves" , "Yvon" , "Zosime" , "Abrial" , "Abrielle" , "Abril" , "Adele" , "Alair" , "Alerion" , "Amee" , "Angelique" , "Annette" , "Antonella" , "Arian" , "Ariane" , "Armandina" , "Aubree" , "Aubrielle" , "Audra" , "Avril" , "Bella" , "Berneta" , "Bette" , "Blaise" , "Blanche" , "Blasa" , "Bonte" , "Brie" , "Brienne" , "Brigit" , "Cachay" , "Calice" , "Camille" , "Camylle" , "Caprice" , "Caressa" , "Caroline" , "Catin" , "Celesta" , "Celeste" , "Cera" , "Cerise" , "Chablis" , "Chalice" , "Chambray" , "Champagne" , "Chandell" , "Chaney" , "Chantal" , "Chante" , "Chanterelle" , "Chantile" , "Chantilly" , "Chantrice" , "Charla" , "Charlotte" , "Charmane" , "Chaton" , "Chemin" , "Chenetta" , "Cher" , "Chere" , "Cheri" , "Cheryl" , "Christine" , "Cidney" , "Cinderella" , "Claire" , "Claudette" , "Colette" , "Cordelle" , "Cydnee" , "Daeja" , "Daija" , "Daja" , "Damzel" , "Darelle" , "Darlene" , "Darselle" , "Dejanelle" , "Deleena" , "Delice" , "Demeri" , "Deni" , "Denise" , "Desgracias" , "Desire" , "Desiree" , "Destanee" , "Destiny" , "Dior" , "Domanique" , "Dominique" , "Elaina" , "Elaine" , "Elayna" , "Elise" , "Eloisa" , "Elyse" , "Emeline" , "Emmaline" , "Emmeline" , "Estella" , "Estrella" , "Etiennette" , "Evette" , "Fabienne" , "Fabrienne" , "Fanchon" , "Fancy" , "Fawna" , "Fayana" , "Fayette" , "Fifi" , "Fleur" , "Fleurette" , "Fontanna" , "Fosette" , "Francine" , "Frederique" , "Gabriel" , "Gabriele" , "Gabrielle" , "Gaby" , "Garcelle" , "Gena" , "Genie" , "Georgette" , "Germaine" , "Gervaise" , "Gitana" , "Harriet" , "Heloisa" , "Holland" , "Honnetta" , "Isabelle" , "Ivette" , "Ivonne" , "Jacqueena" , "Jacquetta" , "Jacquiline" , "Jacyline" , "Jaime" , "Jakqueline" , "Janeen" , "Janelly" , "Janina" , "Janiqua" , "Janique" , "Jannnelle" , "Jaquita" , "Jardena" , "Jeanetta" , "Jermaine" , "Jessamine" , "Jewel" , "Jewell" , "Joli" , "Jolie" , "Josephine" , "Jozephine" , "Julieta" , "Karessa" , "Karmaine" , "Klara" , "Laine" , "Lanelle" , "Laramie" , "Layne" , "Layney" , "Leala" , "Leonette" , "Lissette" , "Lizette" , "Lourdes" , "Lucienne" , "Ly" , "Lyla" , "Lysette" , "Madelaine" , "Malerie" , "Manette" , "Marais" , "Marcelle" , "Marché" , "Mardi" , "Margo" , "Marguerite" , "Marie" , "Marie Claude" , "Marie Frances" , "Marie Joelle" , "Marie Pascale" , "Marie Sophie" , "Marjolaine" , "Marquise" , "Marvella" , "Mathieu" , "Matisse" , "Maurelle" , "Maurissa" , "Mavis" , "Melisande" , "Michelle" , "Miette" , "Mignon" , "Mimi" , "Mirya" , "Monet" , "Moniqua" , "Monteen" , "Musetta" , "Myrlie" , "Nadeen" , "Nadia" , "Nadiyah" , "Naeva" , "Nanon" , "Natalle" , "Naudia" , "Nettie" , "Nicholas" , "Nicki" , "Nicky" , "Nicole" , "Nicolette" , "Nicolina" , "Nicolle" , "Nikolette" , "Ninette" , "Ninon" , "Noelle" , "Nycole" , "Odelette" , "Opaline" , "Orane" , "Orva" , "Page" , "Parisa" , "Parnel" , "Parris" , "Patrice" , "Peridot" , "Pippi" , "Prairie" , "Rachele" , "Rachelle" , "Racquel" , "Raphaelle" , "Raquelle" , "Remi" , "Renée" , "Renea" , "Renelle" , "Renita" , "Risette" , "Rochelle" , "Romy" , "Rosabel" , "Rosiclara" , "Ruba" , "Russhell" , "Saleena" , "Salina" , "Satin" , "Sedona" , "Serene" , "Shandelle" , "Shanta" , "Shante" , "Shariah" , "Sharita" , "Sharleen" , "Sheree" , "Shereen" , "Sherell" , "Sherice" , "Sherry" , "Sidnee" , "Sidney" , "Sidnie" , "Sidonie" , "Sinclaire" , "Solange" , "Solen" , "Sorrel" , "Suzette" , "Sydnee" , "Sydney" , "Tallis" , "Tempest" , "Toinette" , "Turquoise" , "Veronique" , "Vignette" , "Villette" , "Violeta" , "Virginie" , "Voleta" , "Vonny" );
            List < String > surnames = List.of( "Arceneau" , "Aucoin" , "Babin" , "Babineaux" , "Benoit" , "Bergeron" , "Bernard" , "Bertrand" , "Bessette" , "Blanc" , "Blanchard" , "Bonnet" , "Boucher" , "Bourg" , "Bourque" , "Boutin" , "Bouvier" , "Braud" , "Broussard" , "Brun" , "Chevalier" , "David" , "Depaul" , "Desmarais" , "Disney" , "Dubois" , "Dupont" , "Dupuis" , "Durand" , "Fortescue" , "Fournier" , "Garnier" , "Gaudet" , "Gillet" , "Gillette" , "Girard" , "Gravois" , "Grosvenor" , "Lambert" , "Landry" , "Laroche" , "Laurent" , "Lefevre" , "Leroy" , "Leveque" , "Lisle" , "Martin" , "Michel" , "Molyneux" , "Moreau" , "Morel" , "Neville" , "Pelletier" , "Petit" , "Prideux" , "Renard" , "Richard" , "Robert" , "Rousseau" , "Roux" , "Rufus" , "Simon" , "Thomas" );
            for ( int i = 1 ; i <= limit ; i++ )
            {
                String givenName = givenNames.get( random.nextInt( 0 , givenNames.size() ) );
                String surname = surnames.get( random.nextInt( 0 , surnames.size() ) );
                UUID id = UUID.randomUUID();
                String description = Person.LOREM_IPSUM;
                printer.printRecord( givenName , surname , id , description );
            }
        } catch ( IOException e )
        {
            e.printStackTrace();
        }
    }

    public static void main ( final String[] args )
    {
        // Launch the app.
        CsvSpeed app = new CsvSpeed();

        // Write.
        String when = Instant.now().truncatedTo( ChronoUnit.SECONDS ).toString().replace( ":" , "•" );
        Path pathOutput = Paths.get( "/Users/basilbourque/persons.csv" );
        app.write( pathOutput );
        System.out.println( "Writing file: " + pathOutput );

        // Read.
        long start = System.nanoTime();
        Path pathInput = Paths.get( "/Users/basilbourque/persons.csv" );
        List < Person > list = app.read( pathInput );
        long stop = System.nanoTime();

        // Time.
        long elapsed = ( stop - start );
        Duration d = Duration.ofNanos( elapsed );
        System.out.println( "Reading elapsed: " + d );
        System.out.println( "Reading took nanos per row: " + ( elapsed / list.size() ) );
        System.out.println( "nanos elapsed: " + elapsed + "  |  list.size: " + list.size() );
    }
}

运行时:

编写文件:/Users/basilbourque/persons.csv

Writing file: /Users/basilbourque/persons.csv

已阅读时间:PT0.857816234S

Reading elapsed: PT0.857816234S

每行阅读所需的纳米数:21445

Reading took nanos per row: 21445

经过的Nanos:857816234 | list.size:40000

nanos elapsed: 857816234 | list.size: 40000

技术栈:

  • Java 11.0.2 — Azul Systems(从OpenJDK构建)中的 Zulu
  • 在IntelliJ 2019.1内部运行
  • macOS Mojave
  • MacBook Pro(15英寸视网膜,2013年末)
  • 处理器:2.3 GHz Intel Core i7(4核,8超级)
  • 16 GB 1600 MHz DDR3
  • 存储:Apple内置的固态存储
  • Java 11.0.2 — Zulu by Azul Systems (built from OpenJDK)
  • Run inside IntelliJ 2019.1
  • macOS Mojave
  • MacBook Pro (Retina, 15-inch, Late 2013)
  • Processor: 2.3 GHz Intel Core i7 (4 cores, 8 hyper)
  • 16 GB 1600 MHz DDR3
  • Storage: Solid-state built-in by Apple

这篇关于读取CSV文件Java的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆