Dynamically reading a CSV file

Question

Introduction

I am working on an automation project in order to learn new tricks with Java and data science (at a very basic level), all self-taught.

Problem

Here is an example of how I store this data in a .csv file:

Date when obtained
Format for identifying the numbers below
data
...
data

The CSV I am currently using:

2018/12/29
name,quantity,quality,realmQ,cost
Tejido,321 908,13.55,43.18,$15.98,
Ropa,195 045,20.55,45.93,$123.01,
Gorra de visera,126 561,17.43,42.32,$79.54,
Cerveza,80 109,3.37,17.93,$12.38,
Mercancías de playa,75 065,11.48,39.73,$105.93,
Bebidas alcohólicas,31 215,4.84,27.90,$32.29,
Artículos de cuero,19 098,23.13,44.09,$198.74,
Bolsas y carteras,7 754,23.09,41.34,$1 176.54,
2018/12/30
name,quantity,quality,realmQ,cost
Tejido,252 229,12.86,43.14,$18.87,
Ropa,132 392,18.09,46.02,$177.58,
Gorra de visera,87 676,14.42,42.46,$122.48,
Cerveza,44 593,2.72,17.79,$18.71,
Mercancías de playa,44 593,8.26,39.56,$200.78,
Bebidas alcohólicas,27 306,4.30,23.88,$31.95,
Artículos de cuero,16 147,21.08,43.91,$207.49,
Bolsas y carteras,6 552,21.11,40.59,$1 195.41,
2019/01/02
name,quantity,quality,realmQ,cost
Tejido,321 908,13.55,43.18,$15.98,
Ropa,195 045,20.55,45.93,$123.01,
Gorra de visera,126 561,17.43,42.32,$79.54,
Cerveza,80 109,3.37,17.93,$12.38,
Mercancías de playa,75 065,11.48,39.73,$105.93,
Bebidas alcohólicas,31 215,4.84,27.90,$32.29,
Artículos de cuero,19 098,23.13,44.09,$198.74,
Bolsas y carteras,7 754,23.09,41.34,$1 176.54,
2019/01/03
name,quantity,quality,realmQ,cost
Tejido,321 908,13.55,43.18,$15.98,
Ropa,195 045,20.55,45.93,$123.01,
Gorra de visera,126 561,17.43,42.32,$79.54,
Cerveza,80 109,3.37,17.93,$12.38,
Mercancías de playa,75 065,11.48,39.73,$105.93,
Bebidas alcohólicas,31 215,4.84,27.90,$32.29,
Artículos de cuero,19 098,23.13,44.09,$198.74,
Bolsas y carteras,7 754,23.09,41.34,$1 176.54,

I want to make it dynamic and also bigger. Instead of multiple .csv files classified by date, I decided to store everything in one big .csv file, and the data above is the result.

The code I have so far can read a single .csv block, but if I add more data below it, it doesn't work. I can see in the debugger that the problem is related to the loop, but I still can't find the right solution.

Code

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class CSVinput {

    static String[] nombre = new String[8];
    static int[] cantidad = new int[8];
    static double[] calidad = new double[8];
    static double[] realmQ = new double[8];
    static double[] coste = new double[8];

    public static void ImportData(String path) throws FileNotFoundException
    {
        /*Can only load one csv with 8 stuff in it*/
        System.out.println("Presenting data...");

        try (Scanner scan = new Scanner(new File(path))) {
            scan.useDelimiter(",");  // Commas only, so line breaks stay inside the tokens.
            String date = scan.nextLine();
            System.out.println("fecha: " + date);
            scan.nextLine();  // Skip the header line.

            int index = 0;
            while (scan.hasNext())
            {
                try
                {
                    String name = scan.next().replaceAll("\n", "");
                    nombre[index] = name;
                    System.out.println("nombre: " + name);
                    int quantity = Integer.parseInt(scan.next().replaceAll(" ", ""));
                    cantidad[index] = quantity;
                    System.out.println("cantidad: " + quantity);
                    double quality = Double.parseDouble(scan.next());
                    calidad[index] = quality;
                    System.out.println("calidad: " + quality);
                    double realmq = Double.parseDouble(scan.next());
                    realmQ[index] = realmq;
                    System.out.println("realmQ: " + realmq);
                    double cost = Double.parseDouble(scan.next().replace("$", "").replace(" ", ""));
                    coste[index] = cost;
                    System.out.println("coste: $" + cost);

                    index++;
                }
                catch (ArrayIndexOutOfBoundsException e)
                {
                    // Once index reaches 8, each pass consumes one more token and lands here,
                    // so everything after the eighth row is silently discarded.
                }
            }
        }
    }

    public static void main(String[] args) throws FileNotFoundException
    {
        ImportData("caca.csv");
    }
}

Notes

The code posted above is the version that works with a single .csv block. It expects input like the following, and it should "split" the data to make it easy to work with.

2018/12/29
name,quantity,quality,realmQ,cost
Tejido,321 908,13.55,43.18,$15.98,
Ropa,195 045,20.55,45.93,$123.01,
Gorra de visera,126 561,17.43,42.32,$79.54,
Cerveza,80 109,3.37,17.93,$12.38,
Mercancías de playa,75 065,11.48,39.73,$105.93,
Bebidas alcohólicas,31 215,4.84,27.90,$32.29,
Artículos de cuero,19 098,23.13,44.09,$198.74,
Bolsas y carteras,7 754,23.09,41.34,$1 176.54,

Expected result

If I append more .csv data below the previous block, I want the program to read it, no matter how big the .csv file is.

Thanks for the interest in this question.

Recommended answer

CSV ➙ flat table

The CSV format was invented to represent a single simple flat table of data. Ditto for Tab-delimited files.

You have a hierarchy of a date mapping to a collection of name-quantity-quality-realmQ-cost tuples. That is not simple flat tabular data.

If you want to store that in CSV, you must flatten by adding a column for the date and repeating the date value across the collection of tuples, to become date-name-quantity-quality-realmQ-cost tuples.

date,name,quantity,quality,realmQ,cost
2018-12-29,Tejido,321 908,13.55,43.18,$15.98
2018-12-29,Ropa,195 045,20.55,45.93,$123.01
2018-12-29,Gorra de visera,126 561,17.43,42.32,$79.54
2018-12-29,Cerveza,80 109,3.37,17.93,$12.38
2018-12-29,Mercancías de playa,75 065,11.48,39.73,$105.93
2018-12-29,Bebidas alcohólicas,31 215,4.84,27.90,$32.29
2018-12-29,Artículos de cuero,19 098,23.13,44.09,$198.74
2018-12-29,Bolsas y carteras,7 754,23.09,41.34,$1 176.54

That data could now be read and written to CSV files.
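Here is a minimal sketch of that flattening step, using only the JDK. It assumes exactly the layout shown in the Question: a yyyy/MM/dd date line, then a header line starting with "name,", then data rows that each end with a stray comma. The class name BlockFlattener is made up for this illustration. It rewrites each date in ISO 8601 form, as in the flattened example above, and leaves the values themselves untouched.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns the date-block layout from the Question into flat CSV rows.
final class BlockFlattener {
    // Parses date lines such as "2018/12/29".
    private static final DateTimeFormatter SLASHED = DateTimeFormatter.ofPattern( "uuuu/MM/dd" );

    static List < String > flatten ( List < String > lines ) {
        List < String > out = new ArrayList <>();
        out.add( "date,name,quantity,quality,realmQ,cost" );  // One header for the whole file.
        LocalDate currentDate = null;
        for ( String raw : lines ) {
            String line = raw.strip();
            if ( line.isEmpty() || line.startsWith( "name," ) ) {
                continue;  // Skip blank lines and the per-block header lines.
            }
            if ( line.matches( "\\d{4}/\\d{2}/\\d{2}" ) ) {
                currentDate = LocalDate.parse( line , SLASHED );  // Start of a new date block.
                continue;
            }
            if ( currentDate == null ) {
                throw new IllegalStateException( "Data row before any date line: " + line );
            }
            // Drop the stray trailing comma, then prefix the date (LocalDate::toString is ISO 8601).
            String row = line.endsWith( "," ) ? line.substring( 0 , line.length() - 1 ) : line;
            out.add( currentDate + "," + row );
        }
        return out;
    }
}
```

Assuming the original file is UTF-8, Files.write( Path.of( "data.csv" ) , BlockFlattener.flatten( Files.readAllLines( Path.of( "caca.csv" ) ) ) ) would produce the flattened file in one go. Because rows are collected in a growing list, the number of dates and products never has to be known in advance.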

And watch your delimiters. Notice there should be no comma after the last field of each row.
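Java's own String.split shows why a trailing comma is a trap. By default split silently discards trailing empty strings. A parser that keeps empty fields, as split does when given a negative limit, sees a sixth, empty column instead. The class and method names below are invented for the demo.

```java
// Demonstrates how a trailing comma produces a phantom empty field.
final class TrailingCommaDemo {
    static String[] lenient ( String row ) {
        return row.split( "," );        // Trailing empty strings are removed.
    }

    static String[] strict ( String row ) {
        return row.split( "," , -1 );   // Negative limit keeps trailing empty strings.
    }

    public static void main ( String[] args ) {
        String row = "Tejido,321 908,13.55,43.18,$15.98,";
        System.out.println( lenient( row ).length );  // 5
        System.out.println( strict( row ).length );   // 6: the last field is ""
    }
}
```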

The Apache Commons CSV library will perform the CSV parsing, reading, and writing for you. It has worked well for me a few times.

Let’s parse a data.csv file containing a flattened version of your example data. The data has been cleaned up:

  • Switched dates to standard ISO 8601 format
  • Removed the SPACE digit-group separator from numbers
  • Removed the $ character
  • Deleted the extra comma at the end of each row
  • Translated the product names to English (for this English edition of Stack Overflow).
date,name,quantity,quality,realmQ,cost
2018-12-29,Fabric,321908,13.55,43.18,15.98
2018-12-29,Clothing,195045,20.55,45.93,123.01
2018-12-29,Visor Cap,126561,17.43,42.32,79.54
2018-12-29,Beer,80109,3.37,17.93,12.38
2018-12-29,Beach goods,75065,11.48,39.73,105.93
2018-12-29,Alcoholic beverages,31215,4.84,27.90,32.29
2018-12-29,Leather goods,19098,23.13,44.09,198.74
2018-12-29,Bags and wallets,7754,23.09,41.34,1176.54
2018-12-30,Fabric,252229,12.86,43.14,18.87
2018-12-30,Clothing,132392,18.09,46.02,177.58
2018-12-30,Visor Cap,87676,14.42,42.46,122.48
2018-12-30,Beer,44593,2.72,17.79,18.71
2018-12-30,Beach goods,44593,8.26,39.56,200.78
2018-12-30,Alcoholic beverages,27306,4.30,23.88,31.95
2018-12-30,Leather goods,16147,21.08,43.91,207.49
2018-12-30,Bags and wallets,6552,21.11,40.59,1195.41
2019-01-02,Fabric,321908,13.55,43.18,15.98
2019-01-02,Clothing,195045,20.55,45.93,123.01
2019-01-02,Visor Cap,126561,17.43,42.32,79.54
2019-01-02,Beer,80109,3.37,17.93,12.38
2019-01-02,Beach goods,75065,11.48,39.73,105.93
2019-01-02,Alcoholic beverages,31215,4.84,27.90,32.29
2019-01-02,Leather goods,19098,23.13,44.09,198.74
2019-01-02,Bags and wallets,7754,23.09,41.34,1176.54
2019-01-03,Fabric,321908,13.55,43.18,15.98
2019-01-03,Clothing,195045,20.55,45.93,123.01
2019-01-03,Visor Cap,126561,17.43,42.32,79.54
2019-01-03,Beer,80109,3.37,17.93,12.38
2019-01-03,Beach goods,75065,11.48,39.73,105.93
2019-01-03,Alcoholic beverages,31215,4.84,27.90,32.29
2019-01-03,Leather goods,19098,23.13,44.09,198.74
2019-01-03,Bags and wallets,7754,23.09,41.34,1176.54
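If you would rather clean the numbers in code than by hand, a sketch like the following could handle the separator and $ steps. It assumes the digit-group separator is a space. Text copied from a web page sometimes uses a no-break space (U+00A0) or a narrow no-break space (U+202F) instead, so the sketch strips those too. The class and method names are hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical helpers for cleaning the raw numeric values from the Question's file.
final class ValueCleaner {
    // Regular whitespace plus no-break (U+00A0) and narrow no-break (U+202F) spaces.
    private static final String SPACES = "[\\s\\u00A0\\u202F]";

    // "321 908" -> 321908
    static int quantity ( String raw ) {
        return Integer.parseInt( raw.replaceAll( SPACES , "" ) );
    }

    // "$1 176.54" -> 1176.54 (BigDecimal keeps the exact decimal value, unlike double)
    static BigDecimal cost ( String raw ) {
        return new BigDecimal( raw.replace( "$" , "" ).replaceAll( SPACES , "" ) );
    }
}
```

Dates are left out here because java.time handles them directly; see the ISO 8601 notes further down.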

We define a class to hold each tuple.

package com.basilbourque.example;

import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.Objects;

public class DailyProduct {
    // date,name,quantity,quality,realmQ,cost
    // 2018-12-29,Fabric,321908,13.55,43.18,15.98
    // 2018-12-29,Clothing,195045,20.55,45.93,123.01
    // 2018-12-29,Visor Cap,126561,17.43,42.32,79.54
    // 2018-12-29,Beer,80109,3.37,17.93,12.38
    // 2018-12-29,Beach goods,75065,11.48,39.73,105.93
    // 2018-12-29,Alcoholic beverages,31215,4.84,27.90,32.29
    // 2018-12-29,Leather goods,19098,23.13,44.09,198.74
    // 2018-12-29,Bags and wallets,7754,23.09,41.34,1176.54

    public enum Header {
        DATE, NAME, QUANTITY, QUALITY, REALMQ, COST;
    }

    // ----------|  Member vars  |-----------------------------------
    public LocalDate localDate;
    public String name;
    public Integer quantity;
    public BigDecimal quality, realmQ, cost;

    // ----------|  Constructor  |-----------------------------------
    public DailyProduct ( LocalDate localDate , String name , Integer quantity , BigDecimal quality , BigDecimal realmq , BigDecimal cost ) {
        this.localDate = Objects.requireNonNull( localDate );
        this.name = Objects.requireNonNull( name );
        this.quantity = Objects.requireNonNull( quantity );
        this.quality = Objects.requireNonNull( quality );
        this.realmQ = Objects.requireNonNull( realmq );
        this.cost = Objects.requireNonNull( cost );
    }

    // ----------|  `Object` overrides  |-----------------------------------
    @Override
    public String toString ( ) {
        return "com.basilbourque.example.DailyProduct{ " +
                "localDate=" + localDate +
                " | name='" + name + '\'' +
                " | quantity=" + quantity +
                " | quality=" + quality +
                " | realmq=" + realmQ +
                " | cost=" + cost +
                " }";
    }

    @Override
    public boolean equals ( Object o ) {
        if ( this == o ) return true;
        if ( o == null || getClass() != o.getClass() ) return false;
        DailyProduct that = ( DailyProduct ) o;
        return localDate.equals( that.localDate ) &&
                name.equals( that.name );
    }

    @Override
    public int hashCode ( ) {
        return Objects.hash( localDate , name );
    }

}
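On Java 16 or later, a record could express the same immutable tuple in far fewer lines. This is an alternative sketch, not the class used by the file handler below. A record's generated equals and hashCode compare every field, whereas DailyProduct above compares only date and name. The name DailyProductRecord is invented here.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.Objects;

// Hypothetical record equivalent of DailyProduct (requires Java 16+).
record DailyProductRecord( LocalDate localDate , String name , int quantity ,
                           BigDecimal quality , BigDecimal realmQ , BigDecimal cost ) {
    // Compact constructor: the same null checks as the class version.
    // quantity is a primitive int here, so it cannot be null.
    DailyProductRecord {
        Objects.requireNonNull( localDate );
        Objects.requireNonNull( name );
        Objects.requireNonNull( quality );
        Objects.requireNonNull( realmQ );
        Objects.requireNonNull( cost );
    }
}
```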

Write a class to read and write files containing the data of the DailyProduct objects.

package com.basilbourque.example;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

import java.io.BufferedReader;
import java.io.IOException;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

public class DailyProductFileHandler {
    public List < DailyProduct > read ( Path path ) {
        // TODO: Add a check for valid file existing.

        List < DailyProduct > list = new ArrayList <>();  // Starts empty; grows as rows are added.

        // Read CSV file. For each row, instantiate and collect `DailyProduct`.
        // The try-with-resources statement closes the reader, and with it the file, even if parsing fails.
        try ( BufferedReader reader = Files.newBufferedReader( path , StandardCharsets.UTF_8 ) ) {
            Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );
            for ( CSVRecord record : records ) {
                // date,name,quantity,quality,realmQ,cost
                LocalDate localDate = LocalDate.parse( record.get( "date" ) );
                String name = record.get( "name" );
                Integer quantity = Integer.valueOf( record.get( "quantity" ) );
                BigDecimal quality = new BigDecimal( record.get( "quality" ) );
                BigDecimal realmQ = new BigDecimal( record.get( "realmQ" ) );  // Note: case-sensitive.
                BigDecimal cost = new BigDecimal( record.get( "cost" ) );
                // Instantiate `DailyProduct` object, and collect it.
                DailyProduct dailyProduct = new DailyProduct( localDate , name , quantity , quality , realmQ , cost );
                list.add( dailyProduct );
            }
        } catch ( IOException e ) {
            e.printStackTrace();
        }
        return list;
    }

    public void write ( final List < DailyProduct > dailyProducts , final Path path ) {
        try ( final CSVPrinter printer = CSVFormat.RFC4180.withHeader( "date" , "name" , "quantity" , "quality" , "realmQ" , "cost" ).print( path , StandardCharsets.UTF_8 ) ; ) {
            for ( DailyProduct dp : dailyProducts ) {
                printer.printRecord( dp.localDate , dp.name , dp.quantity , dp.quality , dp.realmQ , dp.cost );
            }
        } catch ( IOException e ) {
            e.printStackTrace();
        }
    }

    public static void main ( final String[] args ) {
        DailyProductFileHandler fileHandler = new DailyProductFileHandler();

        Path pathInput = Paths.get( "/Users/basilbourque/data.csv" );
        List < DailyProduct > list = fileHandler.read( pathInput );
        System.out.println( list );

        String when = Instant.now().truncatedTo( ChronoUnit.SECONDS ).toString().replace( ":" , "•" );
        Path pathOutput = Paths.get( "/Users/basilbourque/data_" + when + ".csv" );
        fileHandler.write( list , pathOutput );
        System.out.println( "Writing file: " + pathOutput );
    }
}

When run:

[com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Fabric' | quantity=321908 | quality=13.55 | realmq=43.18 | cost=15.98 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Clothing' | quantity=195045 | quality=20.55 | realmq=45.93 | cost=123.01 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Visor Cap' | quantity=126561 | quality=17.43 | realmq=42.32 | cost=79.54 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Beer' | quantity=80109 | quality=3.37 | realmq=17.93 | cost=12.38 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Beach goods' | quantity=75065 | quality=11.48 | realmq=39.73 | cost=105.93 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Alcoholic beverages' | quantity=31215 | quality=4.84 | realmq=27.90 | cost=32.29 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Leather goods' | quantity=19098 | quality=23.13 | realmq=44.09 | cost=198.74 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Bags and wallets' | quantity=7754 | quality=23.09 | realmq=41.34 | cost=1176.54 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Fabric' | quantity=252229 | quality=12.86 | realmq=43.14 | cost=18.87 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Clothing' | quantity=132392 | quality=18.09 | realmq=46.02 | cost=177.58 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Visor Cap' | quantity=87676 | quality=14.42 | realmq=42.46 | cost=122.48 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Beer' | quantity=44593 | quality=2.72 | realmq=17.79 | cost=18.71 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Beach goods' | quantity=44593 | quality=8.26 | realmq=39.56 | cost=200.78 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Alcoholic beverages' | quantity=27306 | quality=4.30 | 
realmq=23.88 | cost=31.95 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Leather goods' | quantity=16147 | quality=21.08 | realmq=43.91 | cost=207.49 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Bags and wallets' | quantity=6552 | quality=21.11 | realmq=40.59 | cost=1195.41 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Fabric' | quantity=321908 | quality=13.55 | realmq=43.18 | cost=15.98 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Clothing' | quantity=195045 | quality=20.55 | realmq=45.93 | cost=123.01 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Visor Cap' | quantity=126561 | quality=17.43 | realmq=42.32 | cost=79.54 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Beer' | quantity=80109 | quality=3.37 | realmq=17.93 | cost=12.38 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Beach goods' | quantity=75065 | quality=11.48 | realmq=39.73 | cost=105.93 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Alcoholic beverages' | quantity=31215 | quality=4.84 | realmq=27.90 | cost=32.29 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Leather goods' | quantity=19098 | quality=23.13 | realmq=44.09 | cost=198.74 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Bags and wallets' | quantity=7754 | quality=23.09 | realmq=41.34 | cost=1176.54 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Fabric' | quantity=321908 | quality=13.55 | realmq=43.18 | cost=15.98 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Clothing' | quantity=195045 | quality=20.55 | realmq=45.93 | cost=123.01 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Visor Cap' | quantity=126561 | quality=17.43 | realmq=42.32 | cost=79.54 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Beer' | 
quantity=80109 | quality=3.37 | realmq=17.93 | cost=12.38 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Beach goods' | quantity=75065 | quality=11.48 | realmq=39.73 | cost=105.93 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Alcoholic beverages' | quantity=31215 | quality=4.84 | realmq=27.90 | cost=32.29 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Leather goods' | quantity=19098 | quality=23.13 | realmq=44.09 | cost=198.74 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Bags and wallets' | quantity=7754 | quality=23.09 | realmq=41.34 | cost=1176.54 }]

Writing file: /Users/basilbourque/data_2019-01-05T03•48•37Z.csv
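If adding Commons CSV to the project is not an option, the cleaned-up data.csv above is simple enough to read with the JDK alone, because none of its values contain commas, quotes, or line breaks. The following is a rough sketch under that assumption. It is not a general CSV parser (no RFC 4180 quoting), and the class name SimpleCsv is made up. Each row becomes a map from header name to value, loosely mirroring record.get( "name" ) in the code above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical JDK-only reader. Valid only for simple files whose values never
// contain commas, double quotes, or line breaks. Closes the given Reader when done.
final class SimpleCsv {
    static List < Map < String, String > > read ( Reader source ) throws IOException {
        List < Map < String, String > > rows = new ArrayList <>();
        try ( BufferedReader reader = new BufferedReader( source ) ) {
            String headerLine = reader.readLine();
            if ( headerLine == null ) return rows;             // Empty file.
            String[] headers = headerLine.split( "," , -1 );
            String line;
            while ( ( line = reader.readLine() ) != null ) {    // No fixed row count: reads to end of file.
                if ( line.isBlank() ) continue;
                String[] values = line.split( "," , -1 );       // -1 keeps empty fields, exposing stray commas.
                if ( values.length != headers.length ) {
                    throw new IOException( "Expected " + headers.length + " fields but found " + values.length + ": " + line );
                }
                Map < String, String > row = new LinkedHashMap <>();  // Keeps column order.
                for ( int i = 0 ; i < headers.length ; i++ ) {
                    row.put( headers[ i ] , values[ i ] );
                }
                rows.add( row );
            }
        }
        return rows;
    }
}
```

A typical call would be SimpleCsv.read( Files.newBufferedReader( Path.of( "data.csv" ) ) ), followed by converting each value with LocalDate.parse, Integer.valueOf, and new BigDecimal as in the Commons CSV version.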

ISO 8601

By the way, when serializing date-time values to text, always use the standard ISO 8601 formats. For a date-only value without time-of-day and without time zone, that would be YYYY-MM-DD.
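In java.time, LocalDate already uses ISO 8601 by default: toString writes YYYY-MM-DD, and LocalDate.parse reads it with no formatter. Only slash-separated dates like the ones in the Question's original file need an explicit pattern. Here is a small sketch; the class name IsoDates is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper converting the Question's "2018/12/29" dates to ISO 8601.
final class IsoDates {
    private static final DateTimeFormatter SLASHED = DateTimeFormatter.ofPattern( "uuuu/MM/dd" );

    static String toIso ( String slashed ) {
        LocalDate date = LocalDate.parse( slashed , SLASHED );  // Non-standard input needs the pattern.
        return date.toString();                                  // Always YYYY-MM-DD.
    }

    static LocalDate fromIso ( String iso ) {
        return LocalDate.parse( iso );  // ISO 8601 is the default; no formatter needed.
    }
}
```

A side benefit of YYYY-MM-DD with four-digit years is that sorting the strings alphabetically also sorts them chronologically.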

If you want to preserve the hierarchy, use some file format other than CSV. Commonly XML or JSON is used for such data.

Your Question does not provide enough detail to know for certain, but I get the feeling you should be using a database rather than text files. If you are reading, editing, and appending large amounts of data (large enough to be concerned about memory limits), or if multiple processes, threads, or users access the data, then a database is called for. A database is designed to efficiently handle data too large to fit entirely into memory, and it is designed to handle concurrent access.

I have to parse data from a csvFile, right now I am working with small numbers, 10 products but if I work with, let's say, a 100 or even a thousand

That is not "large" as you put it. Even a Raspberry Pi or Beaglebone Black has enough RAM to load several thousand of such tuples into memory.

or any other number, I want the program to make a dynamic array, so I do not have to change the array dimension manually every time I want to input data.

You need to learn about the Java Collections Framework, rather than using simple arrays.

In particular, your date-to-tuple hierarchy would commonly be represented by using a Map (also called a dictionary by some folks). This data structure is a collection of key-value pairs, where the date would be your key and a Set or List of your tuples would be your value.

Define a class for your tuple data, named something like Product. Add member variables: name, quantity, quality, realmq, and cost. Instantiate an object for each tuple.

Create a Map such as a TreeMap. Being a SortedMap, it keeps your dates in chronological order.

SortedMap< LocalDate , List< Product > > map = new TreeMap<>() ;  // Key: date. Value: that date's products.

Use LocalDate for your date values, the key in your map.

LocalDate ld = LocalDate.of( 2018 , 1 , 23 ) ;
map.put( ld , new ArrayList< Product >() ) ; // Pass an initial capacity in those parens if you know a likely size of the list.

For each Product object, retrieve the list for the relevant date from the map, and add the product to that list.
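Put together, those steps might look like the sketch below. Instead of putting an empty list into the map ahead of time, Map.computeIfAbsent creates a date's list the first time that date is seen, so nothing ever has to be sized by hand. The Product record with only three fields, and the ProductsByDate wrapper, are simplifications assumed for brevity (records need Java 16+).

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical, trimmed-down tuple type for the illustration.
record Product( String name , int quantity , BigDecimal cost ) { }

final class ProductsByDate {
    // TreeMap keeps its keys (the dates) in chronological order.
    private final SortedMap < LocalDate, List < Product > > map = new TreeMap <>();

    void add ( LocalDate date , Product product ) {
        // Create the list for a new date on first use, then append to it.
        map.computeIfAbsent( date , d -> new ArrayList <>() ).add( product );
    }

    SortedMap < LocalDate, List < Product > > asMap ( ) {
        return map;
    }
}
```

Adding a 2018-12-30 product before a 2018-12-29 one still leaves 2018-12-29 as firstKey(), because the TreeMap orders by date rather than by insertion.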

When serializing, use an XML or JSON framework to write the map to storage.

Or do so yourself, writing your own data format. Get all the keys from the map and loop them, writing each date to the file. For each date, extract its list from the map (the value for that key), loop the Product objects in the list, and write out each product’s member variables. Use any field and row delimiters you like. Though they are seldom used, for reasons I have never understood, ASCII (a subset of Unicode) has dedicated delimiter characters. I suggest you use these separators. The code points:

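  • 31 for field (INFORMATION SEPARATOR ONE)
  • 30 for row (INFORMATION SEPARATOR TWO)
  • 29 for group (INFORMATION SEPARATOR THREE)
  • 28 for file (INFORMATION SEPARATOR FOUR)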
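Here is a rough sketch of such a hand-rolled format. It uses code point 31 between fields, 30 between rows, and 29 between dates, with each group's first record holding the date. Code point 28 goes unused because everything lives in one file. The class name, and the choice to keep each row as a plain list of strings, are assumptions. The sketch also assumes no value ever contains these control characters.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical writer/reader for a date -> rows map using the ASCII information separators.
final class AsciiDelimited {
    static final char UNIT = 31;    // INFORMATION SEPARATOR ONE: between fields
    static final char RECORD = 30;  // INFORMATION SEPARATOR TWO: between rows
    static final char GROUP = 29;   // INFORMATION SEPARATOR THREE: between dates

    static String write ( SortedMap < LocalDate, List < List < String > > > data ) {
        StringBuilder sb = new StringBuilder();
        for ( Map.Entry < LocalDate, List < List < String > > > entry : data.entrySet() ) {
            if ( sb.length() > 0 ) sb.append( GROUP );
            sb.append( entry.getKey() );                          // First record of a group: the ISO date.
            for ( List < String > fields : entry.getValue() ) {
                sb.append( RECORD ).append( String.join( String.valueOf( UNIT ) , fields ) );
            }
        }
        return sb.toString();
    }

    static SortedMap < LocalDate, List < List < String > > > read ( String text ) {
        SortedMap < LocalDate, List < List < String > > > data = new TreeMap <>();
        if ( text.isEmpty() ) return data;
        for ( String group : text.split( String.valueOf( GROUP ) , -1 ) ) {
            String[] records = group.split( String.valueOf( RECORD ) , -1 );
            List < List < String > > rows = new ArrayList <>();
            for ( int i = 1 ; i < records.length ; i++ ) {      // Record 0 is the date.
                rows.add( List.of( records[ i ].split( String.valueOf( UNIT ) , -1 ) ) );
            }
            data.put( LocalDate.parse( records[ 0 ] ) , rows );
        }
        return data;
    }
}
```

Because commas never act as delimiters here, a value such as "1 176,54" written in a comma-decimal locale would pass through unharmed.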

All of these issues have been addressed many times on Stack Overflow. Search to learn more.

When serializing data, do not include extraneous text.

The $ in your cost column is just noise. If you meant to indicate a particular currency, a simple $ fails to do the job, as it could mean Canadian dollars, United States dollars, Mexican pesos, or perhaps other currencies. So use a standard ISO 4217 currency code such as CAD, USD, or MXN. If all the values are in a single known currency such as CAD, then omit the ‘$’ entirely.

Preface: If you are frequently moving data in and out of these files for updating, you should be using a database rather than text files.

No need to worry about performance of CSV versus XML versus JSON.

Firstly, you are falling into the evil trap of premature optimization (google/duckduckgo that phrase).

Secondly, you would need an enormous amount of frequently processed data, far beyond that of common business apps, before any performance difference became significant. Accessing files of any format from storage, even from SSD drives, is so slow that it dwarfs the time the CPU spends processing the data.

Choose a format based on fitting the needs of your data and app.

For simple flat data, use CSV, Tab-delimited, or the ASCII/Unicode delimiter characters (code points 28-31).

For hierarchical data, use XML. XML has the advantage of being very precisely defined by specification. So much tooling has been built for XML. And XML Schema is also well-defined. This provides a powerful way to validate incoming data files before attempting to process.

As for JSON, use only if you must, and only for small amounts of relatively simple data. It lacks the well-defined specs and schema of XML. It is not intended to work well with deep hierarchies or vast collections. JSON only exists because it is convenient for JavaScript programmers, and because of the IT industry’s masochistic penchant for reinventing the wheel over and over again.

XML and JSON share one major advantage: binding. In the Java world, there are both standard and handy-but-non-standard frameworks for automatically serializing your Java objects as XML or JSON text. Going the other direction, the frameworks can instantiate Java objects directly from your incoming XML/JSON, so you needn’t write code yourself to handle each field of data.

This binding feature is not worth the bother for the simple data shown in the Question. For that, CSV or Tab-delimited is appropriate, with Apache Commons CSV as shown in this Answer.

Tip: You should send a hash (MD5, SHA-256, etc.) along with each data file. Upon receiving the file and the hash, the receiving computer recalculates the hash of the incoming file, then compares the two hash results to verify that the data file arrived without corruption.
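Here is a minimal sketch of computing such a hash with the JDK's MessageDigest. MD5 is fine for catching accidental corruption, but this sketch uses SHA-256 because MD5 no longer resists deliberate tampering. The class name FileHash is made up. Sender and receiver would each run it over the file's bytes and compare the resulting hex strings.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hex-encoded SHA-256 of a byte array or a file.
final class FileHash {
    static String sha256Hex ( byte[] bytes ) {
        try {
            byte[] digest = MessageDigest.getInstance( "SHA-256" ).digest( bytes );
            StringBuilder hex = new StringBuilder( digest.length * 2 );
            for ( byte b : digest ) {
                hex.append( String.format( "%02x" , b ) );  // Two lowercase hex digits per byte.
            }
            return hex.toString();
        } catch ( NoSuchAlgorithmException e ) {
            throw new IllegalStateException( "SHA-256 must be supported by every Java platform" , e );
        }
    }

    static String sha256Hex ( Path file ) throws IOException {
        return sha256Hex( Files.readAllBytes( file ) );  // Fine for small files; stream very large ones instead.
    }
}
```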
