Parsing different types of data format from a CSV file
Question
I am still a beginner with Java programming so I apologise in advance if I am over-complicating my problem.
What is my program? I am building a GUI based program. The goal of the program is to load a CSV, XML or JSON file and for the program to then store the data into an Array. The data will then be displayed in a text box. Ultimately, the program will have the ability to plot data to a graph.
GUI details:
- 3 radio buttons - allow the user to select CSV, XML or JSON
- Load File button
- Display button - displays the data in the textArea
- Display Graph button
- Text area
Problem: I am having trouble storing the data into an Array. I believe this is because of the format of the data. So for example, this is the first 3 lines of the CSV file:
millis,stamp,datetime,light,temp,vcc
1000, 1273010254, 2010/5/4 21:57:34, 333, 78.32, 3.54
2000, 1273010255, 2010/5/4 21:57:35, 333, 78.32, 3.92
3000, 1273010256, 2010/5/4 21:57:36, 344, 78.32, 3.95
(Note - there are 52789000 lines of data in the CSV/XML/JSON files)
The CSV-Reader Class contains the method for reading through the data, storing it into an array and then storing it to a dataList.
As you can see from the example above, the data types vary considerably. I am having particular trouble splitting and parsing the date and time values.
Here is what my CSV-Reader Class code looks like at the moment (Again, I apologise for noob code).
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
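import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;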
public class CSVReader {
//create a class that will hold arraylist which will have objects representing all lines of the file
private List<Data> dataList = new ArrayList<Data>();
private String path;
public List<Data> getDataList() {
return dataList;
}
public String getPath() {
return path;
}
public void setPath(String path) {
this.path = path;
}
//Create a method to read through the csv stored in the path
//Create the list of data and store in the dataList
public void readCSV() throws IOException{
//i will create connection with the file, in the path
BufferedReader in = new BufferedReader(new FileReader(path));
String line = null;
line = in.readLine();
while((line = in.readLine())!=null){
//I need to split and store in the temporary variable and create an object
//split on commas only, so the space inside the datetime field is kept
String[] splits = line.split("\\s*,\\s*");
long millis = Long.parseLong(splits[0].trim());
long stamp = Long.parseLong(splits[1].trim());
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy/M/d HH:mm:ss");
System.out.println(splits[2].trim());
LocalDateTime dateTime = LocalDateTime.parse(splits[2].trim(), formatter);
LocalDate date = dateTime.toLocalDate();
LocalTime time = dateTime.toLocalTime();
int light = Integer.parseInt(splits[3].trim());
double temp = Double.parseDouble(splits[4].trim());
double vcc = Double.parseDouble(splits[5].trim());
Data d = new Data(millis,stamp,dateTime,light,temp,vcc);//uses constructor
//final job is to add this object 'd' onto the dataList
dataList.add(d);
}//end of while loop
}//end of readCSV
}//end of class
Any help would be greatly appreciated!
Edit 1 - I thought that date and time were separate CSV headers. They were not. Therefore the time variable has been deleted from the program and replaced with the datetime variable.
Edit 2 - My program now reads the CSV file up to line 15 of the CSV:
27000, 1273010280, 2010/5/4 21:58:0, 288, 77.74, 3.88
Console error:
Exception in thread "AWT-EventQueue-0" java.time.format.DateTimeParseException: Text '2010/5/4 21:58:0' could not be parsed at index 15
at java.time.format.DateTimeFormatter.parseResolved0(Unknown Source)
at java.time.format.DateTimeFormatter.parse(Unknown Source)
at java.time.LocalDateTime.parse(Unknown Source)
at CSVReader.readCSV(CSVReader.java:55)
at GUI$2.actionPerformed(GUI.java:85)
at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)
at java.awt.Component.processMouseEvent(Unknown Source)
at javax.swing.JComponent.processMouseEvent(Unknown Source)
at java.awt.Component.processEvent(Unknown Source)
at java.awt.Container.processEvent(Unknown Source)
at java.awt.Component.dispatchEventImpl(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Window.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
at java.awt.EventQueue.access$500(Unknown Source)
at java.awt.EventQueue$3.run(Unknown Source)
at java.awt.EventQueue$3.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue$4.run(Unknown Source)
at java.awt.EventQueue$4.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue.dispatchEvent(Unknown Source)
at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.run(Unknown Source)
Recommended answer
ISO 8601
SOLVED: The program was crashing because my CSV did not follow the correct date and time format (read the comments below).
When exchanging date-time values as text, use the standard ISO 8601 formats rather than inventing your own. They are wisely designed to be easy to parse by machine and easy to read by humans across cultures. So, 2010-05-04T21:57:34, not 2010/5/4 21:57:34.
The java.time classes use the ISO 8601 formats by default when parsing/generating strings.
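As a minimal sketch of that default behaviour (the class name IsoDefaults is made up for illustration), text that is already in ISO 8601 format needs no DateTimeFormatter at all, and toString() generates the same format:

```java
import java.time.Instant;
import java.time.LocalDateTime;

final class IsoDefaults {

    // Parses an ISO 8601 date-time that has no zone or offset; no formatter needed.
    static LocalDateTime parseLocal(String iso) {
        return LocalDateTime.parse(iso);
    }

    // Parses an ISO 8601 moment in UTC (note the trailing Z); no formatter needed.
    static Instant parseUtc(String iso) {
        return Instant.parse(iso);
    }

    public static void main(String[] args) {
        LocalDateTime ldt = parseLocal("2010-05-04T21:57:34");
        Instant instant = parseUtc("2010-05-04T21:57:34Z");
        // toString() produces ISO 8601 text as well.
        System.out.println(ldt);     // 2010-05-04T21:57:34
        System.out.println(instant); // 2010-05-04T21:57:34Z
    }
}
```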
The 2nd and 3rd columns of your data feed represent the same thing: a date with time-of-day. The 2nd column is a count of whole seconds since the epoch reference date of 1970-01-01T00:00Z (the Z means UTC).
So it is silly to include both. As mentioned above, the 3rd column is in a poorly chosen format. The 2nd column approach of using a count-from-epoch is also a poor choice in my opinion, as it is not obvious, no human can decipher its meaning, and so it makes mistakes non-obvious thereby making debugging and logging difficult.
To deal with what we have, the seconds-from-epoch can be parsed as an Instant. This class represents a moment in UTC.
Instant instant = Instant.ofEpochSecond( 1_273_010_254L ) ;  // The column counts whole seconds, not milliseconds.
Your 3rd column gives a date and time but omits an indicator of time zone or offset-from-UTC. Since it matches the 2nd column when parsed as seconds from first moment of 1970 in UTC, we know its value was intended for UTC. Omitting such info is bad practice, like having a monetary amount with no indicator of currency.
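That match can be checked directly. The following is only a sketch (the class and method names are hypothetical): it reads the 3rd column as if it were UTC and compares the resulting moment with the one derived from the 2nd column.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class ColumnCheck {

    private static final DateTimeFormatter F =
            DateTimeFormatter.ofPattern("uuuu/M/d HH:mm:ss");

    // True if the date-time text, read as UTC, is the same moment as the epoch-seconds count.
    static boolean sameMomentInUtc(long epochSeconds, String dateTimeText) {
        Instant fromStamp = Instant.ofEpochSecond(epochSeconds);
        Instant fromText = LocalDateTime.parse(dateTimeText, F).toInstant(ZoneOffset.UTC);
        return fromStamp.equals(fromText);
    }

    public static void main(String[] args) {
        // First data row of the sample file.
        System.out.println(sameMomentInUtc(1_273_010_254L, "2010/5/4 21:57:34")); // true
    }
}
```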
Ideally both columns should be replaced by a string in ISO 8601 format, for example 2010-05-04T21:57:34Z, including the Z to indicate UTC.
If we had to parse the 3rd column without knowing it was intended for UTC, we would parse it as a LocalDateTime, a date with time-of-day but lacking a time zone or offset. We need to define a formatting pattern to match your input.
DateTimeFormatter f = DateTimeFormatter.ofPattern( "uuuu/M/d HH:mm:ss" );
LocalDateTime localDateTime = LocalDateTime.parse( "2010/5/4 21:57:34" , f );
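The input from the question's Edit 2, 2010/5/4 21:58:0, has a one-digit seconds field. A two-letter pattern such as ss requires exactly two digits, which is consistent with the reported failure at index 15. If the file cannot be regenerated, one option is to use single pattern letters (H:m:s), which accept one or two digits. The sketch below assumes nothing beyond the JDK; the class name is made up for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class LenientSeconds {

    // Two pattern letters demand exactly two digits for that field.
    static final DateTimeFormatter FIXED_WIDTH =
            DateTimeFormatter.ofPattern("uuuu/M/d HH:mm:ss");

    // A single pattern letter accepts one digit or more.
    static final DateTimeFormatter FLEXIBLE_WIDTH =
            DateTimeFormatter.ofPattern("uuuu/M/d H:m:s");

    // Reports whether the text can be parsed with the given formatter.
    static boolean parses(String text, DateTimeFormatter f) {
        try {
            LocalDateTime.parse(text, f);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String input = "2010/5/4 21:58:0";
        System.out.println(parses(input, FIXED_WIDTH));    // false
        System.out.println(parses(input, FLEXIBLE_WIDTH)); // true
    }
}
```

The flexible pattern still reads two-digit values such as 21:57:34, so one formatter can cover every row.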
BigDecimal
Your decimal fraction numbers should be represented as BigDecimal objects for accuracy. Never use double/Double or float/Float where you care about accuracy. These types use floating-point technology, which trades away accuracy for speed of execution. In contrast, BigDecimal is slow but accurate.
Parse a BigDecimal from a string.
new BigDecimal ( "78.32" )
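To make the accuracy trade-off concrete, here is a small sketch (the class name is hypothetical) that adds the same two decimal fractions once as double and once as BigDecimal. The values 0.1 and 0.2 are chosen only because their floating-point result is well known; they are not from the data file.

```java
import java.math.BigDecimal;

final class DecimalAccuracy {

    // Binary floating point cannot represent 0.1 or 0.2 exactly.
    static double sumAsDouble() {
        return 0.1 + 0.2;
    }

    // BigDecimal built from strings keeps the decimal digits exactly.
    static BigDecimal sumAsBigDecimal() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"));
    }

    public static void main(String[] args) {
        System.out.println(sumAsDouble());     // 0.30000000000000004
        System.out.println(sumAsBigDecimal()); // 0.3
    }
}
```

Note that the BigDecimal is constructed from a String. Passing a double to the constructor would carry the floating-point inexactness into the BigDecimal.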
Apache Commons CSV
Do not write code when well-tested code already exists. Libraries for reading CSV files have already been written.
I use Apache Commons CSV for such work. There are several variations of these formats, all handled by this library.
Here is example code. First define a class to hold your data, here named Reading.
package com.basilbourque.example;
import java.math.BigDecimal;
import java.time.Instant;
import java.time.LocalDateTime;
public class Reading {
private Integer millis;
private Instant instant;
private LocalDateTime localDateTime;
private Integer light;
private BigDecimal temp;
private BigDecimal vcc;
public Reading ( Integer millis , Instant instant , LocalDateTime localDateTime , Integer light , BigDecimal temp , BigDecimal vcc ) {
// TODO: Add checks for null arguments: Objects.requireNonNull( … ).
this.millis = millis;
this.instant = instant;
this.localDateTime = localDateTime;
this.light = light;
this.temp = temp;
this.vcc = vcc;
}
@Override
public String toString ( ) {
return "com.basilbourque.example.Reading{" +
"millis=" + millis +
", instant=" + instant +
", localDateTime=" + localDateTime +
", light=" + light +
", temp=" + temp +
", vcc=" + vcc +
'}';
}
}
Example data file:
millis,stamp,datetime,light,temp,vcc
1000, 1273010254, 2010/5/4 21:57:34, 333, 78.32, 3.54
2000, 1273010255, 2010/5/4 21:57:35, 333, 78.32, 3.92
3000, 1273010256, 2010/5/4 21:57:36, 344, 78.32, 3.95
And now call upon Commons CSV to parse that data, instantiate Reading objects, and collect them.
DateTimeFormatter f = DateTimeFormatter.ofPattern( "uuuu/M/d HH:mm:ss" );
List < Reading > readings = new ArrayList <>( 3 );
Reader reader = null;
try {
reader = new FileReader( "/Users/basilbourque/data.csv" );
Iterable < CSVRecord > records = CSVFormat.RFC4180.withIgnoreSurroundingSpaces( true ).withHeader().parse( reader );
for ( CSVRecord record : records ) {
// Grab inputs
String millisInput = record.get( "millis" );
String stampInput = record.get( "stamp" );
String datetimeInput = record.get( "datetime" );
String lightInput = record.get( "light" );
String tempInput = record.get( "temp" );
String vccInput = record.get( "vcc" );
// Parse inputs
Integer millis = Integer.valueOf( millisInput );
Instant instant = Instant.ofEpochSecond( Integer.valueOf( stampInput ) );
LocalDateTime localDateTime = LocalDateTime.parse( datetimeInput , f );
Integer light = Integer.valueOf( lightInput );
BigDecimal temp = new BigDecimal( tempInput );
BigDecimal vcc = new BigDecimal( vccInput );
// Construct object
Reading r = new Reading( millis , instant , localDateTime , light , temp , vcc );
// Collect object
readings.add( r );
}
} catch ( FileNotFoundException e ) {
e.printStackTrace();
} catch ( IOException e ) {
e.printStackTrace();
}
System.out.println( readings );
[com.basilbourque.example.Reading{millis=1000, instant=2010-05-04T21:57:34Z, localDateTime=2010-05-04T21:57:34, light=333, temp=78.32, vcc=3.54}, com.basilbourque.example.Reading{millis=2000, instant=2010-05-04T21:57:35Z, localDateTime=2010-05-04T21:57:35, light=333, temp=78.32, vcc=3.92}, com.basilbourque.example.Reading{millis=3000, instant=2010-05-04T21:57:36Z, localDateTime=2010-05-04T21:57:36, light=344, temp=78.32, vcc=3.95}]
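For comparison only, and not as a replacement for the library, the per-row work can be sketched with nothing but the JDK. This sketch assumes that no field ever contains a comma or quotes, which a real CSV library does not need to assume. The Row class is a hypothetical stand-in for Reading, millis is held as a long as in the question's code, and the H:m:s pattern is chosen so that rows like 21:58:0 also parse.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

final class PlainJdkRows {

    private static final DateTimeFormatter F =
            DateTimeFormatter.ofPattern("uuuu/M/d H:m:s");

    // Hypothetical holder for one parsed row.
    static final class Row {
        final long millis;
        final Instant instant;
        final LocalDateTime localDateTime;
        final int light;
        final BigDecimal temp;
        final BigDecimal vcc;

        Row(long millis, Instant instant, LocalDateTime localDateTime,
            int light, BigDecimal temp, BigDecimal vcc) {
            this.millis = millis;
            this.instant = instant;
            this.localDateTime = localDateTime;
            this.light = light;
            this.temp = temp;
            this.vcc = vcc;
        }
    }

    // Splits on commas only, so the space inside the date-time field survives.
    static Row parseLine(String line) {
        String[] parts = line.split(",");
        if (parts.length != 6) {
            throw new IllegalArgumentException("Expected 6 fields but got " + parts.length + ": " + line);
        }
        return new Row(
                Long.parseLong(parts[0].trim()),
                Instant.ofEpochSecond(Long.parseLong(parts[1].trim())),
                LocalDateTime.parse(parts[2].trim(), F),
                Integer.parseInt(parts[3].trim()),
                new BigDecimal(parts[4].trim()),
                new BigDecimal(parts[5].trim()));
    }

    // Parses every line after the header line.
    static List<Row> parseLines(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            rows.add(parseLine(lines.get(i)));
        }
        return rows;
    }
}
```

Calling PlainJdkRows.parseLine( "27000, 1273010280, 2010/5/4 21:58:0, 288, 77.74, 3.88" ) handles the row that failed in the question, because the seconds field may be one digit here.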
Regarding your mention of:
store the data into an Array
You are using an ArrayList in your code, not an array. See the Oracle Tutorials for lists and for arrays to understand the difference. It is generally best to use the Java Collections framework. Where size and speed really matter, we may choose an array.
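A short sketch of the difference, using the light values from the sample file (the class and method names are made up for illustration): an array has a length fixed at creation, while an ArrayList grows as elements are added.

```java
import java.util.ArrayList;
import java.util.List;

final class ArrayVersusList {

    // An array: the length (3) is fixed when the array is created.
    static int[] fixedLights() {
        int[] lights = new int[3];
        lights[0] = 333;
        lights[1] = 333;
        lights[2] = 344;
        return lights;
    }

    // An ArrayList: starts empty and grows with each add.
    static List<Integer> growableLights() {
        List<Integer> lights = new ArrayList<>();
        lights.add(333);
        lights.add(333);
        lights.add(344);
        return lights;
    }

    public static void main(String[] args) {
        System.out.println(fixedLights().length);    // 3
        System.out.println(growableLights().size()); // 3
    }
}
```

When the number of rows is not known before reading the file, the list is the more convenient choice.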
The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, Calendar, & SimpleDateFormat.
The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.
To learn more, see the Oracle Tutorial. And search Stack Overflow for many examples and explanations. Specification is JSR 310.
You may exchange java.time objects directly with your database. Use a JDBC driver compliant with JDBC 4.2 or later. No need for strings, no need for java.sql.* classes.
Where to obtain the java.time classes?
- Java SE 8, Java SE 9, Java SE 10, Java SE 11, and later - Part of the standard Java API with a bundled implementation.
- Java 9 adds some minor features and fixes.
- Most of the java.time functionality is back-ported to Java 6 & 7 in ThreeTen-Backport.
- Later versions of Android bundle implementations of the java.time classes.
- For earlier Android (<26), the ThreeTenABP project adapts ThreeTen-Backport (mentioned above). See How to use ThreeTenABP….
The ThreeTen-Extra project extends java.time with additional classes. This project is a proving ground for possible future additions to java.time. You may find some useful classes here such as Interval, YearWeek, YearQuarter, and more.