How to best store data from CSV in a Java class? A single list of Row objects, or a single object with nested objects?


Question

Date,Locality,District,New Cases,Hospitalizations,Deaths
5/21/2020,Accomack,Eastern Shore,709,40,11
5/21/2020,Albemarle,Thomas Jefferson,142,19,4
5/21/2020,Alleghany,Alleghany,9,4,0
5/21/2020,Amelia,Piedmont,22,7,1
5/21/2020,Amherst,Central Virginia,25,3,0
5/21/2020,Appomattox,Central Virginia,25,1,0
5/21/2020,Arlington,Arlington,1763,346,89
... // skipped down to the next day
5/20/2020,Accomack,Eastern Shore,709,39,11
5/20/2020,Albemarle,Thomas Jefferson,142,18,4
5/20/2020,Alleghany,Alleghany,10,4,0
5/20/2020,Amelia,Piedmont,21,7,1
5/20/2020,Amherst,Central Virginia,25,3,0
5/20/2020,Appomattox,Central Virginia,24,1,0
5/20/2020,Arlington,Arlington,1728,334,81
5/20/2020,Augusta,Central Shenandoah,88,4,1
... // continued

I have data for a US state in a CSV like the one above, and I would like to do some data analysis on it so that I can serve the results through a REST API. The analysis consists of various aggregations, such as: total cases across the state by date, total cases for the entire state, total cases grouped by district, total cases for a district by date, total cases for a county by date, and so on. In short, all the basic group-by operations one could do with this data.

Now, my problem is figuring out how to properly store this data in Java without a database. I have one working implementation that uses a list of Row objects, where each Row object holds exactly one row of the CSV. Using Java's Stream API, I have been able to filter that list and compute some of these statistics. I then package the statistics into a single Row object or a List<Row> and hand it to the API to be serialized as JSON. This has worked OK, but I feel it is not the best way.
Is there some other, more object-oriented way to make use of the Date, District, County and Cases columns?

I was thinking of doing something like this:

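import java.time.LocalDate;
import java.util.List;

// Top level: one state holding its districts.
class State {
     List<District> districtList;
     String name;
}

// A district holds county entries (one per county per date, see below).
class District {
     List<County> countyList;
     String name;
}

// One county's figures on one date.
class County {
     LocalDate date;
     String name;
     int cases;
     // more stuff
}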

Then I would create one State object with a list of District objects, each with a list of many County objects, one per county per date.

Does this seem like overkill? Is there some other clean way to read this dataset into a data structure that makes it easy to aggregate summary information?

The way I'm currently doing it works, but I am looking for a better way!

Recommended answer

From your description, your approach seems sound and properly object-oriented. However, without additional information (e.g. specific aggregations that may dictate otherwise), it seems odd that you would have multiple "duplicate" County objects in your District objects. For example:

[{"date":"5/21/2020","name":"Accomack"},
 {"date":"5/20/2020","name":"Accomack"}]

From an object-oriented view, it seems you'd want an additional level of aggregation by date, with each date containing a list of County rows.
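As a rough sketch of what that extra level could look like, the district below keeps a date-keyed map whose values are that day's county rows. The class and method names (`CountyCount`, `DistrictHistory`, `DateLevelDemo`) are made up for illustration and are not from the question, and the sample values are only the Central Virginia rows visible in the CSV excerpt.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class: one county's case count, with the date moved out of the county.
class CountyCount {
    final String name;
    final int cases;

    CountyCount(String name, int cases) {
        this.name = name;
        this.cases = cases;
    }
}

// Hypothetical class: a district that groups its county rows by reporting date.
class DistrictHistory {
    final String name;
    // TreeMap keeps the dates sorted; each date maps to that day's county rows.
    final Map<LocalDate, List<CountyCount>> countiesByDate = new TreeMap<>();

    DistrictHistory(String name) {
        this.name = name;
    }

    void add(LocalDate date, CountyCount county) {
        countiesByDate.computeIfAbsent(date, d -> new ArrayList<>()).add(county);
    }

    // Total cases in this district on one date; 0 if no rows exist for that date.
    int totalCases(LocalDate date) {
        List<CountyCount> rows = countiesByDate.get(date);
        if (rows == null) {
            return 0;
        }
        return rows.stream().mapToInt(c -> c.cases).sum();
    }
}

class DateLevelDemo {
    // Matches the date format used in the CSV, e.g. 5/21/2020.
    static final DateTimeFormatter CSV_DATE = DateTimeFormatter.ofPattern("M/d/yyyy");

    // Builds a district from the Central Virginia rows shown in the question.
    static DistrictHistory centralVirginia() {
        DistrictHistory d = new DistrictHistory("Central Virginia");
        d.add(LocalDate.parse("5/21/2020", CSV_DATE), new CountyCount("Amherst", 25));
        d.add(LocalDate.parse("5/21/2020", CSV_DATE), new CountyCount("Appomattox", 25));
        d.add(LocalDate.parse("5/20/2020", CSV_DATE), new CountyCount("Amherst", 25));
        d.add(LocalDate.parse("5/20/2020", CSV_DATE), new CountyCount("Appomattox", 24));
        return d;
    }

    public static void main(String[] args) {
        DistrictHistory d = centralVirginia();
        System.out.println(d.totalCases(LocalDate.of(2020, 5, 21))); // prints 50
        System.out.println(d.totalCases(LocalDate.of(2020, 5, 20))); // prints 49
    }
}
```

With the date as a map key, each county name appears once per date rather than as repeated County objects in one flat list.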

One consideration: if your aggregations align better with a database-style approach, I would keep each row from the source data as is and query it directly, filtering and sorting via Stream lambdas.
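A minimal sketch of that flat-row approach might look like the following. `CaseRow` is a hypothetical stand-in for your Row class, the helper names are invented, and the naive `split(",")` assumes no field contains a quoted comma. The sample lines are copied from the CSV excerpt in the question.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-in for the question's Row: one CSV line, kept as is.
class CaseRow {
    static final DateTimeFormatter CSV_DATE = DateTimeFormatter.ofPattern("M/d/yyyy");

    private final LocalDate date;
    private final String locality;
    private final String district;
    private final int cases;

    CaseRow(LocalDate date, String locality, String district, int cases) {
        this.date = date;
        this.locality = locality;
        this.district = district;
        this.cases = cases;
    }

    // Assumption: plain comma-separated fields, no quoting or embedded commas.
    static CaseRow parse(String csvLine) {
        String[] f = csvLine.split(",");
        return new CaseRow(LocalDate.parse(f[0], CSV_DATE), f[1], f[2], Integer.parseInt(f[3]));
    }

    LocalDate getDate() { return date; }
    String getLocality() { return locality; }
    String getDistrict() { return district; }
    int getCases() { return cases; }
}

class FlatRowDemo {
    // A few data lines taken from the question's CSV excerpt.
    static List<CaseRow> sample() {
        return Arrays.asList(
                "5/21/2020,Accomack,Eastern Shore,709,40,11",
                "5/21/2020,Amherst,Central Virginia,25,3,0",
                "5/21/2020,Appomattox,Central Virginia,25,1,0",
                "5/20/2020,Accomack,Eastern Shore,709,39,11",
                "5/20/2020,Amherst,Central Virginia,25,3,0",
                "5/20/2020,Appomattox,Central Virginia,24,1,0")
                .stream().map(CaseRow::parse).collect(Collectors.toList());
    }

    // "GROUP BY date": sum of the cases column per date, sorted by date.
    static Map<LocalDate, Integer> casesByDate(List<CaseRow> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                CaseRow::getDate, TreeMap::new, Collectors.summingInt(CaseRow::getCases)));
    }

    // "WHERE district = ? GROUP BY date": filter first, then reuse the grouping.
    static Map<LocalDate, Integer> casesByDateForDistrict(List<CaseRow> rows, String district) {
        List<CaseRow> filtered = rows.stream()
                .filter(r -> r.getDistrict().equals(district))
                .collect(Collectors.toList());
        return casesByDate(filtered);
    }

    public static void main(String[] args) {
        List<CaseRow> rows = sample();
        System.out.println(casesByDate(rows));
        System.out.println(casesByDateForDistrict(rows, "Central Virginia"));
    }
}
```

Each aggregation is then a short stream pipeline over the same list, so adding another grouping does not require changing the stored structure.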
