Choosing the right data structure to parse a file


Question

I have a CSV file with contents in the following format:

CSE110, Mon, 1:00 PM, Fri, 1:00 PM
CSE114, Mon, 8:00 AM, Wed, 8:00 AM, Fri, 8:00 AM

What's the best data structure to parse and store this data?

I tried using named tuples as follows:

CourseTimes = namedtuple('CourseTimes', 'course_name, day, start_time ')

But a single course can be scheduled on multiple days and times, as shown for CSE114 above. How many there are can only be determined at run time. How do I handle this?

Or else, can I make use of a dictionary or a list?

I am trying to solve a scheduling problem to assign TAs to courses. I might have to compare times to check for any collisions in the future.

To complicate things further, the input file has other data that I also need to parse. The format is basically as follows:

//Course times
CSE110, Mon, 1:00 PM, Fri, 1:00 PM
CSE114, Mon, 8:00 AM, Wed, 8:00 AM, Fri, 8:00 AM
....

//Course recitation times
CSE306, Mon, 2:30 PM
CSE307, Fri, 4:00 PM
...

//class strength
CSE101, 44, yes
CSE101, 115, yes
...



I suppose I need to store all this in separate data structures. What would be the right regex patterns for each category?

Recommended answer

Start by noting a few things about your data:


  1. You have a number of unique strings (the courses).

  2. Each course is followed by a number of values (days and times).

With that, you have a series of unique keys, each of which has a number of values.

That sounds like a dictionary.

To get that data into a dictionary, start by reading the file. Next, you can either use regular expressions to select each [day], [hour]:[minutes] [AM/PM] section, or use plain old str.split() to break the line into sections at the commas. The course string becomes the key in the dictionary, and the rest of the line is stored as a tuple or list of values. Then move on to the next line.
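As a rough illustration of the split-based approach, here is a minimal sketch in Python. It is not the answerer's code: the function name parse_course_times and the variable names are made up for this example. It assumes that lines starting with "//" are section headers, that lines starting with "." are filler, that every data line has the form course, day, time, day, time, ..., and that the AM/PM marker is parsed under an English locale. It covers only the day/time sections; the class strength section has a different layout and would need its own handling.

```python
from collections import defaultdict
from datetime import datetime


def parse_course_times(lines):
    """Map each course name to a list of (day, start_time) tuples.

    Hypothetical helper for illustration; handles only day/time lines.
    """
    schedule = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines, '//' section headers, and '...' filler lines.
        if not line or line.startswith("//") or line.startswith("."):
            continue
        # The first field is the course; the rest alternate day, time, ...
        course, *fields = [part.strip() for part in line.split(",")]
        for day, time_text in zip(fields[0::2], fields[1::2]):
            # Convert text such as '1:00 PM' into a comparable time object.
            start = datetime.strptime(time_text, "%I:%M %p").time()
            schedule[course].append((day, start))
    return dict(schedule)


sample = [
    "//Course times",
    "CSE110, Mon, 1:00 PM, Fri, 1:00 PM",
    "CSE114, Mon, 8:00 AM, Wed, 8:00 AM, Fri, 8:00 AM",
]
schedule = parse_course_times(sample)
print(schedule["CSE114"])
```

Storing the start times as datetime.time objects rather than raw strings means two entries can be compared directly, which should help with the collision checks mentioned in the question.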
