Python: sorting the contents of a file by timestamp and writing it to a new file

Question

I have a file that stores data in the following format:

TIME[04.26_12:30:30:853664]ID[ROLL:201987623]MARKS[PHY:100|MATH:200|CHEM:400]
TIME[03.27_12:29:30.553669]ID[ROLL:201987623]MARKS[PHY:100|MATH:1200|CHEM:900]
TIME[03.26_12:28:30.753664]ID[ROLL:2341987623]MARKS[PHY:100|MATH:200|CHEM:400]
TIME[03.26_12:29:30.853664]ID[ROLL:201978623]MARKS[PHY:0|MATH:0|CHEM:40]
TIME[04.27_12:29:30.553664]ID[ROLL:2034287623]MARKS[PHY:100|MATH:200|CHEM:400]

This data is stored in a text file. From it I am creating several files, one per roll number, each named after its ROLL and holding the lines for that roll number. I am using a regex in Python for this. The file is so large that I cannot load it into a list with readlines(), since that raises a memory error, so I have to read it line by line. Here is the code I have written:

     import re
     from datetime import datetime
     from collections import defaultdict

     # Maps each roll number to {'TIME': [timestamp strings]}
     timer_for_roll_numbers = defaultdict(lambda: defaultdict(list))

     with open('Marksinfo.txt', 'r') as f:
         for line in f:  # read line by line to keep memory use low
             ind = re.match(r'(.*)TIME\[(.*?)\](.*)\[ROLL:(.*?)\]', line, re.M | re.I)
             if ind is None:
                 continue  # skip lines that do not match the expected format
             timer_for_roll_numbers[int(ind.group(4))]['TIME'].append(ind.group(2))
             # Append the line to the file for this roll number
             with open('ROLL_{}.txt'.format(ind.group(4)), 'a') as p:
                 p.write(line)

The code above creates the files as I want, but I also want the data in each file to be sorted by the timestamp values in the data. I have no idea how to do that, because the lines are read sequentially from the input file and written to the new files without considering whether they are in timestamp order.

The output I currently get is as follows:

In file name ROLL_201987623.txt
 TIME[04.26_12:30:30:853664]ID[ROLL:201987623]MARKS[PHY:100|MATH:200|CHEM:400]
 TIME[03.27_12:29:30.553669]ID[ROLL:201987623]MARKS[PHY:100|MATH:1200|CHEM:900]

The desired output format is as follows:

TIME[03.27_12:29:30.553669]ID[ROLL:201987623]MARKS[PHY:100|MATH:1200|CHEM:900]
 TIME[04.26_12:30:30:853664]ID[ROLL:201987623]MARKS[PHY:100|MATH:200|CHEM:400]

Likewise, the lines for every roll number should be sorted in their respective files. Please suggest how to do this.

In my code I also collect these timestamps and parse them with Python's datetime library. For example, to get every timestamp for a particular roll number (say the sample roll number 201987623), I use:

time_for_particular_roll=timer_for_roll_numbers[201987623]['TIME']
dt = [datetime.strptime(s, '%m.%d_%H:%M:%S.%f') for s in time_for_particular_roll]

dt then contains the fields in the following form, which I can access easily:

(4,26,12,30,30,853664)

What I cannot work out is how to write the lines for a particular roll number, in sorted order, into the newly created file for that roll number.

Recommended answer

I would use sorting together with itertools.groupby, which groups the lines by ROLL once they have been sorted by ROLL and timestamp. Here is the script I would use as a first approach:

import re
from itertools import groupby

regex = re.compile(r"^.*TIME\[([^]]+)\]ID\[ROLL:([^]]+)\].+$")

I would define three callables for filtering, sorting, and grouping lines:

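def func1(arg) -> bool:
    # Filter: keep only lines that match the expected format.
    return regex.match(arg) is not None


def func2(arg) -> str:
    # Sort key: the timestamp text inside TIME[...].
    match = regex.match(arg)
    if match:
        return match.group(1)
    return ""


def func3(arg) -> int:
    # Sort and grouping key: the roll number as an integer.
    match = regex.match(arg)
    if match:
        return int(match.group(2))
    return 0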
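
As a quick check of what the three callables return, they can be applied to a single sample line. The sketch below is self-contained, so it redefines them under hypothetical names (is_record, timestamp_of, roll_of) that are not part of the original answer; the regex is the one shown above.

```python
import re

regex = re.compile(r"^.*TIME\[([^]]+)\]ID\[ROLL:([^]]+)\].+$")


def is_record(line):
    # Hypothetical name; plays the role of func1 (filter).
    return regex.match(line) is not None


def timestamp_of(line):
    # Hypothetical name; plays the role of func2 (timestamp sort key).
    match = regex.match(line)
    return match.group(1) if match else ""


def roll_of(line):
    # Hypothetical name; plays the role of func3 (roll number key).
    match = regex.match(line)
    return int(match.group(2)) if match else 0


sample = "TIME[03.27_12:29:30.553669]ID[ROLL:201987623]MARKS[PHY:100|MATH:1200|CHEM:900]\n"

print(is_record(sample))         # True
print(timestamp_of(sample))      # 03.27_12:29:30.553669
print(roll_of(sample))           # 201987623
print(is_record("not a record")) # False
```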

Then loop over your input file.

First reject non-compliant lines. Sort the remaining lines by ROLL and, within each ROLL, by timestamp. Then group the lines by ROLL.

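with open(your_input_file) as fr:
    collection = filter(func1, fr)  # drop lines that do not match
    # Two stable sorts: by timestamp first, then by ROLL, so lines end up
    # ordered by ROLL and, within each ROLL, by timestamp.
    collection = sorted(collection, key=func2)
    collection = sorted(collection, key=func3)
    for key, group in groupby(collection, key=func3):
        with open(f"ROLL_{key}", mode="w") as fw:
            fw.writelines(group)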

For your example, this snippet produces four files, each with its lines sorted by ascending timestamp.
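
To see the grouping without touching the file system, the same filter, sort, and group steps can be run on the five sample lines held in a list. This is only a sketch: group_sorted and sort_key are hypothetical helper names, and a single sort on a (roll, timestamp) tuple stands in for the two successive sorts in the answer.

```python
import re
from itertools import groupby

regex = re.compile(r"^.*TIME\[([^]]+)\]ID\[ROLL:([^]]+)\].+$")


def sort_key(line):
    # (roll number, timestamp text): orders by ROLL, then by timestamp.
    match = regex.match(line)
    return (int(match.group(2)), match.group(1))


def group_sorted(lines):
    # Keep matching lines, sort them, and collect them per roll number.
    records = sorted((ln for ln in lines if regex.match(ln)), key=sort_key)
    return {
        roll: list(group)
        for roll, group in groupby(records, key=lambda ln: sort_key(ln)[0])
    }


sample_lines = [
    "TIME[04.26_12:30:30:853664]ID[ROLL:201987623]MARKS[PHY:100|MATH:200|CHEM:400]\n",
    "TIME[03.27_12:29:30.553669]ID[ROLL:201987623]MARKS[PHY:100|MATH:1200|CHEM:900]\n",
    "TIME[03.26_12:28:30.753664]ID[ROLL:2341987623]MARKS[PHY:100|MATH:200|CHEM:400]\n",
    "TIME[03.26_12:29:30.853664]ID[ROLL:201978623]MARKS[PHY:0|MATH:0|CHEM:40]\n",
    "TIME[04.27_12:29:30.553664]ID[ROLL:2034287623]MARKS[PHY:100|MATH:200|CHEM:400]\n",
]

grouped = group_sorted(sample_lines)
for roll, lines in grouped.items():
    # Print each roll number with its timestamps in sorted order.
    print(roll, [regex.match(ln).group(1) for ln in lines])
```

Each value in grouped is the list of lines that the answer's loop would pass to writelines for that roll number.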

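Of course, don't change the timestamp format, for example by putting the day first. The sort compares timestamps as plain strings, which works only because the fields are zero-padded and ordered from most to least significant (month, day, then time).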
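
If the fields were ever written without zero padding, string comparison would no longer match chronological order. One possible alternative, sketched here and not part of the original answer, is a numeric sort key. The helper name timestamp_key is hypothetical, and the sketch assumes the fractional part always has six digits and that the fields stay in month, day, time order. It also accepts the ':' that appears before the microseconds in one of the sample lines.

```python
import re


def timestamp_key(stamp):
    # Split 'MM.DD_HH:MM:SS.ffffff' on '.', '_' and ':' and compare as ints.
    # Assumption: the fractional part always has six digits.
    return tuple(int(part) for part in re.split(r"[._:]", stamp))


print(timestamp_key("04.26_12:30:30:853664"))  # (4, 26, 12, 30, 30, 853664)

# Unpadded months: string order would put "10..." before "9...".
stamps = ["10.01_00:00:00.000000", "9.30_23:59:59.999999"]
print(sorted(stamps))                     # string order, not chronological
print(sorted(stamps, key=timestamp_key))  # numeric order, chronological
```

Such a key could be used in place of the raw timestamp string when sorting, at the cost of one extra split per line.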
