How to merge continuous lines of a csv file

Problem description

I have a csv file that carries the outputs of some processes over video frames. In the file, each line is either fire or none, and each line has a startTime and an endTime. I need to cluster continuous fire lines and print only one instance per cluster, with its start and end time. A few none lines in the middle can also be tolerated if their time is within 1 second. To be clear, the whole point is to cluster detections of nearby frames together and so smooth out the results: instead of multiple lines such as 31-32, 32-33, ..., there should be a single line covering 31-35 seconds.

How can I do that?

For instance, all of the following continuous items are considered a single one, since the none gaps are within 1s. So we would get something like 1,file1,name1,30.6,32.2,fire,0.83, with the score being the mean of all fire lines.

frame_num,uniqueId,title,startTime,endTime,startTime_fmt,object,score
...
10,file1,name1,30.6,30.64,0:00:30,fire,0.914617
11,file1,name1,30.72,30.76,0:00:30,none,0.68788
12,file1,name1,30.84,30.88,0:00:30,fire,0.993345
13,file1,name1,30.96,31,0:00:30,fire,0.991015
14,file1,name1,31.08,31.12,0:00:31,fire,0.983197
15,file1,name1,31.2,31.24,0:00:31,fire,0.979572
16,file1,name1,31.32,31.36,0:00:31,fire,0.985898
17,file1,name1,31.44,31.48,0:00:31,none,0.961606
18,file1,name1,31.56,31.6,0:00:31,none,0.685139
19,file1,name1,31.68,31.72,0:00:31,none,0.458374
20,file1,name1,31.8,31.84,0:00:31,none,0.413711
21,file1,name1,31.92,31.96,0:00:31,none,0.496828
22,file1,name1,32.04,32.08,0:00:32,fire,0.412836
23,file1,name1,32.16,32.2,0:00:32,fire,0.383344
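
As a quick check of the example above, the following sketch recomputes the length of each none gap and the mean score of the fire rows. It assumes a gap is measured from the startTime of the first none row in a run to the endTime of the last one, which is only one possible definition; the variable names are illustrative.

```python
# Rows from the sample above as (startTime, endTime, object, score).
rows = [
    (30.6, 30.64, "fire", 0.914617),
    (30.72, 30.76, "none", 0.68788),
    (30.84, 30.88, "fire", 0.993345),
    (30.96, 31.0, "fire", 0.991015),
    (31.08, 31.12, "fire", 0.983197),
    (31.2, 31.24, "fire", 0.979572),
    (31.32, 31.36, "fire", 0.985898),
    (31.44, 31.48, "none", 0.961606),
    (31.56, 31.6, "none", 0.685139),
    (31.68, 31.72, "none", 0.458374),
    (31.8, 31.84, "none", 0.413711),
    (31.92, 31.96, "none", 0.496828),
    (32.04, 32.08, "fire", 0.412836),
    (32.16, 32.2, "fire", 0.383344),
]

# Length of each run of consecutive "none" rows that is followed by a "fire" row
# (assumed definition: first startTime of the run to last endTime of the run).
gaps = []
run_start = None
for start, end, obj, _ in rows:
    if obj == "none":
        if run_start is None:
            run_start = start
        run_end = end
    elif run_start is not None:
        gaps.append(run_end - run_start)
        run_start = None

fire_scores = [score for _, _, obj, score in rows if obj == "fire"]
mean_score = sum(fire_scores) / len(fire_scores)

print([round(g, 2) for g in gaps])  # [0.04, 0.52]
print(round(mean_score, 2))         # 0.83
```

Both gaps are below 1 second, so all fire rows fall into one cluster running from 30.6 to 32.2.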

This is my attempt so far:

with open(filename) as fin:
    lastWasFire=False
    for line in fin:
        if "fire" in line:
             if lastWasFire==False and line !="" and line.split(",")[5] != lastline.split(",")[5]:
                  fout.write(line)
             else:
                lastWasFire=False
             lastline=line

Recommended answer

I assume you don't want to use external libraries for data processing like numpy or pandas. The following code should be quite similar to your attempt:

from itertools import chain

threshold = 1.0

# We will chain a "none" object at the end which triggers the threshold to make sure no "fire" objects are left unprinted
trigger = (",,,0,{},,none,".format(threshold + 1),)

# Keys for columns of input data
keys = (
    "frame_num",
    "uniqueId",
    "title",
    "startTime",
    "endTime",
    "startTime_fmt",
    "object",
    "score",
)

# Store last "fire" or "none" objects
last = {
    "fire": [],
    "none": [],
}

with open(filename) as f:
    # Skip first line of input file
    next(f)
    for line in chain(f, trigger):
        line = dict(zip(keys, line.split(",")))
        last[line["object"]].append(line)
        # Check threshold for "none" objects if there are previous unprinted "fire" objects
        if line["object"] == "none" and last["fire"]:
            if float(last["none"][-1]["endTime"]) - float(last["none"][0]["startTime"]) > threshold:
                print("{},{},{},{},{},{},{},{}".format(
                    last["fire"][0]["frame_num"],
                    last["fire"][0]["uniqueId"],
                    last["fire"][0]["title"],
                    last["fire"][0]["startTime"],
                    last["fire"][-1]["endTime"],
                    last["fire"][0]["startTime_fmt"],
                    last["fire"][0]["object"],
                    sum([float(x["score"]) for x in last["fire"]]) / len(last["fire"]),
                ))
                last["fire"] = []
        # Previous "none" objects don't matter anymore as soon as a "fire" object is being encountered
        if line["object"] == "fire":
            last["none"] = []

The input file is processed line by line and "fire" objects are accumulated in last["fire"]. They are merged and printed when either

  • the "none" objects in last["none"] span more than threshold seconds, or
  • the end of the input file is reached, thanks to the manually chained trigger object, which is a "none" object of length threshold + 1 and therefore triggers the threshold and the subsequent merge and print.
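
The same merging rule can also be sketched as a generator that flushes the pending cluster explicitly when the input ends, instead of chaining a sentinel row. This is a hedged variant rather than the answer's code: merge_fire_rows and _merge are hypothetical names, csv.DictReader is swapped in for the manual split, and any row whose object is not "fire" is treated as "none". With the explicit flush, a trailing cluster is emitted even when the input ends with a few none rows.

```python
import csv
import io


def _merge(fires):
    """Collapse a list of "fire" rows into one output tuple."""
    mean = sum(float(r["score"]) for r in fires) / len(fires)
    first, last = fires[0], fires[-1]
    return (first["frame_num"], first["uniqueId"], first["title"],
            first["startTime"], last["endTime"], first["startTime_fmt"],
            "fire", mean)


def merge_fire_rows(rows, threshold=1.0):
    """Yield one merged tuple per cluster of "fire" rows."""
    fires = []         # "fire" rows of the cluster being built
    none_start = None  # startTime of the first row in the current "none" run
    for row in rows:
        if row["object"] == "fire":
            fires.append(row)
            none_start = None
        elif fires:
            if none_start is None:
                none_start = float(row["startTime"])
            # Close the cluster once the "none" run spans more than threshold
            if float(row["endTime"]) - none_start > threshold:
                yield _merge(fires)
                fires = []
    if fires:  # explicit end-of-input flush, no sentinel row needed
        yield _merge(fires)


# The sample rows from the question
SAMPLE = """\
frame_num,uniqueId,title,startTime,endTime,startTime_fmt,object,score
10,file1,name1,30.6,30.64,0:00:30,fire,0.914617
11,file1,name1,30.72,30.76,0:00:30,none,0.68788
12,file1,name1,30.84,30.88,0:00:30,fire,0.993345
13,file1,name1,30.96,31,0:00:30,fire,0.991015
14,file1,name1,31.08,31.12,0:00:31,fire,0.983197
15,file1,name1,31.2,31.24,0:00:31,fire,0.979572
16,file1,name1,31.32,31.36,0:00:31,fire,0.985898
17,file1,name1,31.44,31.48,0:00:31,none,0.961606
18,file1,name1,31.56,31.6,0:00:31,none,0.685139
19,file1,name1,31.68,31.72,0:00:31,none,0.458374
20,file1,name1,31.8,31.84,0:00:31,none,0.413711
21,file1,name1,31.92,31.96,0:00:31,none,0.496828
22,file1,name1,32.04,32.08,0:00:32,fire,0.412836
23,file1,name1,32.16,32.2,0:00:32,fire,0.383344
"""

full = list(merge_fire_rows(csv.DictReader(io.StringIO(SAMPLE))))

# Same data without the last two "fire" rows, so the input ends with "none" rows
truncated = "\n".join(SAMPLE.splitlines()[:-2])
tail = list(merge_fire_rows(csv.DictReader(io.StringIO(truncated))))

for row in full + tail:
    print(",".join(str(v) for v in row))
```

On the full sample this yields a single cluster from 30.6 to 32.2; on the truncated input the cluster ends at 31.36 and is still emitted by the final flush.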

You could replace print with a call to write into an output file, of course.
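
One way to write the merged rows to a file is the standard csv module, sketched below. The helper write_merged, the HEADER list (including the mean_score column name) and the temporary output path are assumptions made for illustration; the single merged row is the result expected for the sample data.

```python
import csv
import os
import tempfile

# Assumed output header: the input columns, with the score replaced by its mean
HEADER = ["frame_num", "uniqueId", "title", "startTime", "endTime",
          "startTime_fmt", "object", "mean_score"]


def write_merged(rows, path):
    """Write merged rows to a csv file, one line per cluster."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="") as fout:
        writer = csv.writer(fout)
        writer.writerow(HEADER)
        writer.writerows(rows)


# Merged row for the sample data (mean score rounded to two decimals)
merged = [("10", "file1", "name1", "30.6", "32.2", "0:00:30", "fire", 0.83)]

out_path = os.path.join(tempfile.mkdtemp(), "merged.csv")
write_merged(merged, out_path)

# Read the file back to confirm what was written
with open(out_path, newline="") as fin:
    written = list(csv.reader(fin))
print(written)
```

Using csv.writer rather than manual string formatting also takes care of quoting if a title ever contains a comma.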
