How to merge continuous lines of a csv file

Views: 114. Published: 2020/6/14 19:28:44. Tags: python, python-3.x, file, csv, file-processing

Problem description

I have a csv file that carries the outputs of some processes over video frames. In the file, each line is either fire or none, and each line has a startTime and an endTime. I need to cluster continuous fire lines and print only one instance per cluster, with its start and end time. A few none lines in the middle can also be tolerated if their time is within 1 second. To be clear, the whole point is to cluster detections from nearby frames together, that is, to smooth out the results: instead of multiple lines such as 31-32, 32-33, ..., have a single line covering 31-35 seconds.

How can I do this?

For instance, all of the following continuous items are considered a single one, since the none gaps are within 1s. So we would have something like 1,file1,name1,30.6,32.2,fire,0.83, with that score being the mean of all fire lines.

frame_num,uniqueId,title,startTime,endTime,startTime_fmt,object,score
...
10,file1,name1,30.6,30.64,0:00:30,fire,0.914617
11,file1,name1,30.72,30.76,0:00:30,none,0.68788
12,file1,name1,30.84,30.88,0:00:30,fire,0.993345
13,file1,name1,30.96,31,0:00:30,fire,0.991015
14,file1,name1,31.08,31.12,0:00:31,fire,0.983197
15,file1,name1,31.2,31.24,0:00:31,fire,0.979572
16,file1,name1,31.32,31.36,0:00:31,fire,0.985898
17,file1,name1,31.44,31.48,0:00:31,none,0.961606
18,file1,name1,31.56,31.6,0:00:31,none,0.685139
19,file1,name1,31.68,31.72,0:00:31,none,0.458374
20,file1,name1,31.8,31.84,0:00:31,none,0.413711
21,file1,name1,31.92,31.96,0:00:31,none,0.496828
22,file1,name1,32.04,32.08,0:00:32,fire,0.412836
23,file1,name1,32.16,32.2,0:00:32,fire,0.383344

This is my attempt so far:

with open(filename) as fin:
    lastWasFire=False
    for line in fin:
        if "fire" in line:
             if lastWasFire==False and line !="" and line.split(",")[5] != lastline.split(",")[5]:
                  fout.write(line)
             else:
                lastWasFire=False
             lastline=line

Recommended answer

I assume you don't want to use external libraries for data processing like numpy or pandas. The following code should be quite similar to your attempt:

from itertools import chain

threshold = 1.0

# filename is the path of the input csv file, as in the question.
# A sentinel "none" row is chained after the last input line. Its endTime is
# infinite, so it always exceeds the threshold and flushes any "fire" rows
# that are still unprinted, even if the file ends with a short "none" run.
trigger = (",,,0,inf,,none,",)

# Keys for columns of input data
keys = (
    "frame_num",
    "uniqueId",
    "title",
    "startTime",
    "endTime",
    "startTime_fmt",
    "object",
    "score",
)

# Store last "fire" or "none" objects
last = {
    "fire": [],
    "none": [],
}

with open(filename) as f:
    # Skip first line (the header) of the input file
    next(f)
    for line in chain(f, trigger):
        # Ignore blank lines, e.g. a trailing empty line at the end of the file
        if not line.strip():
            continue
        row = dict(zip(keys, line.rstrip("\n").split(",")))
        last[row["object"]].append(row)
        # Check threshold for "none" objects if there are previous unprinted "fire" objects
        if row["object"] == "none" and last["fire"]:
            if float(last["none"][-1]["endTime"]) - float(last["none"][0]["startTime"]) > threshold:
                print("{},{},{},{},{},{},{},{}".format(
                    last["fire"][0]["frame_num"],
                    last["fire"][0]["uniqueId"],
                    last["fire"][0]["title"],
                    last["fire"][0]["startTime"],
                    last["fire"][-1]["endTime"],
                    last["fire"][0]["startTime_fmt"],
                    last["fire"][0]["object"],
                    # Mean score of all merged "fire" rows
                    sum(float(x["score"]) for x in last["fire"]) / len(last["fire"]),
                ))
                last["fire"] = []
        # Previous "none" objects don't matter anymore as soon as a "fire" object is encountered
        if row["object"] == "fire":
            last["none"] = []

The input file is processed line by line, and "fire" objects are accumulated in last["fire"]. They are merged and printed when either

the "none" objects in last["none"] span more than threshold seconds,

or the end of the input file is reached, because the manually chained trigger object is a "none" object long enough to exceed the threshold, which forces the final merge and print.
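As a quick check of the threshold rule on the sample data from the question, the run of none rows in frames 17 to 21 can be measured the same way the code does, from the first startTime to the last endTime. This is only an illustrative sketch; the name none_run is made up for the example.

```python
threshold = 1.0

# (startTime, endTime) of the "none" rows in frames 17-21 of the sample data
none_run = [
    (31.44, 31.48),
    (31.56, 31.6),
    (31.68, 31.72),
    (31.8, 31.84),
    (31.92, 31.96),
]

# Same measure as in the answer: last endTime minus first startTime
gap = none_run[-1][1] - none_run[0][0]  # roughly 0.52 seconds

# The gap does not exceed the threshold, so the "fire" rows before and
# after this run end up in the same merged line.
print(gap > threshold)
```

Since the gap stays below the threshold, frames 10 to 23 collapse into the single line starting at 30.6 and ending at 32.2.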

You could replace print with a call to write into an output file, of course.
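One possible way to do that, still with the standard library only, is sketched below. It is not the code from the answer: it swaps the chained sentinel row for an explicit flush at the end of the input and uses the csv module for reading and writing. The function name merge_fire_runs and the in-memory sample and out objects are assumptions made for the illustration; in practice they would be replaced by real files.

```python
import csv
import io


def merge_fire_runs(rows, threshold=1.0):
    """Yield one merged row per run of "fire" rows (hypothetical helper).

    A run of "none" rows spanning more than `threshold` seconds ends the
    current "fire" run; shorter "none" runs are tolerated.
    """
    fires = []         # "fire" rows of the run currently being built
    none_start = None  # startTime of the first "none" row since the last "fire"

    def merged():
        # First "fire" row, with the last endTime and the mean score
        row = dict(fires[0])
        row["endTime"] = fires[-1]["endTime"]
        row["score"] = sum(float(r["score"]) for r in fires) / len(fires)
        return row

    for row in rows:
        if row["object"] == "fire":
            fires.append(row)
            none_start = None
        elif fires:
            if none_start is None:
                none_start = float(row["startTime"])
            if float(row["endTime"]) - none_start > threshold:
                yield merged()
                fires = []
                none_start = None
    if fires:  # explicit end-of-input flush instead of a sentinel row
        yield merged()


# Sample rows from the question; io.StringIO stands in for open(filename)
sample = """frame_num,uniqueId,title,startTime,endTime,startTime_fmt,object,score
10,file1,name1,30.6,30.64,0:00:30,fire,0.914617
11,file1,name1,30.72,30.76,0:00:30,none,0.68788
12,file1,name1,30.84,30.88,0:00:30,fire,0.993345
13,file1,name1,30.96,31,0:00:30,fire,0.991015
14,file1,name1,31.08,31.12,0:00:31,fire,0.983197
15,file1,name1,31.2,31.24,0:00:31,fire,0.979572
16,file1,name1,31.32,31.36,0:00:31,fire,0.985898
17,file1,name1,31.44,31.48,0:00:31,none,0.961606
18,file1,name1,31.56,31.6,0:00:31,none,0.685139
19,file1,name1,31.68,31.72,0:00:31,none,0.458374
20,file1,name1,31.8,31.84,0:00:31,none,0.413711
21,file1,name1,31.92,31.96,0:00:31,none,0.496828
22,file1,name1,32.04,32.08,0:00:32,fire,0.412836
23,file1,name1,32.16,32.2,0:00:32,fire,0.383344
"""

reader = csv.DictReader(io.StringIO(sample))
out = io.StringIO()  # stand-in for open("merged.csv", "w", newline="")
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
for merged_row in merge_fire_runs(reader):
    writer.writerow(merged_row)
print(out.getvalue())
```

With the sample rows this writes a header plus a single merged fire line running from 30.6 to 32.2, with a mean score of about 0.83. Keeping the merge logic in a generator means the same function can feed print, a csv writer, or any other consumer.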
