How to merge continuous lines of a csv file
Problem description
I have a csv file that carries the outputs of some processes over video frames. In the file, each line is either fire or none, and each line has a startTime and an endTime. Now I need to cluster continuous fires and print only one instance per cluster, with its start and end time. The catch is that a few none lines in the middle can also be tolerated if their time is within 1 second. So, to be clear, the whole point is to cluster detections of nearby frames together and smooth out the results: instead of multiple lines such as 31-32, 32-33, ..., I want a single line covering 31-35 seconds.
How can I do this?
For instance, all of the following consecutive items should be treated as a single one, because the none gap is within 1 s. So we would get something like 1,file1,name1,30.6,32.2,fire,0.83, with that score being the mean of all fire lines.
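The expected score can be checked by hand: it is the mean of the eight fire scores in the sample rows that follow (frames 10, 12 to 16, 22 and 23). A minimal Python sketch of that arithmetic, with the scores copied from those rows:

```python
# Scores of the eight "fire" rows in the sample (frames 10, 12-16, 22, 23)
fire_scores = [
    0.914617, 0.993345, 0.991015, 0.983197,
    0.979572, 0.985898, 0.412836, 0.383344,
]

# Mean over all "fire" rows of the merged event
mean_score = sum(fire_scores) / len(fire_scores)
print(round(mean_score, 2))  # 0.83
```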
```
frame_num,uniqueId,title,startTime,endTime,startTime_fmt,object,score
...
10,file1,name1,30.6,30.64,0:00:30,fire,0.914617
11,file1,name1,30.72,30.76,0:00:30,none,0.68788
12,file1,name1,30.84,30.88,0:00:30,fire,0.993345
13,file1,name1,30.96,31,0:00:30,fire,0.991015
14,file1,name1,31.08,31.12,0:00:31,fire,0.983197
15,file1,name1,31.2,31.24,0:00:31,fire,0.979572
16,file1,name1,31.32,31.36,0:00:31,fire,0.985898
17,file1,name1,31.44,31.48,0:00:31,none,0.961606
18,file1,name1,31.56,31.6,0:00:31,none,0.685139
19,file1,name1,31.68,31.72,0:00:31,none,0.458374
20,file1,name1,31.8,31.84,0:00:31,none,0.413711
21,file1,name1,31.92,31.96,0:00:31,none,0.496828
22,file1,name1,32.04,32.08,0:00:32,fire,0.412836
23,file1,name1,32.16,32.2,0:00:32,fire,0.383344
```
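To check that the none rows of frames 17 to 21 fall within the 1 s tolerance, the length of that run can be measured from the first row's startTime to the last row's endTime. A minimal sketch with Python's standard csv module, using the sample rows above with the elided ... line left out:

```python
import csv
import io

# Sample rows from the question; the elided "..." line is left out
SAMPLE = """\
frame_num,uniqueId,title,startTime,endTime,startTime_fmt,object,score
10,file1,name1,30.6,30.64,0:00:30,fire,0.914617
11,file1,name1,30.72,30.76,0:00:30,none,0.68788
12,file1,name1,30.84,30.88,0:00:30,fire,0.993345
13,file1,name1,30.96,31,0:00:30,fire,0.991015
14,file1,name1,31.08,31.12,0:00:31,fire,0.983197
15,file1,name1,31.2,31.24,0:00:31,fire,0.979572
16,file1,name1,31.32,31.36,0:00:31,fire,0.985898
17,file1,name1,31.44,31.48,0:00:31,none,0.961606
18,file1,name1,31.56,31.6,0:00:31,none,0.685139
19,file1,name1,31.68,31.72,0:00:31,none,0.458374
20,file1,name1,31.8,31.84,0:00:31,none,0.413711
21,file1,name1,31.92,31.96,0:00:31,none,0.496828
22,file1,name1,32.04,32.08,0:00:32,fire,0.412836
23,file1,name1,32.16,32.2,0:00:32,fire,0.383344
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# The run of "none" rows between the two groups of "fire" rows (frames 17-21)
none_run = [r for r in rows if r["object"] == "none" and int(r["frame_num"]) >= 17]

# Length of the run: first row's startTime to last row's endTime
gap = float(none_run[-1]["endTime"]) - float(none_run[0]["startTime"])
print(round(gap, 2), gap <= 1.0)  # 0.52 True
```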
This is my attempt so far:
```python
with open(filename) as fin:
    lastWasFire=False
    for line in fin:
        if "fire" in line:
            if lastWasFire==False and line !="" and line.split(",")[5] != lastline.split(",")[5]:
                fout.write(line)
        else:
            lastWasFire=False
        lastline=line
```
Recommended answer
I assume you don't want to use external data-processing libraries such as numpy or pandas. The following code should be quite similar to your attempt:
```python
from itertools import chain

filename = "input.csv"  # hypothetical path to the input CSV file; replace with your own
threshold = 1.0  # maximum tolerated length (in seconds) of a run of "none" objects

# We chain a "none" object at the end which triggers the threshold to make sure no "fire" objects are left unprinted.
# Its endTime is infinite, so the threshold is exceeded even if the file ends with a short run of "none" objects.
trigger = (",,,0,inf,,none,",)

# Keys for columns of input data
keys = (
    "frame_num",
    "uniqueId",
    "title",
    "startTime",
    "endTime",
    "startTime_fmt",
    "object",
    "score",
)

# Store last "fire" or "none" objects
last = {
    "fire": [],
    "none": [],
}

with open(filename) as f:
    # Skip the header line of the input file
    next(f)
    for line in chain(f, trigger):
        line = line.strip()
        if not line:
            continue  # ignore blank lines, e.g. at the end of the file
        line = dict(zip(keys, line.split(",")))
        last[line["object"]].append(line)
        # Check threshold for "none" objects if there are previous unprinted "fire" objects
        if line["object"] == "none" and last["fire"]:
            if float(last["none"][-1]["endTime"]) - float(last["none"][0]["startTime"]) > threshold:
                print("{},{},{},{},{},{},{},{}".format(
                    last["fire"][0]["frame_num"],
                    last["fire"][0]["uniqueId"],
                    last["fire"][0]["title"],
                    last["fire"][0]["startTime"],
                    last["fire"][-1]["endTime"],
                    last["fire"][0]["startTime_fmt"],
                    last["fire"][0]["object"],
                    sum(float(x["score"]) for x in last["fire"]) / len(last["fire"]),
                ))
                last["fire"] = []
        # Previous "none" objects don't matter anymore as soon as a "fire" object is encountered
        if line["object"] == "fire":
            last["none"] = []
```
The input file is processed line by line, and "fire" objects are accumulated in last["fire"]. They are merged and printed when either
- the "none" objects in last["none"] span more than threshold seconds (measured from the first one's startTime to the last one's endTime), or
- the end of the input file is reached: the manually chained trigger is a "none" object long enough to exceed threshold, so it triggers the merge and print.
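An alternative to the chained trigger is to flush the last pending group explicitly after the loop. The sketch below is a variation, not the answer's exact logic: it merges a fire row into the current group when the gap between the previous fire row's endTime and its own startTime is at most threshold, so it measures the none gap from fire to fire rather than across the none rows. The function name merge_fire_events is hypothetical.

```python
import csv
from typing import Iterable, List, Tuple

# (frame_num, uniqueId, title, startTime, endTime, startTime_fmt, object, mean score)
Event = Tuple[str, str, str, str, str, str, str, float]


def merge_fire_events(lines: Iterable[str], threshold: float = 1.0) -> List[Event]:
    """Merge "fire" rows separated by gaps of at most `threshold` seconds (hypothetical helper)."""
    events: List[Event] = []
    group: List[dict] = []  # "fire" rows of the current, not yet emitted event

    def flush() -> None:
        # Emit the pending group as a single merged row with the mean score
        if group:
            first, last = group[0], group[-1]
            mean = sum(float(r["score"]) for r in group) / len(group)
            events.append((first["frame_num"], first["uniqueId"], first["title"],
                           first["startTime"], last["endTime"],
                           first["startTime_fmt"], "fire", mean))
            group.clear()

    for row in csv.DictReader(lines):
        if row["object"] != "fire":
            continue  # "none" rows only matter through the gap between fire rows
        if group and float(row["startTime"]) - float(group[-1]["endTime"]) > threshold:
            flush()  # gap too large: close the current event
        group.append(row)

    flush()  # explicit end-of-input flush instead of a sentinel "none" row
    return events
```

For example, with open(filename, newline="") as f: followed by for event in merge_fire_events(f): print(*event, sep=",") prints one merged row per event; on the sample rows from the question this gives a single event from 30.6 to 32.2 with a mean score of about 0.83.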
Of course, you could replace print with a call that writes to an output file.
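A minimal sketch of that replacement, assuming the merged values are collected as tuples rather than printed: the standard csv.writer handles the comma separation. The helper name write_events and the file name merged.csv are hypothetical, and the example row uses the rounded mean (0.830478) of the sample's fire scores.

```python
import csv
import io
from typing import IO, Iterable, Sequence


def write_events(fout: IO[str], events: Iterable[Sequence], header: Sequence[str]) -> None:
    """Write a header row and one CSV row per merged event to `fout` (hypothetical helper)."""
    writer = csv.writer(fout)
    writer.writerow(header)
    writer.writerows(events)


# In-memory demonstration; for a real file, pass open("merged.csv", "w", newline="") instead
buffer = io.StringIO()
write_events(
    buffer,
    [("10", "file1", "name1", "30.6", "32.2", "0:00:30", "fire", 0.830478)],
    ("frame_num", "uniqueId", "title", "startTime", "endTime", "startTime_fmt", "object", "score"),
)
print(buffer.getvalue())
```

Opening the real output file with newline="" follows the csv module's documented recommendation, which prevents extra blank lines on Windows.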