Python: Parse log file for pairs of lines
Question
I have a log file that I need to parse for specific events. The problem is that the data I need comes from pairs of event entries, each of which holds part of the data.
For instance, there will be a line with an event type = test and some data, and shortly after there is another line with an event type = test2 and some more data.
There may be many instances of these pairs of data in the file, or none.
What I need to do is tell the code that when it finds a line with event=test, it should also look for the next instance of event=test2, which is usually a couple of lines later in the log.
Here is a sample of the data file:
2020-08-25 03:36:56.006 INFO Panda HOOK: {"event":"keepalive","time":1600.0064477}
2020-08-25 03:37:01.006 INFO Panda HOOK: {"event":"keepalive","time":1605.0066958}
2020-08-25 03:37:06.004 INFO Panda HOOK: {"event":"keepalive","time":1610.004206}
2020-08-25 03:37:11.003 INFO Panda HOOK: {"event":"keepalive","time":1615.0032498}
2020-08-25 03:37:16.005 INFO Panda HOOK: {"event":"keepalive","time":1620.0056292}
2020-08-25 03:37:21.001 INFO Panda HOOK: {"event":"keepalive","time":1625.0011002}
2020-08-25 03:37:26.007 INFO Panda HOOK: {"event":"keepalive","time":1630.0073155}
2020-08-25 03:37:31.008 INFO Panda HOOK: {"event":"keepalive","time":1635.0086481}
2020-08-25 03:37:32.687 INFO Scripting: event:type=test,initiator=Abe Lincoln,place=Washinton,
2020-08-25 03:37:21.001 INFO Panda HOOK: {"event":"keepalive","time":1625.0011002}
2020-08-25 03:37:26.007 INFO Panda HOOK: {"event":"keepalive","time":1630.0073155}
2020-08-25 03:37:31.008 INFO Panda HOOK: {"event":"keepalive","time":1635.0086481}
2020-08-25 03:37:34.414 INFO Scripting: event:type=test2,t=25277.04,type=comment,
And here is the code I have so far to get the first line, 2020-08-25 03:37:32.687 INFO Scripting: event:type=test,initiator=Abe Lincoln,place=Washinton,
f = open('data.log', 'r')
lines = f.readlines()
test2Event = 'event:type=test2'
testEvent = 'event:type=test'
for string in lines:
    if testEvent in string:
        initPerson = string.split('initiator=')[1]
        person = initPerson.split(',')[0]
        print(person)
This code produces the result I want up to this point, but it also raises an error. I don't understand why, as I have used this exact script with a different string to split with no problems.
Result
Abe Lincoln
Traceback (most recent call last):
  File "main.py", line 15, in <module>
    initPerson = string.split('initiator=')[1]
IndexError: list index out of range
Any suggestions on how to get the next line of data, so that I can combine the data for insertion into a db or similar, would be appreciated, as would any help with why the error occurs, because I do not see what the issue is.
The code and data are available for testing at https://onlinegdb.com/Hyuuj7Mmv
Recommended answer
Reading the entire file twice is absolutely excessive. Instead, keep track of what you have done previously while traversing the file.
seen_test = False  # state variable for keeping track of what you have done
init_person = None  # note snake_case variable convention pro headlessCamelCase

with open('data.log', 'r') as f:
    for lineno, line in enumerate(f, start=1):
        if 'event:type=test,' in line:
            if seen_test:
                raise ValueError(
                    'line %i: type=test without test2: %s' % (
                        lineno, line))
            init_person = line.split('initiator=')[1].split(',')[0]
            seen_test = True
        elif 'event:type=test2' in line:
            if seen_test:
                # ... do whatever you want with init_person
                # maybe something like
                result = line.rstrip('\n').split(',')
                print('Test by %s got results %s' % (init_person, result[1:]))
            else:
                raise ValueError(
                    'line %i: type=test2 without test: %s' % (
                        lineno, line))
            seen_test = False
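
A note on the IndexError from the question, as a minimal sketch: the string 'event:type=test' is a substring of 'event:type=test2', so the question's check also matches the test2 line. That line has no initiator= field, so the split returns a single element and indexing [1] fails. The trailing comma in the 'event:type=test,' check above avoids the false match. The two lines below are copied from the sample log in the question; the variable names are only illustrative.

```python
# Two lines copied from the sample log in the question.
test_line = ('2020-08-25 03:37:32.687 INFO Scripting: '
             'event:type=test,initiator=Abe Lincoln,place=Washinton,')
test2_line = ('2020-08-25 03:37:34.414 INFO Scripting: '
              'event:type=test2,t=25277.04,type=comment,')

# Without a trailing comma, the pattern matches both event types.
loose_matches = ['event:type=test' in line for line in (test_line, test2_line)]

# With the trailing comma, only the type=test line matches.
strict_matches = ['event:type=test,' in line for line in (test_line, test2_line)]

# The test2 line has no 'initiator=' field, so split() returns one element
# and indexing it with [1] would raise IndexError.
parts = test2_line.split('initiator=')

print(loose_matches)   # [True, True]
print(strict_matches)  # [True, False]
print(len(parts))      # 1
```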
The enumerate call is just to get a useful line number into the error message when there is a failure; if you are sure that the file is always well-formatted, you can take it out.
This will still fail if the type=test line doesn't contain initiator=, but it is unclear what would be useful to do in that scenario, so I'm not trying to tackle that.
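
If a missing initiator= should be tolerated rather than treated as fatal, one possible approach is to parse the comma-separated key=value fields into a dict and look values up with .get(), which returns None for an absent key. This is only a sketch under assumptions, not part of the original answer: the helper names parse_fields and pair_events are hypothetical, the payload is assumed to follow 'event:' as in the sample, and an unpaired line is silently skipped instead of raising ValueError.

```python
def parse_fields(line):
    """Return the key=value fields after 'event:' as a dict (hypothetical helper).

    If a key repeats (the sample test2 line has 'type' twice), the later
    value overwrites the earlier one.
    """
    _, _, payload = line.partition('event:')
    fields = {}
    for item in payload.strip().split(','):
        key, sep, value = item.partition('=')
        if sep:  # skip empty items, e.g. the one after the trailing comma
            fields[key] = value
    return fields


def pair_events(lines):
    """Yield (test_fields, test2_fields) for each test line followed by a test2 line."""
    pending = None  # fields of the most recent unmatched type=test line
    for line in lines:
        if 'event:type=test,' in line:
            pending = parse_fields(line)
        elif 'event:type=test2,' in line and pending is not None:
            yield pending, parse_fields(line)
            pending = None


# Usage with three lines copied from the sample log in the question.
sample = [
    '2020-08-25 03:37:32.687 INFO Scripting: '
    'event:type=test,initiator=Abe Lincoln,place=Washinton,\n',
    '2020-08-25 03:37:31.008 INFO Panda HOOK: '
    '{"event":"keepalive","time":1635.0086481}\n',
    '2020-08-25 03:37:34.414 INFO Scripting: '
    'event:type=test2,t=25277.04,type=comment,\n',
]
pairs = list(pair_events(sample))
for test, test2 in pairs:
    # .get() returns None instead of raising if the field is missing.
    print(test.get('initiator'), test2.get('t'))  # Abe Lincoln 25277.04
```

Each yielded pair is two plain dicts, so combining them for a database insert is a matter of picking the keys you need from each.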
Demo: https://repl.it/repls/OverdueFruitfulComputergames#main.py