Restructuring pyparsing parse results of a multithreaded log file

Question

I have a log file of a multithreaded process which looks like this:

<timestamp_in> <first_function_call_input> <thread:1>
    input_parameter_1:     value
    input_parameter_2:     value

<timestamp_in> <another_function_call_input> <thread:2>
    input_parameters:      values
<timestamp_out> <another_function_call_output> <thread:2>
    output_parameters:     values

<timestamp_out> <first_function_call_output> <thread:1>
    output_parameters:     values

In my parse results variable I would like to have the input and output information of one function call paired together, for example like this:

>>> print(parse_results.dump())
  -[0]:
       -function: first_function
       -thread: 1
       -timestamp_in: ...
       -timestamp_out: ...
       -input_parameters:
             [0]:
                  -parameter_name: input_parameter_1
                  -parameter_value: value
             [1]:
                  -parameter_name: input_parameter_2
                  -parameter_value: value
       -output_parameters:
             [0]: ...
             ...
  -[1]:
       -function: another_function
       -thread: 2
       ...

Is there a way to restructure the parse_results directly while parsing, so I don't have to restructure the results afterwards? Maybe with some parse actions? Or would it be way easier to just parse the input-parts and the output-parts by themselves, then sort them by thread, timestamp, and function and stitch the input-parts and output-parts together in a new object?
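
A minimal sketch of the sort-and-stitch approach, assuming the input and output sections have already been parsed into plain dicts with hypothetical `function`, `thread`, `timestamp` and `parameters` keys and numeric timestamps. It also assumes each call has exactly one input and one output entry and that a function never overlaps with itself on the same thread (recursion would break the pairing).

```python
from operator import itemgetter

# Hypothetical parsed records, one dict per log entry.
inputs = [
    {"function": "first_function", "thread": 1, "timestamp": 10,
     "parameters": {"input_parameter_1": "value", "input_parameter_2": "value"}},
    {"function": "another_function", "thread": 2, "timestamp": 11,
     "parameters": {"input_parameters": "values"}},
]
outputs = [
    {"function": "another_function", "thread": 2, "timestamp": 12,
     "parameters": {"output_parameters": "values"}},
    {"function": "first_function", "thread": 1, "timestamp": 13,
     "parameters": {"output_parameters": "values"}},
]


def stitch(inputs, outputs):
    """Pair each input entry with its output entry.

    After sorting both lists by (thread, function, timestamp), the n-th
    input and the n-th output belong to the same call, given the
    assumptions stated above.
    """
    if len(inputs) != len(outputs):
        raise ValueError("every input entry needs exactly one output entry")
    key = itemgetter("thread", "function", "timestamp")
    calls = []
    for inp, out in zip(sorted(inputs, key=key), sorted(outputs, key=key)):
        if (inp["thread"], inp["function"]) != (out["thread"], out["function"]):
            raise ValueError(f"unmatched entries: {inp!r} / {out!r}")
        calls.append({
            "function": inp["function"],
            "thread": inp["thread"],
            "timestamp_in": inp["timestamp"],
            "timestamp_out": out["timestamp"],
            "input_parameters": inp["parameters"],
            "output_parameters": out["parameters"],
        })
    # Report calls in the order they started.
    return sorted(calls, key=itemgetter("timestamp_in"))


for call in stitch(inputs, outputs):
    print(call["function"], call["thread"], call["timestamp_in"], call["timestamp_out"])
# first_function 1 10 13
# another_function 2 11 12
```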

Thanks for your help!

Edit:
I'm going to do the sorting of the input-parts and output-parts after parsing them separately; that seems way easier. However, I am still wondering whether and how it is possible to restructure a parse results instance. Say I have the following grammar and test string:

from pyparsing import *

ParserElement.inlineLiteralsUsing(Suppress)
key_val_lines = OneOrMore(Group(Word(alphas)('key') + ':' + Word(nums)('val')))('parameters')

special_key_val_lines = OneOrMore(Group(Word(printables)('key') + ':' + Word(alphas)('val')))('special_parameters')

log = OneOrMore(Group(key_val_lines | special_key_val_lines))('contents').setDebug()

test_string ='''
foo             : 1
bar             : 2
special_key1!   : wow
another_special : abc
normalAgain     : 3'''

parse_results = log.parseString(test_string)
print(parse_results.dump())

This outputs the following:

- contents: [[['foo', '1'], ['bar', '2']], [['special_key1!', 'wow'], ['another_special', 'abc']], [['normalAgain', '3']]]
  [0]:
    [['foo', '1'], ['bar', '2']]
    - parameters: [['foo', '1'], ['bar', '2']]
      [0]:
        ['foo', '1']
        - key: 'foo'
        - val: '1'
      [1]:
        ['bar', '2']
        - key: 'bar'
        - val: '2'
  [1]:
    [['special_key1!', 'wow'], ['another_special', 'abc']]
    - special_parameters: [['special_key1!', 'wow'], ['another_special', 'abc']]
      [0]:
        ['special_key1!', 'wow']
        - key: 'special_key1!'
        - val: 'wow'
      [1]:
        ['another_special', 'abc']
        - key: 'another_special'
        - val: 'abc'
  [2]:
    [['normalAgain', '3']]
    - parameters: [['normalAgain', '3']]
      [0]:
        ['normalAgain', '3']
        - key: 'normalAgain'
        - val: '3'

How can I modify the grammar of my parser in such a way that parse_results.contents[2].parameters[0] will instead become parse_results.contents[0].parameters[3]?

Answer

Purely a judgment call on where to draw the line on this, and I have written parsers in both styles.

In this particular case, my intuition tells me that it will make for clearer code if you focus your parser and parse actions on grouping, converting, and naming the parts of the individual log entries, and then use a separate method to reorganize them based on your various grouping strategies. My reasoning is that the log message structure is already somewhat complex, and so your parser will have enough work to do to pull out each message into a unified form. Also, your grouping strategies may evolve a bit (need to gather items that are within some small time window, not just exact timestamp matches), and doing this in a separate post-processing method would localize these changes.

From a testing perspective, this would also allow you to test the restructuring code separately from the parsing code, perhaps with a list of dicts or namedtuples that would simulate the parsed results from the separate log records.
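
To illustrate testing the restructuring code on its own, here is a sketch that feeds namedtuple stand-ins for parsed records into a time-window grouping function, the kind of evolving strategy mentioned above. `LogRecord` and `group_by_window` are hypothetical names, and timestamps are assumed to be integers (for example milliseconds).

```python
from collections import namedtuple

# Stand-in for one parsed log record; the field names are hypothetical.
LogRecord = namedtuple("LogRecord", ["timestamp", "thread", "function", "direction"])


def group_by_window(records, window):
    """Group records whose timestamps lie within `window` of the first
    record of the current group, after sorting by timestamp."""
    groups = []
    for rec in sorted(records, key=lambda r: r.timestamp):
        if groups and rec.timestamp - groups[-1][0].timestamp <= window:
            groups[-1].append(rec)
        else:
            groups.append([rec])
    return groups


# Simulated parse results: no parser is needed to exercise the grouping logic.
records = [
    LogRecord(1000, 1, "first_function", "in"),
    LogRecord(1002, 2, "another_function", "in"),
    LogRecord(1050, 2, "another_function", "out"),
    LogRecord(1051, 1, "first_function", "out"),
]

groups = group_by_window(records, window=5)
assert [len(g) for g in groups] == [2, 2]
```

This version measures the window from the first record of each group; chaining on the gap between consecutive records is an equally plausible choice, and keeping the logic in its own function means only this code changes if you switch.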

tl;dr - For this situation, I'd go with the post-processing method for the final sorting/reorganizing of your parsed log records.

To modify the parse results in place, define a parse action that takes a single argument, which I typically name tokens, and modify in place using typical list or dict mutators:

def rearrange(tokens):
    # mutate tokens in place
    tokens.contents[0].parameters.append(tokens.contents[2].parameters[0])

log.addParseAction(rearrange)

If you return None (as in this example), then the tokens structure that was passed in is retained as the token structure to be returned. If you return a non-None value, then the new return value replaces the given tokens in the parser output. This is how integer parsers convert the parsed string to actual integers, or date/time parsers convert the parsed strings to Python datetimes.
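
A minimal sketch of the two return conventions, assuming pyparsing is installed; `integer`, `word` and `shout` are illustrative names, not part of the question's grammar. It uses the camelCase method names from the question, which pyparsing 3 still provides as compatibility synonyms for the snake_case ones.

```python
from pyparsing import Word, alphas, nums

# Returning a non-None value: the result replaces the matched tokens,
# so the parser yields a Python int instead of the string "42".
integer = Word(nums)
integer.addParseAction(lambda tokens: int(tokens[0]))


# Returning None: the tokens passed in (here mutated in place) are kept.
def shout(tokens):
    tokens[0] = tokens[0].upper()


word = Word(alphas)
word.addParseAction(shout)

print(integer.parseString("42")[0] + 1)  # 43
print(word.parseString("hello")[0])      # HELLO
```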
