In Python, how to parse a string representing a set of keyword arguments such that the order does not matter

Problem description

I'm writing a class RecurringInterval which - based on the dateutil.rrule object - represents a recurring interval in time. I have defined a custom, human-readable __str__ method for it and would like to also define a parse method which (similar to the rrulestr() function) parses the string back into an object.

Here is the parse method and some test cases to go with it:

import re
from dateutil.rrule import FREQNAMES
import pytest

class RecurringInterval(object):
    freq_fmt = "{freq}"
    start_fmt = "from {start}"
    end_fmt = "till {end}"
    byweekday_fmt = "by weekday {byweekday}"
    bymonth_fmt = "by month {bymonth}"

    @classmethod
    def match_pattern(cls, string):
        SPACES = r'\s*'

        freq_names = [freq.lower() for freq in FREQNAMES] + [freq.title() for freq in FREQNAMES]        # The frequencies may be either lowercase or start with a capital letter
        FREQ_PATTERN = '(?P<freq>{})?'.format("|".join(freq_names))

        # Start and end are required (their regular expressions match 1 repetition)
        START_PATTERN = cls.start_fmt.format(start=SPACES + r'(?P<start>.+?)')
        END_PATTERN = cls.end_fmt.format(end=SPACES + r'(?P<end>.+?)')

        # The remaining tokens are optional (their regular expressions match 0 or 1 repetitions)
        BYWEEKDAY_PATTERN = cls.optional(cls.byweekday_fmt.format(byweekday=SPACES + r'(?P<byweekday>.+?)'))
        BYMONTH_PATTERN = cls.optional(cls.bymonth_fmt.format(bymonth=SPACES + r'(?P<bymonth>.+?)'))

        PATTERN = SPACES + FREQ_PATTERN \
                + SPACES + START_PATTERN \
                + SPACES + END_PATTERN \
                + SPACES + BYWEEKDAY_PATTERN \
                + SPACES + BYMONTH_PATTERN \
                + SPACES + "$"                  # The character '$' is needed to make the non-greedy regular expressions parse till the end of the string

        return re.match(PATTERN, string).groupdict()

    @staticmethod
    def optional(pattern):
        '''Encloses the given regular expression in an optional group (i.e., one that matches 0 or 1 repetitions of the original regular expression).'''
        return '({})?'.format(pattern)  


'''Tests'''
def test_match_pattern_with_byweekday_and_bymonth():
    string = "Weekly from 2017-11-03 15:00:00 till 2017-11-03 16:00:00 by weekday Monday, Tuesday by month January, February"

    groups = RecurringInterval.match_pattern(string)
    assert groups['freq'] == "Weekly"
    assert groups['start'].strip() == "2017-11-03 15:00:00"
    assert groups['end'].strip() == "2017-11-03 16:00:00"
    assert groups['byweekday'].strip() == "Monday, Tuesday"
    assert groups['bymonth'].strip() == "January, February"

def test_match_pattern_with_bymonth_and_byweekday():
    string = "Weekly from 2017-11-03 15:00:00 till 2017-11-03 16:00:00 by month January, February by weekday Monday, Tuesday "

    groups = RecurringInterval.match_pattern(string)
    assert groups['freq'] == "Weekly"
    assert groups['start'].strip() == "2017-11-03 15:00:00"
    assert groups['end'].strip() == "2017-11-03 16:00:00"
    assert groups['byweekday'].strip() == "Monday, Tuesday"
    assert groups['bymonth'].strip() == "January, February"


if __name__ == "__main__":
    # pytest.main([__file__])
    pytest.main([__file__+"::test_match_pattern_with_byweekday_and_bymonth"])       # This passes
    # pytest.main([__file__+"::test_match_pattern_with_bymonth_and_byweekday"])     # This fails

Although the parser works if you specify the arguments in the 'right' order, it is 'inflexible' in that it doesn't allow the optional arguments to be given in arbitrary order. This is why the second test fails.
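To see concretely what goes wrong with the swapped order, here is a reduced stand-in for the pattern above. It is a sketch, not the original `PATTERN`: the frequency part is left out and the clauses are written inline, but the fixed weekday-before-month order, the lazy `.+?` values and the trailing `$` are the same.

```python
import re

# Reduced stand-in for the question's PATTERN: same fixed clause order
# (weekday before month), same lazy '.+?' values, same trailing '$'.
PATTERN = (
    r"\s*from\s*(?P<start>.+?)"
    r"\s*till\s*(?P<end>.+?)"
    r"\s*(by weekday\s*(?P<byweekday>.+?))?"
    r"\s*(by month\s*(?P<bymonth>.+?))?"
    r"\s*$"
)

# Month clause first, weekday clause second (the "wrong" order).
swapped = ("from 2017-11-03 15:00:00 till 2017-11-03 16:00:00 "
           "by month January by weekday Monday")
groups = re.match(PATTERN, swapped).groupdict()

print(groups["byweekday"])  # the optional weekday group is skipped entirely
print(groups["bymonth"])    # the lazy month value runs on to the end of the string
```

The match itself succeeds, but the weekday group is skipped and the month value swallows the weekday clause. In the second test this should surface as an error on `groups['byweekday'].strip()`, since `byweekday` is `None`.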

What would be a way to make the parser parse the 'optional' fields in any order, such that both tests pass? (I was thinking of making an iterator with all permutations of the regular expressions and trying re.match on each one, but this does not seem like an elegant solution).

Recommended answer

At this point, your language is getting complex enough that it's time to ditch regular expressions and learn how to use a proper parsing library. I threw this together using pyparsing, and I've annotated it heavily to try and explain what's going on, but if anything's unclear do ask and I'll try to explain.

from pyparsing import Regex, oneOf, OneOrMore

# Boring old constants, I'm sure you know how to fill these out...
months      = ['January', 'February']
weekdays    = ['Monday', 'Tuesday']
frequencies = ['Daily', 'Weekly']

# A datetime expression is anything matching this regex. We could split it down
# even further to get day, month, year attributes in our results object if we felt
# like it
datetime_expr = Regex(r'(\d{4})-(\d\d?)-(\d\d?) (\d{2}):(\d{2}):(\d{2})')

# A from or till expression is the word "from" or "till" followed by any valid datetime
from_expr = 'from' + datetime_expr.setResultsName('from_')
till_expr = 'till' + datetime_expr.setResultsName('till')

# A range expression is a from expression followed by a till expression
range_expr = from_expr + till_expr

# A weekday is any old weekday
weekday_expr = oneOf(weekdays)
month_expr = oneOf(months)
frequency_expr = oneOf(frequencies)

# A by weekday expression is the words "by weekday" followed by one or more weekdays
by_weekday_expr = 'by weekday' + OneOrMore(weekday_expr).setResultsName('weekdays')
by_month_expr = 'by month' + OneOrMore(month_expr).setResultsName('months')

# A recurring interval, then, is a frequency, followed by a range, followed by
# a weekday and a month, in any order
recurring_interval = frequency_expr + range_expr + (by_weekday_expr & by_month_expr)

# Let's parse!
if __name__ == '__main__':
    res = recurring_interval.parseString('Daily from 1111-11-11 11:00:00 till 1111-11-11 12:00:00 by weekday Monday by month January February')

    # Note that setResultsName causes everything to get packed neatly into
    # attributes for us, so we can pluck all the bits and pieces out with no
    # difficulty at all
    # (print() calls, so the script also runs on Python 3)
    print(res)
    print(res.from_)
    print(res.till)
    print(res.weekdays)
    print(res.months)
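
For comparison, if adding a parsing library is not an option, the order-independence alone can be sketched with the standard library by splitting the string on its keyword phrases instead of matching one long pattern. This is a different technique from the pyparsing grammar above, not part of it. The names `KEYWORDS`, `KEYWORD_RE` and `parse_fields` are hypothetical, the keyword phrases are taken from the `*_fmt` attributes in the question, and the sketch assumes that no field value contains one of the keyword phrases as a whole word.

```python
import re

# Keyword phrase -> field name (phrases taken from the question's *_fmt strings).
KEYWORDS = {
    "from": "start",
    "till": "end",
    "by weekday": "byweekday",
    "by month": "bymonth",
}

# One alternation over all keyword phrases, longest first. The capturing
# group makes re.split() keep the matched keywords in its result.
KEYWORD_RE = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(KEYWORDS, key=len, reverse=True))
    + r")\b"
)


def parse_fields(string):
    """Return a dict of fields; the keyword clauses may come in any order."""
    # parts = [text before first keyword, keyword, value, keyword, value, ...]
    parts = KEYWORD_RE.split(string)
    result = {"freq": parts[0].strip() or None}
    for keyword, value in zip(parts[1::2], parts[2::2]):
        field = KEYWORDS[keyword]
        if field in result:
            raise ValueError("duplicate clause: {!r}".format(keyword))
        result[field] = value.strip()
    # 'from' and 'till' are required; the 'by ...' clauses are optional.
    for required in ("start", "end"):
        if required not in result:
            raise ValueError("missing required field: {!r}".format(required))
    return result


if __name__ == "__main__":
    a = ("Weekly from 2017-11-03 15:00:00 till 2017-11-03 16:00:00 "
         "by weekday Monday, Tuesday by month January, February")
    b = ("Weekly from 2017-11-03 15:00:00 till 2017-11-03 16:00:00 "
         "by month January, February by weekday Monday, Tuesday ")
    print(parse_fields(a) == parse_fields(b))
```

Unlike the grammar above, this does not validate the individual values (dates, weekday names, month names); it only separates the clauses, so each value would still need its own check afterwards.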
