Regex: How to match sequence of key-value pairs at end of string

Views: 76 Published: 2020/4/26 9:24:26 Tags: python regex key-value


Problem description

I am trying to match key-value pairs that appear at the end of (long) strings. The strings look like this (I have replaced each "\n" with an actual line break):

my_str = "lots of blah
          key1: val1-words
          key2: val2-words
          key3: val3-words"

So I expect the matches "key1: val1-words", "key2: val2-words" and "key3: val3-words".

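The set of possible key names is known.
Not all possible keys appear in every string.
At least two keys appear in every string (if that makes matching easier).
The val-words can be several words.
Key-value pairs should only be matched at the end of the string.
I am using the Python re module.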

I was thinking that

re.compile('(?:tag1|tag2|tag3):')

plus some look-ahead assertion would be a solution, but I can't get it right. How should I do this?

Thank you.

/David

Real example string:

my_str = u'ucourt métrage pour kino session volume 18\nThème: O sombres héros\nContraintes: sous titrés\nAuthor: nicoalabdou\nTags: wakatanka productions court métrage kino session humour cantat bertrand noir désir sombres héros mer medine marie trintignant femme droit des femmes nicoalabdou pute soumise\nPosted: 06 June 2009\nRating: 1.3\nVotes: 3'

Based on Mikel's solution, I am now using the following:


import re

my_tags = [r'\S+']                      # alternative: matches every tag
my_tags = ['Tags', 'Author', 'Posted']  # selected tags only
regex = re.compile(r'''
    \n                     # all key-value pairs are on separate lines
    (                      # start group to return
       (?:{0}):            # placeholder for tags to detect '\S+' == all
        \s                 # the space between ':' and value
       .*                  # the value
    )                      # end group to return
    '''.format('|'.join(my_tags)), re.VERBOSE)


regex.sub('', my_str)  # return my_str without the matching key-value lines
regex.findall(my_str)  # return the matched key-value lines
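As a quick check of the snippet above, here is a minimal self-contained sketch that runs the same pattern on a shortened sample. The sample text and the names `sample`, `found`, and `kept` are made up for illustration and are not part of the original post.

```python
import re

# Hypothetical shortened sample; the real strings are much longer.
sample = ('intro text\n'
          'Author: nicoalabdou\n'
          'Tags: kino session\n'
          'Posted: 06 June 2009\n'
          'Rating: 1.3')

my_tags = ['Tags', 'Author', 'Posted']  # selected tags
regex = re.compile(r'''
    \n            # each key-value pair starts on its own line
    (             # group returned by findall
      (?:{0}):    # one of the selected tags, then a colon
      \s          # the space between ':' and the value
      .*          # the value (rest of the line)
    )
    '''.format('|'.join(my_tags)), re.VERBOSE)

found = regex.findall(sample)  # matched key-value lines
kept = regex.sub('', sample)   # sample without those lines

print(found)
print(kept)
```

Note that this pattern picks up the selected tags on any line of the string, not only in a trailing block, so it relies on the tags not appearing at the start of a line earlier in the text.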

Recommended answer

The negative zero-width lookahead is (?!pattern).

It's mentioned part-way down the re module documentation page.

(?!...)

Matches if ... doesn't match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it's not followed by 'Asimov'.
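The documentation's example can be tried directly. This is only a small sketch; the variable names `pattern`, `hit`, and `miss` are chosen here for illustration.

```python
import re

# Example pattern from the re documentation: 'Isaac ' not followed by 'Asimov'.
pattern = re.compile(r'Isaac (?!Asimov)')

hit = pattern.search('Isaac Newton')   # lookahead succeeds, 'Isaac ' is matched
miss = pattern.search('Isaac Asimov')  # lookahead fails, so there is no match

print(hit.group(0) if hit else None)
print(miss)
```

The lookahead itself consumes no characters, so the matched text is only 'Isaac ' and never includes the word that follows.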

So you could use it to match any number of words after a key, without matching the next key, using something like (?!\S+:)\S+.

The complete code would look like this:

import re

regex = re.compile(r'''
    [\S]+:                # a key (any word followed by a colon)
    (?:
    \s                    # then a space in between
        (?!\S+:)\S+       # then a value (any word not followed by a colon)
    )+                    # match multiple values if present
    ''', re.VERBOSE)

matches = regex.findall(my_str)

Which gives

['key1: val1-words', 'key2: val2-words', 'key3: val3-words']

If you print the key/values using:

for match in matches:
    print(match)

it will print:

key1: val1-words
key2: val2-words
key3: val3-words

Or, using your updated example, it would print:

Thème: O sombres héros
Contraintes: sous titrés
Author: nicoalabdou
Tags: wakatanka productions court métrage kino session humour cantat bertrand noir désir sombres héros mer medine marie trintignant femme droit des femmes nicoalabdou pute soumise
Posted: 06 June 2009
Rating: 1.3
Votes: 3
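The question asks for pairs only at the end of the string, while the pattern above matches a key wherever it occurs. One possible way to restrict the result to the trailing block, sketched here under the assumption that every pair sits on its own line, is to walk the lines backwards and stop at the first line that is not a key-value pair. The function `trailing_pairs`, its `keys` parameter, and the sample `text` are hypothetical names introduced for this sketch, not part of the original answer.

```python
import re

def trailing_pairs(text, keys=None):
    """Return the 'key: value' lines that form the end of text.

    keys is an optional list of allowed key names; if omitted,
    any run of non-space characters before the colon counts as a key.
    """
    key_pattern = '|'.join(re.escape(k) for k in keys) if keys else r'\S+'
    # Optional indentation, a key, a colon, one whitespace, then the value.
    line_re = re.compile(r'\s*((?:{0}):\s.*\S)\s*$'.format(key_pattern))
    found = []
    for line in reversed(text.splitlines()):
        m = line_re.match(line)
        if m is None:
            break  # first non-pair line ends the trailing block
        found.append(m.group(1))
    found.reverse()  # restore the original top-to-bottom order
    return found

# Hypothetical sample: the first line looks like a pair but is not at the end.
text = ('note: not a trailing pair\n'
        'lots of blah\n'
        'key1: val1-words\n'
        'key2: val2 more words\n'
        'key3: val3-words')

result = trailing_pairs(text)
print(result)
```

Because the loop stops at the first non-matching line, passing a restricted `keys` list also ends the block at the first line whose key is not in that list.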

You could turn the key/value pairs into a dictionary using something like this:

pairs = dict([match.split(':', 1) for match in matches])

which would make it easier to look up only the keys (and values) you want.
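Splitting on ':' leaves the space after the colon at the start of each value. A small variant that strips it could look like the sketch below; the `matches` list is hard-coded here as an assumed result of the regex so that the example runs on its own.

```python
# Assumed result of regex.findall(my_str) for the simple example.
matches = ['key1: val1-words', 'key2: val2-words', 'key3: val3-words']

# Split each match once on the first colon and strip surrounding whitespace.
pairs = dict(
    (key.strip(), value.strip())
    for key, value in (match.split(':', 1) for match in matches)
)

print(pairs['key2'])
```

Splitting only once (the second argument to `split`) keeps any later colons inside the value intact.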

More information:

Python re module documentation
Python Regular Expression HOWTO
Perl Regular Expression Reference "perlreref"
