Can you search backwards from an offset using a Python regular expression?

Question

Given a string, and a character offset within that string, can I search backwards using a Python regular expression?

The actual problem I'm trying to solve is to get a matching phrase at a particular offset within a string, but I have to match the first instance before that offset.

In a situation where I have a regex that's one symbol long (ex: a word boundary), I'm using a solution where I reverse the string.

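import re

my_string = "Thanks for looking at my question, StackOverflow."
offset = 30  # index of the "i" in "question"
boundary = re.compile(r'\b')
# Search forward from the offset for the next word boundary.
end = boundary.search(my_string, offset)
end_boundary = end.start()
end_boundary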

Output: 33

# Search the reversed string from the mirrored offset, then map the index back.
end = boundary.search(my_string[::-1], len(my_string) - offset - 1)
start_boundary = len(my_string) - end.start()
start_boundary

Output: 25

my_string[start_boundary:end_boundary]

Output: 'question'

However, this "reverse" technique won't work if I have a more complicated regular expression that may involve multiple characters. For example, if I wanted to match the first instance of "ing" that appears before a specified offset:

my_new_string = "Looking feeding dancing prancing"
offset = 16 # on the word dancing
m = re.match(r'(.*?ing)', my_new_string) # Except looking backwards

Desired output: feeding
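To illustrate why plain reversal breaks down for multi-character patterns, here is a minimal sketch. It assumes the pattern of interest is r'\b\w+ing' and reverses it by hand into r'gni\w+\b'; the names `reversed_text`, `reversed_pattern`, and `found` are made up for this illustration. Reversing the string alone is not enough, because the pattern has to be mirrored as well, and doing that by hand does not scale to arbitrary expressions.

```python
import re

text = "Looking feeding dancing prancing"
offset = 16  # on the word "dancing"
reversed_text = text[::-1]

# The unreversed pattern finds nothing: "ing" reads "gni" backwards.
print(re.search(r'ing', reversed_text))  # None

# Hand-reversed form of r'\b\w+ing'; start scanning at the mirrored offset.
reversed_pattern = re.compile(r'gni\w+\b')
m = reversed_pattern.search(reversed_text, len(text) - offset)

# Un-reverse the matched text to recover the original word.
found = m.group(0)[::-1] if m else None
print(found)  # feeding
```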

I can likely use other approaches (split the file up into lines, and iterate through the lines backwards) but using a regular expression backwards seems like a conceptually-simpler solution.

Recommended answer

Use a positive lookbehind to require that at least 30 characters precede the end of the matched word:

# With offset = 30 the pattern becomes r'.*?(\w+)(?<=.{30})'
m = re.match(r'.*?(\w+)(?<=.{%d})' % offset, my_string)
if m:
    print(m.group(1))
else:
    print("no match")
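The lookbehind approach above can be wrapped in a small helper. This is only a sketch: the function name `word_at_offset` is hypothetical, and it keeps the answer's assumption that `.` never has to cross a newline (for multi-line text the pattern would need the re.DOTALL flag).

```python
import re

def word_at_offset(text, offset):
    """Return the first word that ends at or after `offset`, or None."""
    # (?<=.{N}) succeeds only when at least N characters precede the
    # current position, so the lazy prefix stops at the first word
    # whose end reaches the offset.
    pattern = r'.*?(\w+)(?<=.{%d})' % offset
    m = re.match(pattern, text)
    return m.group(1) if m else None

print(word_at_offset("Thanks for looking at my question, StackOverflow.", 30))
```

Building the pattern with string formatting means a new regular expression is compiled for every distinct offset. That is fine for occasional lookups.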

For the other example, a negative lookbehind may help:

my_new_string = "Looking feeding dancing prancing"
offset = 16
m = re.match(r'.*(\b\w+ing)(?<!.{%d})' % offset, my_new_string)
if m:
    print(m.group(1))  # feeding

The greedy .* first consumes the whole string, then backtracks until the captured word ends at a position with fewer than 16 characters before it. That is the first point at which the negative lookbehind (?<!.{16}) succeeds.
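As an alternative to the lookbehind, and not part of the answer above, a compiled pattern's `finditer` accepts `pos` and `endpos` arguments, so one can scan forward up to the offset and keep the last match. The helper name `last_match_before` is hypothetical. Because `endpos` makes the string behave as if it ended at the offset, a word that straddles the offset is not matched.

```python
import re

def last_match_before(pattern, text, offset):
    """Return the last match of `pattern` ending at or before `offset`, or None."""
    last = None
    # endpos=offset limits the scan to text[:offset] while keeping
    # match positions relative to the full string.
    for m in re.compile(pattern).finditer(text, 0, offset):
        last = m
    return last

m = last_match_before(r'\b\w+ing', "Looking feeding dancing prancing", 16)
print(m.group(0) if m else "no match")
```

This visits every match before the offset rather than searching backwards, and `finditer` yields only non-overlapping matches, so its result can differ from the lookbehind version for patterns whose matches overlap.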
