Python regular expression not matching file contents with re.match and the re.MULTILINE flag


Problem description

I'm reading in a file and storing its contents as a multiline string. Then I loop through some values I get from a Django query and run a regex built from each result. My regex looks like it should work, and it does work if I copy the values returned by the query into a standalone test, but for some reason it doesn't match when all the parts run together.

My code is:

import re

with open("/path_to_my_file") as myfile:
    data = myfile.read()

# read saved settings then write/overwrite them into the config
items = MyModel.objects.filter(some_id="s100009")
for item in items:
    regexString = "^\s*" + item.feature_key + ":"

    print regexString  # to verify it's what I want it to be, i.e. debug
    pq = re.compile(regexString, re.M)

    if pq.match(data):
        pass  # do stuff

So basically my problem is that the regex isn't matching. When I copy the file contents into a big old string and copy the value(s) printed by the print regexString line, it does match, so I'm thinking there's some esoteric Python/Django thing going on (or maybe not so esoteric, as Python isn't my first language).

For example, the output of print regexString is:

^\s*productDetailOn:

File contents:

    productDetailOn:true,
    allOff:false,
    trendingWidgetOn:true,
    trendingWallOn:true,
    searchResultOn:false,
    bannersOn:true,
    homeWidgetOn:true,
}

Running Python 2.7. I also dumped the types of both item.feature and data, and both were unicode. Not sure if that matters? Anyway, I'm starting to bang my head on the desk after working on this for a couple of hours, so any help is appreciated. Cheers!

Recommended answer

According to the documentation, re.match does not match at the beginning of each line, even when re.MULTILINE is set:

Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.

You need to use re.search:

regexString = r"^\s*" + item.feature_key + ":"
pq = re.compile(regexString, re.M)
if pq.search(data):
    pass  # do stuff
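
The difference between the two calls can be checked without Django or a real file. The following is a minimal sketch, assuming a shortened copy of the file contents from the question. The sample string, the helper name has_key, and the use of re.escape (which guards against regex metacharacters in a key) are illustrative additions, not part of the original code.

```python
import re

# Shortened, hypothetical copy of the file contents from the question.
data = (
    "    productDetailOn:true,\n"
    "    allOff:false,\n"
    "    trendingWidgetOn:true,\n"
    "}\n"
)

# The key "allOff" sits on the second line, so it is not at the start of the string.
pattern = re.compile(r"^\s*allOff:", re.MULTILINE)

# match() is anchored at position 0 of the string, regardless of MULTILINE.
match_result = pattern.match(data)

# search() scans forward, and with MULTILINE "^" matches at the start of every line.
search_result = pattern.search(data)


def has_key(text, key):
    """Return True if any line starts with optional whitespace, then key and a colon."""
    # re.escape keeps characters such as "." or "+" in the key from acting as regex syntax.
    line_pattern = re.compile(r"^\s*" + re.escape(key) + ":", re.MULTILINE)
    return line_pattern.search(text) is not None


print(match_result)                        # None
print(repr(search_result.group()))         # '    allOff:'
print(has_key(data, "trendingWidgetOn"))   # True
```

If the key happens to be on the very first line, match() succeeds as well, which may explain why a quick copy-and-paste test can appear to work.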

A small note on the raw string (r"^\s*"): in this case it is equivalent to "^\s*", because \s is not a string escape sequence that Python recognizes (unlike \r or \n), so the backslash is left in the string unchanged. Still, it is safer to always write regex patterns as raw string literals in Python (and with the corresponding notation in other languages): recognized escapes such as \b would otherwise be converted before the regex engine sees them, and newer Python 3 releases warn about unrecognized escapes such as \s in ordinary string literals.
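
To see why raw strings are the safer habit, here is a small sketch using \b, which, unlike \s, is a recognized string escape. The sample text is made up for illustration.

```python
import re

text = "a cat sat"

# In an ordinary string literal, "\b" is the backspace character (U+0008),
# so this pattern looks for backspace + "cat" + backspace and finds nothing.
plain = re.search("\bcat\b", text)

# In a raw string literal the backslash is kept, so the regex engine
# receives \b and treats it as a word boundary.
raw = re.search(r"\bcat\b", text)

print(plain)        # None
print(raw.group())  # cat
print(len("\b"), len(r"\b"))  # 1 2
```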
