Python 2.6+ str.format() and regular expressions


Problem description

Using str.format() is the new standard for formatting strings in Python 2.6+ and Python 3. I've run into an issue when using str.format() with regular expressions.

I've written a regular expression that matches every domain one level below a specified domain, plus any domain two levels below it when the extra leading label is www.

Assuming the specified domain is delivery.com, my regex should return a.delivery.com, b.delivery.com, www.c.delivery.com ... but it should not return x.a.delivery.com.

import re

str1 = "www.pizza.delivery.com"
str2 = "w.pizza.delivery.com"
str3 = "pizza.delivery.com"

if re.match(r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str1): print('String 1 matches!')
if re.match(r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str2): print('String 2 matches!')
if re.match(r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str3): print('String 3 matches!')

Running this should give the result:

String 1 matches!
String 3 matches!

Now, the problem is when I try to replace delivery.com dynamically using str.format...

if re.match(r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}{domainName}$'.format(domainName='delivery.com'), str1): print('String 1 matches!')

This fails, I assume because str.format() treats {3} and {1} as replacement fields that refer to positional arguments of the call.
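To make the failure concrete, here is a minimal sketch of what str.format() does with the unescaped pattern. The helper name try_format is hypothetical and exists only for this illustration; the exact exception message varies between Python versions, so only the exception type is checked.

```python
import re

# The unescaped template from the question: {3} and {1} look like
# positional replacement fields to str.format().
template = r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}{domainName}$'


def try_format(tmpl, **kwargs):
    """Hypothetical helper: return the formatted string, or the exception raised."""
    try:
        return tmpl.format(**kwargs)
    except (IndexError, KeyError) as exc:
        return exc


result = try_format(template, domainName='delivery.com')
# No positional arguments were passed, so looking up index 3 fails.
print(type(result).__name__)
```

The named field {domainName} is never reached, because formatting stops at the first field it cannot resolve.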

I could concatenate the string using + operator

'^(w{3}\.)?([0-9A-Za-z-]+\.){1}' + domainName + '$'

The question comes down to this: is it possible to use str.format() when the string (usually a regex) contains "{n}"?

Solution

You first need to format the string and then use it as a regex. It really isn't worth putting everything on a single line. Escaping is done by doubling the curly braces:

>>> pat= '^(w{{3}}\.)?([0-9A-Za-z-]+\.){{1}}{domainName}$'.format(domainName = 'delivery.com')
>>> pat
'^(w{3}\\.)?([0-9A-Za-z-]+\\.){1}delivery.com$'
>>> re.match(pat, str1)
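As a script rather than a REPL session, the same idea might look like the sketch below. The function name build_pattern is hypothetical. Passing the domain through re.escape() is an addition that the original answer does not make: without it, the dot in delivery.com matches any character.

```python
import re


def build_pattern(domain_name):
    """Hypothetical helper: build the subdomain regex for a given domain."""
    # {{3}} and {{1}} become the literal quantifiers {3} and {1};
    # only {domainName} is treated as a replacement field.
    template = r'^(w{{3}}\.)?([0-9A-Za-z-]+\.){{1}}{domainName}$'
    # Assumption beyond the original answer: escape the domain so that
    # '.' matches a literal dot only.
    return template.format(domainName=re.escape(domain_name))


pat = build_pattern('delivery.com')
hosts = ['www.pizza.delivery.com', 'w.pizza.delivery.com', 'pizza.delivery.com']
matches = [h for h in hosts if re.match(pat, h)]
print(matches)
```

Building the pattern once and reusing it keeps the formatting step separate from the matching step, which is the split the answer recommends.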

Also, re.match only matches at the beginning of the string, so you don't need the leading ^ when you use re.match. You do need ^ if you use re.search, however.
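The difference between the two functions can be seen with a small sketch. The pattern and host name here are illustrative only and deliberately leave out the leading ^.

```python
import re

# No leading ^ here, so the anchoring behaviour comes from the function used.
unanchored = r'pizza\.delivery\.com$'
host = 'www.pizza.delivery.com'

# re.match only tries the pattern at position 0, where 'www.' does not fit.
match_result = re.match(unanchored, host)

# re.search scans forward and finds the pattern later in the string.
search_result = re.search(unanchored, host)

# Adding ^ makes re.search behave like re.match for this pattern.
anchored_search = re.search('^' + unanchored, host)

print(match_result, search_result.start(), anchored_search)
```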

Note that {1} in the regex is redundant: a group without a quantifier already matches exactly once.
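A quick check of that claim, as a sketch: the two patterns below differ only in the {1} quantifier, and the sample host names (including the escaped dots in the domain) are chosen for illustration.

```python
import re

with_quantifier = r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery\.com$'
without_quantifier = r'^(w{3}\.)?([0-9A-Za-z-]+\.)delivery\.com$'

hosts = [
    'www.pizza.delivery.com',
    'w.pizza.delivery.com',
    'pizza.delivery.com',
    'x.a.delivery.com',
]

# Both patterns should accept and reject exactly the same host names.
results_with = [bool(re.match(with_quantifier, h)) for h in hosts]
results_without = [bool(re.match(without_quantifier, h)) for h in hosts]
print(results_with == results_without)
```

Dropping {1} also leaves one fewer pair of braces to double when the pattern is passed through str.format().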
