Regular Expression - old regex module vs. re module
Problem description
Hi All,
I'm having a tough time converting the following regex.compile patterns
into the new re.compile format. There are also differences between
regsub.sub() and re.sub().
Could anyone lend a hand?
import regsub
import regex
import re # << need conversion to this module
.....
"""Convert perl style format symbology to printf tokens.
Take a string and substitute computed printf tokens for perl style
format symbology.
For example:
###.## yields %6.2f
######## yields %8d
<<<<< yields %-5s
"""
exponentPattern = regex.compile('\(^\|[^\\#]\)\(#+\.#+\*\*\*\*\)')
floatPattern = regex.compile('\(^\|[^\\#]\)\(#+\.#+\)')
integerPattern = regex.compile('\(^\|[^\\#]\)\(##+\)')
leftJustifiedStringPattern = regex.compile('\(^\|[^\\<]\)\(<<+\)')
rightJustifiedStringPattern = regex.compile('\(^\|[^\\>]\)\(>>+\)')
while 1: # process all integer fields
    print("Testing Integer")
    if integerPattern.search(s) < 0: break
    print("Integer Match : ", integerPattern.search(s).span() )
    # i1 , i2 = integerPattern.regs[2]
    i1 , i2 = integerPattern.search(s).span()
    width_total = i2 - i1
    f = '%'+`width_total`+'d'
    # s = regsub.sub(integerPattern, '\\1'+f, s)
    s = integerPattern.sub(f, s)
Thanks in advance!
Steve
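For reference, a minimal sketch (Python 3) of a literal translation of the integer loop above to the re module. In re, ( ) and | are metacharacters as written, so the \( \| \) escapes of the old Emacs-style syntax are simply dropped; search() returns None rather than -1 when nothing matches; m.span(2) replaces integerPattern.regs[2]; and because regsub.sub() replaced only the first occurrence (regsub.gsub() replaced all of them), the matching call is sub(..., count=1), with \g<1> putting back the character captured by group 1. convert_integers is a hypothetical helper name, not part of the original code.

```python
import re

# Old regex-module pattern: '\(^\|[^\\#]\)\(##+\)'
# In re, ( ) and | are metacharacters as written, so the escapes are dropped.
integerPattern = re.compile(r'(^|[^\\#])(##+)')


def convert_integers(s):
    """Hypothetical helper: replace each run of two or more '#' with %Nd."""
    while True:
        m = integerPattern.search(s)   # None (not -1) when there is no match
        if m is None:
            break
        # span(2) covers the '#' run only; a plain span() would also include
        # the character captured by group 1 (when there is one) and give a
        # width that is one too large.
        i1, i2 = m.span(2)
        f = '%' + str(i2 - i1) + 'd'   # str() instead of Python 2 backticks
        # count=1 mimics regsub.sub(); \g<1> restores the preceding character.
        s = integerPattern.sub(r'\g<1>' + f, s, count=1)
    return s


print(convert_integers('ints: #### and ###### but \\### and #'))
# -> ints: %4d and %6d but \### and #
```

The escaped run (\###) and the single # are left alone, as the original [^\\#] class and ##+ intend.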
Recommended answer
In article <11**********************@d56g2000cwd.googlegroups.com>,
Steve <st****@cruzio.com> wrote:
> Hi All,
> I'm having a tough time converting the following regex.compile patterns
> into the new re.compile format. There are also differences between
> regsub.sub() and re.sub().
> Could anyone lend a hand?
> import regsub
> import regex
> import re # << need conversion to this module
> ....
> """Convert perl style format symbology to printf tokens.
> Take a string and substitute computed printf tokens for perl style
> format symbology.
> For example:
> ###.## yields %6.2f
> ######## yields %8d
> <<<<< yields %-5s
> """
Perhaps not optimal, but this processes things as requested. Note that
all floats have to be done before any integer patterns are replaced.
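To see why the order matters, a small Python 3 check that runs Jim's integerPattern (copied unchanged from the listing that follows) on a float field without doing the float pass first. ints_only is a hypothetical helper name.

```python
import re

# Jim's integer pattern, copied unchanged from his listing.
integerPattern = re.compile(r'(?<![\\.])(#+)(?![.#])')


def ints_only(s):
    """Hypothetical helper: apply only the integer substitution."""
    return integerPattern.sub(lambda m: '%%%dd' % len(m.group(1)), s)


print(ints_only('###.##'))   # -> ###.#%1d
```

The lookarounds keep the pattern off the ### before the dot and off the # right after it, but the final # has a # in front of it and nothing after it, so it is rewritten as a one-column integer and the float field is corrupted. Running the float pass first removes those characters before the integer pattern ever sees them.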
==========================
#!/usr/local/bin/python
import re

"""Convert perl style format symbology to printf tokens.
Take a string and substitute computed printf tokens for perl style
format symbology.
For example:
###.## yields %6.2f
######## yields %8d
<<<<< yields %-5s
"""

# handle cases where there's no integer or no fractional chars
floatPattern = re.compile(r'(?<!\\)(#+\.(#*)|\.(#+))')
integerPattern = re.compile(r'(?<![\\.])(#+)(?![.#])')
leftJustifiedStringPattern = re.compile(r'(?<!\\)(<+)')
rightJustifiedStringPattern = re.compile(r'(?<!\\)(>+)')

def float_sub(matchobj):
    # fractional part may be in either groups()[1] or groups()[2]
    if matchobj.groups()[1] is not None:
        return "%%%d.%df" % (len(matchobj.groups()[0]),
                             len(matchobj.groups()[1]))
    else:
        return "%%%d.%df" % (len(matchobj.groups()[0]),
                             len(matchobj.groups()[2]))

def unperl_format(s):
    changed_things = 1
    while changed_things:
        # lather, rinse and repeat until nothing new happens
        changed_things = 0
        mat_obj = leftJustifiedStringPattern.search(s)
        if mat_obj:
            s = re.sub(leftJustifiedStringPattern, "%%-%ds" %
                       len(mat_obj.groups()[0]), s, count=1)
            changed_things = 1
        mat_obj = rightJustifiedStringPattern.search(s)
        if mat_obj:
            s = re.sub(rightJustifiedStringPattern, "%%%ds" %
                       len(mat_obj.groups()[0]), s, count=1)
            changed_things = 1
        # must do all floats before ints
        mat_obj = floatPattern.search(s)
        if mat_obj:
            s = re.sub(floatPattern, float_sub, s, count=1)
            changed_things = 1
            # don't fall through to the int code
            continue
        mat_obj = integerPattern.search(s)
        if mat_obj:
            s = re.sub(integerPattern, "%%%dd" % len(mat_obj.groups()[0]),
                       s, count=1)
            changed_things = 1
    return s

if __name__ == '__main__':
    testarray = ["integer: ####, integer # integer at end #",
                 "float ####.## no decimals ###. no int .### at end ###.",
                 "Left string <<<<<< short left string <",
                 "right string >>>>>> short right string >",
                 "escaped chars \\#### \\####.## \\<\\<<<< \\>\\><<<"]
    for s in testarray:
        print("Testing: %s" % s)
        print("Result: %s" % unperl_format(s))
        print()
======================
Running this gives
Testing: integer: ####, integer # integer at end #
Result: integer: %4d, integer %1d integer at end %1d
Testing: float ####.## no decimals ###. no int .### at end ###.
Result: float %7.2f no decimals %4.0f no int %4.3f at end %4.0f
Testing: Left string <<<<<< short left string <
Result: Left string %-6s short left string %-1s
Testing: right string >>>>>> short right string >
Result: right string %6s short right string %1s
Testing: escaped chars \#### \####.## \<\<<<< \>\><<<
Result: escaped chars \#%3d \#%6.2f \<\<%-3s \>\>%-3s
--
Jim Segrave (je*@jes-2.demon.nl)
"Steve" <st****@cruzio.com> wrote in message
news:11**********************@d56g2000cwd.googlegroups.com...
> Hi All,
> I'm having a tough time converting the following regex.compile patterns
> into the new re.compile format. There are also differences between
> regsub.sub() and re.sub().
> Could anyone lend a hand?
Not an re solution, but pyparsing makes for an easy-to-follow program.
TransformString only needs to scan through the string once - the
"reals-before-ints" testing is factored into the definition of the
formatters variable.
Pyparsing's project wiki is at http://pyparsing.wikispaces.com.
-- Paul
-------------------
from pyparsing import *
"""
read Perl-style formatting placeholders and replace with
proper Python %x string interp formatters
###### -> %6d
##.### -> %6.3f
<<<<< -> %-5s
>>>>> -> %5s
"""
# set up patterns to be matched - Word objects match character groups
# made up of characters in the Word constructor; Combine forces
# elements to be adjacent with no intervening whitespace
# (note use of results name in realFormat, for easy access to
# decimal places substring)
intFormat = Word("#")
realFormat = Combine(Word("#")+"."+
                     Word("#").setResultsName("decPlaces"))
leftString = Word("<")
rightString = Word(">")
# define parse actions for each - the matched tokens are the third
# arg to parse actions; parse actions will replace the incoming tokens with
# value returned from the parse action
intFormat.setParseAction( lambda s,l,toks: "%%%dd" % len(toks[0]) )
realFormat.setParseAction( lambda s,l,toks: "%%%d.%df" %
                           (len(toks[0]),len(toks.decPlaces)) )
leftString.setParseAction( lambda s,l,toks: "%%-%ds" % len(toks[0]) )
rightString.setParseAction( lambda s,l,toks: "%%%ds" % len(toks[0]) )
# collect all formatters into a single "grammar"
# - note reals are checked before ints
formatters = rightString | leftString | realFormat | intFormat
# set up our test string, and use transform string to invoke parse actions
# on any matched tokens
testString = """
This is a string with
ints: #### # ###############
floats: #####.# ###.###### #.#
left-justified strings: <<<<<<<< << <
right-justified strings: >>>>>>>>>> >> >
int at end of sentence: ####.
"""
print(formatters.transformString(testString))
-------------------
Prints:
This is a string with
ints: %4d %1d %15d
floats: %7.1f %10.6f %3.1f
left-justified strings: %-8s %-2s %-1s
right-justified strings: %10s %2s %1s
int at end of sentence: %4d.
In article <eP****************@tornado.texas.rr.com>,
Paul McGuire <pt***@austin.rr._bogus_.com> wrote:
> Not an re solution, but pyparsing makes for an easy-to-follow program.
> TransformString only needs to scan through the string once - the
> "reals-before-ints" testing is factored into the definition of the
> formatters variable.
> Pyparsing's project wiki is at http://pyparsing.wikispaces.com.
It fails for floats specified as ###. or .###: it outputs an integer
format and the decimal point separately. It also ignores \#, which
should prevent the '#' from being included in a format.
--
Jim Segrave (je*@jes-2.demon.nl)
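Putting the two replies together: a minimal re-only sketch (Python 3) of Paul's single-scan idea that keeps Jim's treatment of ###., .### and \#. It relies on Python's re trying the alternatives of an | left to right at each position, so the float branch has to be listed before the bare-integer branch, the same ordering Paul uses in realFormat | intFormat. TOKEN, _to_printf and unperl_format_one_pass are illustrative names, not from either post.

```python
import re

# One verbose pattern; '#' must be escaped because it starts a comment
# in re.VERBOSE mode.
TOKEN = re.compile(r"""
    (?<!\\)                              # skip fields escaped with a backslash
    (?:
        (?P<float> \#+\.\#* | \.\#+ )    # ###.##   ###.   .###
      | (?P<int>   \#+ )                 # ####
      | (?P<left>  <+ )                  # <<<<
      | (?P<right> >+ )                  # >>>>
    )
""", re.VERBOSE)


def _to_printf(m):
    """Build the printf token for whichever named group matched."""
    kind = m.lastgroup            # only one named group can participate
    text = m.group(kind)
    if kind == 'float':
        decimals = len(text) - text.index('.') - 1
        return '%%%d.%df' % (len(text), decimals)
    if kind == 'int':
        return '%%%dd' % len(text)
    if kind == 'left':
        return '%%-%ds' % len(text)
    return '%%%ds' % len(text)    # kind == 'right'


def unperl_format_one_pass(s):
    """Single left-to-right scan; no repeated re-searching of the string."""
    return TOKEN.sub(_to_printf, s)


for s in ["float ####.## no decimals ###. no int .### at end ###.",
          "escaped chars \\#### \\####.## \\<\\<<<< \\>\\><<<"]:
    print(unperl_format_one_pass(s))
```

On Jim's five test strings this gives the same results as his loop. Like Jim's version, it treats a trailing ###. as a float, so Paul's "int at end of sentence: ####." would become %5.0f rather than %4d.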