Regular Expression - old regex module vs. re module


Question

Hi All,

I'm having a tough time converting the following regex.compile patterns
into the new re.compile format. There is also a difference between
regsub.sub() and re.sub().

Could anyone lend a hand?
import regsub
import regex

import re # << need conversion to this module

.....

"""Convert perl style format symbology to printf tokens.

Take a string and substitute computed printf tokens for perl style
format symbology.

For example:

###.## yields %6.2f
######## yields %8d
<<<<< yields %-5s
"""
exponentPattern = regex.compile('\(^\|[^\\#]\)\(#+\.#+\*\*\*\*\)')
floatPattern = regex.compile('\(^\|[^\\#]\)\(#+\.#+\)')
integerPattern = regex.compile('\(^\|[^\\#]\)\(##+\)')
leftJustifiedStringPattern = regex.compile('\(^\|[^\\<]\)\(<<+\)')
rightJustifiedStringPattern = regex.compile('\(^\|[^\\>]\)\(>>+\)')

while 1: # process all integer fields
    print("Testing Integer")
    if integerPattern.search(s) < 0: break
    print("Integer Match : ", integerPattern.search(s).span())
    # i1, i2 = integerPattern.regs[2]
    i1, i2 = integerPattern.search(s).span()
    width_total = i2 - i1
    f = '%' + `width_total` + 'd'
    # s = regsub.sub(integerPattern, '\\1' + f, s)
    s = integerPattern.sub(f, s)

Thanks in advance!

Steve
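
As background to the question: the old regex module used Emacs-style syntax by default, where \( \) group and \| alternates, while re uses bare ( ) and |. A compiled regex pattern's search() returned an index (-1 on failure), whereas re returns a match object or None, so the "< 0" test becomes an "is None" test. The sketch below is one possible mechanical translation of integerPattern only; the names integer_pattern, int_token and convert_ints are made up for illustration, and the replacement function stands in for the '\\1' prefix of the regsub call.

```python
import re

# Hypothetical re-module spelling of the question's integerPattern:
# \( \) and \| lose their backslashes; the raw string keeps \\ as a
# literal backslash inside the character class.
integer_pattern = re.compile(r'(^|[^\\#])(##+)')

def int_token(match):
    # Group 1 is the character in front of the field (empty at the start
    # of the string); it has to be written back, as '\\1' did in regsub.
    return match.group(1) + "%%%dd" % len(match.group(2))

def convert_ints(s):
    # re.sub replaces every match in one call, so no search loop is needed.
    return integer_pattern.sub(int_token, s)

print(convert_ints("total: ######## units"))  # total: %8d units
```

Note that this keeps the original pattern's behaviour of requiring at least two '#' characters, and that re.sub replaces all matches unless count=1 is passed.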

Recommended answer

In article <11**********************@d56g2000cwd.googlegroups.com>,
Steve <st****@cruzio.com> wrote:
> Hi All,
>
> I'm having a tough time converting the following regex.compile patterns
> into the new re.compile format. There is also a difference between
> regsub.sub() and re.sub().
>
> Could anyone lend a hand?
> import regsub
> import regex
>
> import re # << need conversion to this module
>
> ....
>
> """Convert perl style format symbology to printf tokens.
>
> Take a string and substitute computed printf tokens for perl style
> format symbology.
>
> For example:
>
> ###.## yields %6.2f
> ######## yields %8d
> <<<<< yields %-5s
> """





Perhaps not optimal, but this processes things as requested. Note that
all floats have to be done before any integer patterns are replaced.

==========================
#!/usr/local/bin/python

import re

"""Convert perl style format symbology to printf tokens.
Take a string and substitute computed printf tokens for perl style
format symbology.

For example:

###.## yields %6.2f
######## yields %8d
<<<<< yields %-5s
"""
# handle cases where there's no integer or no fractional chars
floatPattern = re.compile(r'(?<!\\)(#+\.(#*)|\.(#+))')
integerPattern = re.compile(r'(?<![\\.])(#+)(?![.#])')
leftJustifiedStringPattern = re.compile(r'(?<!\\)(<+)')
rightJustifiedStringPattern = re.compile(r'(?<!\\)(>+)')

def float_sub(matchobj):
    # fractional part may be in either groups()[1] or groups()[2]
    if matchobj.groups()[1] is not None:
        return "%%%d.%df" % (len(matchobj.groups()[0]),
                             len(matchobj.groups()[1]))
    else:
        return "%%%d.%df" % (len(matchobj.groups()[0]),
                             len(matchobj.groups()[2]))

def unperl_format(s):
    changed_things = 1
    while changed_things:
        # lather, rinse and repeat until nothing new happens
        changed_things = 0

        mat_obj = leftJustifiedStringPattern.search(s)
        if mat_obj:
            s = re.sub(leftJustifiedStringPattern, "%%-%ds" %
                       len(mat_obj.groups()[0]), s, count=1)
            changed_things = 1

        mat_obj = rightJustifiedStringPattern.search(s)
        if mat_obj:
            s = re.sub(rightJustifiedStringPattern, "%%%ds" %
                       len(mat_obj.groups()[0]), s, count=1)
            changed_things = 1

        # must do all floats before ints
        mat_obj = floatPattern.search(s)
        if mat_obj:
            s = re.sub(floatPattern, float_sub, s, count=1)
            changed_things = 1
            # don't fall through to the int code
            continue

        mat_obj = integerPattern.search(s)
        if mat_obj:
            s = re.sub(integerPattern, "%%%dd" % len(mat_obj.groups()[0]),
                       s, count=1)
            changed_things = 1
    return s

if __name__ == '__main__':
    testarray = ["integer: ####, integer # integer at end #",
                 "float ####.## no decimals ###. no int .### at end ###.",
                 "Left string <<<<<< short left string <",
                 "right string >>>>>> short right string >",
                 "escaped chars \\#### \\####.## \\<\\<<<< \\>\\><<<"]
    for s in testarray:
        print("Testing: %s" % s)
        print("Result: %s" % unperl_format(s))
        print()

======================

Running this gives

Testing: integer: ####, integer # integer at end #
Result: integer: %4d, integer %1d integer at end %1d

Testing: float ####.## no decimals ###. no int .### at end ###.
Result: float %7.2f no decimals %4.0f no int %4.3f at end %4.0f

Testing: Left string <<<<<< short left string <
Result: Left string %-6s short left string %-1s

Testing: right string >>>>>> short right string >
Result: right string %6s short right string %1s

Testing: escaped chars \#### \####.## \<\<<<< \>\><<<
Result: escaped chars \#%3d \#%6.2f \<\<%-3s \>\>%-3s

--
Jim Segrave (je*@jes-2.demon.nl)
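
The remark that floats must be replaced before integers can be checked on a single field. The sketch below reuses the two patterns from the program above; int_sub is a helper added here for illustration, and float_sub is a condensed version of the one in the post. Run against "###.##", the integer pattern skips the leading "###" (it is followed by "."), but it does match the last "#", which is preceded by "#" and followed by nothing.

```python
import re

# Patterns copied from the program above.
floatPattern = re.compile(r'(?<!\\)(#+\.(#*)|\.(#+))')
integerPattern = re.compile(r'(?<![\\.])(#+)(?![.#])')

def int_sub(m):
    # Illustrative helper: width is the number of '#' characters.
    return "%%%dd" % len(m.group(1))

def float_sub(m):
    # Fractional digits are in group 2 (###.##) or group 3 (.###).
    frac = m.group(2) if m.group(2) is not None else m.group(3)
    return "%%%d.%df" % (len(m.group(1)), len(frac))

field = "###.##"

# Integers first: the trailing '#' is taken as a one-character integer.
ints_first = floatPattern.sub(float_sub, integerPattern.sub(int_sub, field))

# Floats first: the whole field is consumed before the integer pass runs.
floats_first = integerPattern.sub(int_sub, floatPattern.sub(float_sub, field))

print(ints_first)    # %5.1f%1d
print(floats_first)  # %6.2f
```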

"Steve" <st****@cruzio.com> wrote in message
news:11**********************@d56g2000cwd.googlegroups.com...
> Hi All,
>
> I'm having a tough time converting the following regex.compile patterns
> into the new re.compile format. There is also a difference between
> regsub.sub() and re.sub().
>
> Could anyone lend a hand?





Not an re solution, but pyparsing makes for an easy-to-follow program.
TransformString only needs to scan through the string once - the
"reals-before-ints" testing is factored into the definition of the
formatters variable.

Pyparsing's project wiki is at http://pyparsing.wikispaces.com.

-- Paul

-------------------
from pyparsing import *

"""
read Perl-style formatting placeholders and replace with
proper Python %x string interp formatters

###### -> %6d
##.### -> %6.3f
<<<<< -> %-5s

>>>>> -> %5s
"""

# set up patterns to be matched - Word objects match character groups
# made up of characters in the Word constructor; Combine forces
# elements to be adjacent with no intervening whitespace
# (note use of results name in realFormat, for easy access to
# decimal places substring)
intFormat = Word("#")
realFormat = Combine(Word("#") + "." +
                     Word("#").setResultsName("decPlaces"))
leftString = Word("<")
rightString = Word(">")

# define parse actions for each - the matched tokens are the third
# arg to parse actions; parse actions will replace the incoming tokens with
# value returned from the parse action
intFormat.setParseAction(lambda s, l, toks: "%%%dd" % len(toks[0]))
realFormat.setParseAction(lambda s, l, toks: "%%%d.%df" %
                          (len(toks[0]), len(toks.decPlaces)))
leftString.setParseAction(lambda s, l, toks: "%%-%ds" % len(toks[0]))
rightString.setParseAction(lambda s, l, toks: "%%%ds" % len(toks[0]))

# collect all formatters into a single "grammar"
# - note reals are checked before ints
formatters = rightString | leftString | realFormat | intFormat

# set up our test string, and use transform string to invoke parse actions
# on any matched tokens
testString = """
This is a string with
ints: #### # ###############
floats: #####.# ###.###### #.#
left-justified strings: <<<<<<<< << <
right-justified strings: >>>>>>>>>> >> >
int at end of sentence: ####.
"""
print(formatters.transformString(testString))

-------------------
Prints:

This is a string with
ints: %4d %1d %15d
floats: %7.1f %10.6f %3.1f
left-justified strings: %-8s %-2s %-1s
right-justified strings: %10s %2s %1s
int at end of sentence: %4d.


In article <eP****************@tornado.texas.rr.com>,
Paul McGuire <pt***@austin.rr._bogus_.com> wrote:
> Not an re solution, but pyparsing makes for an easy-to-follow program.
> TransformString only needs to scan through the string once - the
> "reals-before-ints" testing is factored into the definition of the
> formatters variable.
>
> Pyparsing's project wiki is at http://pyparsing.wikispaces.com.





It fails for floats specified as ###. or .###: it outputs an integer
format and the decimal point separately. It also ignores \#, which
should prevent the '#' from being included in a format.

--
Jim Segrave (je*@jes-2.demon.nl)
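
The two approaches can be combined: a single re.sub pass with one alternation, where branch order does the "reals before ints" work and an explicit escape branch handles \#. This is only a sketch, not code from either post. FIELD, _token and unperl_format_once are hypothetical names, and it assumes a backslash escapes exactly the next character and is kept in the output, which is what the test output earlier in the thread suggests.

```python
import re

# Branches are tried left to right, so the float branch wins over the
# integer branch. In VERBOSE mode a literal '#' must be written as \#.
FIELD = re.compile(r'''
    \\.                          # escaped character, copied through as-is
  | (?P<fl> \#+\.\#* | \.\#+ )   # float: digits and dot, or dot and digits
  | (?P<int> \#+ )               # integer field
  | (?P<left> <+ )               # left-justified string field
  | (?P<right> >+ )              # right-justified string field
''', re.VERBOSE | re.DOTALL)

def _token(m):
    # Build the printf token for whichever branch matched.
    if m.group('fl'):
        text = m.group('fl')
        decimals = len(text) - text.index('.') - 1
        return "%%%d.%df" % (len(text), decimals)
    if m.group('int'):
        return "%%%dd" % len(m.group('int'))
    if m.group('left'):
        return "%%-%ds" % len(m.group('left'))
    if m.group('right'):
        return "%%%ds" % len(m.group('right'))
    return m.group(0)  # escape branch: leave the text untouched

def unperl_format_once(s):
    # One scan over the string; no repeat-until-stable loop is needed.
    return FIELD.sub(_token, s)

print(unperl_format_once("float ####.## no decimals ###. no int .###"))
```

Because the escape branch consumes the backslash together with the character after it, the lookbehind assertions used in the looped version are not needed here.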

