Need simple parsing ability

Problem description

[python 2.3.3, x86 linux]
For each run of my app, I have a known set of (<100) wafer names.
Names are sometimes simply integers, sometimes a short string, and
sometimes a short string followed by an integer, e.g.:

5, 6, 7, 8, 9, bar, foo_6, foo_7, foo_8, foo_9, foo_10, foo_11

I need to read user input of a subset of these. The user will type a
set of names separated by commas (with optional white space), but there
may also be sequences indicated by a dash between two integers, e.g.:

"9-11" meaning 9,10,11
"foo_11-13" meaning foo_11, foo_12, and foo_13.
"foo_9-11" meaning foo_9,foo_10,foo_11, or
"bar09-11" meaning bar09,bar10,bar11

(Yes, I have to deal with integers with and without leading zeros)
[I'll proclaim inverse sequences like "foo_11-9" invalid]
So a sample input might be:

9,foo7-9,2-4,xxx meaning 9,foo7,foo8,foo9,2,3,4,xxx

The order of the resultant list of names is not important; I have
to sort them later anyway.

Fancy error recovery is not needed; an invalid input string will be
peremptorily wiped from the screen with an annoyed beep.

Can anyone suggest a clean way of doing this? I don't mind
installing and importing some parsing package, as long as my code
using it is clear and simple. Performance is not an issue.
-- George Young
--
"Are the gods not just?" "Oh no, child.
What would become of us if they were?" (CSL)

Recommended answer

# Building blocks for the regular expressions.
COMMA = ","
OPT_WS = "[ \t]*"
STEM = "([a-zA-Z_]*)"
NUMBER = "([0-9]+)"
OPT_NUMBER = NUMBER + "?"
OPT_SECOND_NUMBER = "(?:-" + NUMBER + ")?"

import re
splitter = re.compile(COMMA + OPT_WS).split
print `STEM + OPT_NUMBER + OPT_SECOND_NUMBER`
parser = re.compile(STEM + OPT_NUMBER + OPT_SECOND_NUMBER).match

def expand(stem, n0, n1):
    # No second number: a single name, with or without a numeric suffix.
    if not n1:
        if n0:
            yield "%s%s" % (stem, n0)
        else:
            yield stem
        return
    # Pad each generated number to the width of the first one,
    # so that "bar09-11" keeps its leading zero.
    l = len(n0)
    n0 = int(n0, 10)
    n1 = int(n1, 10)

    for i in range(n0, n1+1):
        yield "%s%0*d" % (stem, l, i)

def parse_string(line):
    items = splitter(line)
    parsed_items = [parser(i) for i in items]
    for i, pi in zip(items, parsed_items):
        # Test the match object (pi), not the item text (i).
        if pi is None:
            raise ValueError, "Invalid item: %r" % i
        stem = pi.group(1)
        n0 = pi.group(2)
        n1 = pi.group(3)
        # A range end without a range start, e.g. "foo-9".
        if n1 and not n0:
            raise ValueError, "Invalid item: %r" % i
        for j in expand(stem, n0, n1):
            yield j

def test():
    s = "9,foo7-9,bar_09-12,2-4,spam"
    print s, list(parse_string(s))
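
Two details of the code above may matter for the "annoyed beep" requirement: `match` anchors the pattern only at the start of an item, so trailing junk such as "9-11x" is not rejected, and a reversed range such as "foo_11-9" expands to nothing instead of raising an error. The following is a minimal sketch of a stricter variant, not the poster's code. It assumes modern Python 3 (the thread targets Python 2.3), and the names `ITEM_RE`, `expand_item` and `parse_names` are made up for illustration.

```python
import re

# Hypothetical names; Python 3 syntax, unlike the Python 2.3 code above.
# stem (letters/underscores), optional number, optional "-number".
ITEM_RE = re.compile(r"([A-Za-z_]*)([0-9]+)?(?:-([0-9]+))?")

def expand_item(item):
    """Expand one item such as 'foo7-9' into a list of names."""
    m = ITEM_RE.fullmatch(item)          # the whole item must match
    if m is None or item == "":
        raise ValueError("Invalid item: %r" % item)
    stem, first, last = m.groups()
    if last is None:
        return [item]                    # plain name, no range
    if first is None:
        raise ValueError("Invalid item: %r" % item)   # e.g. "foo-9"
    lo, hi = int(first), int(last)
    if lo > hi:
        raise ValueError("Inverse sequence: %r" % item)
    width = len(first)                   # keep leading zeros of the start
    return ["%s%0*d" % (stem, width, n) for n in range(lo, hi + 1)]

def parse_names(line):
    """Split on commas (optional whitespace) and expand every item."""
    names = []
    for item in re.split(r"\s*,\s*", line.strip()):
        names.extend(expand_item(item))
    return names
```

A caller would wrap `parse_names(user_input)` in `try`/`except ValueError` and beep on failure; checking the result against the known set of wafer names is left out of this sketch.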

On Fri, 16 Jul 2004, george young wrote:
I need to read user input of a subset of these. The user will type a
set of names separated by commas (with optional white space), but there
may also be sequences indicated by a dash between two integers, e.g.:

"9-11" meaning 9,10,11
"foo_11-13" meaning foo_11, foo_12, and foo_13.
"foo_9-11" meaning foo_9,foo_10,foo_11, or
"bar09-11" meaning bar09,bar10,bar11

(Yes, I have to deal with integers with and without leading zeros)
[I'll proclaim inverse sequences like "foo_11-9" invalid]
So a sample input might be:

9,foo7-9,2-4,xxx meaning 9,foo7,foo8,foo9,2,3,4,xxx

The order of the resultant list of names is not important; I have
to sort them later anyway.




The following should do the trick, using nothing more than the built-in
re package:

---

import re

def expand(pattern):
    # Look for a trailing "N-M" range at the end of the item.
    r = re.search(r'\d+-\d+$', pattern)
    if r is None:
        yield pattern
        return
    s, e = r.group().split('-')
    for n in xrange(int(s), int(e)+1):
        yield pattern[:r.start()] + str(n)

def expand_list(pattern_list):
    return [ w for pattern in pattern_list.split(',')
               for w in expand(pattern) ]

print expand_list('9,foo7-9,2-4,xxx')

---
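
One caveat with the `str(n)` call in the loop above: it discards leading zeros, so "bar09-11" would come out as bar9, bar10, bar11 rather than bar09, bar10, bar11 as the question asks. A possible adjustment, sketched here in Python 3 syntax (an assumption; the thread uses Python 2.3) with the hypothetical name `expand_padded`, pads each number to the width of the range start:

```python
import re

def expand_padded(pattern):
    """Yield the names for one item, keeping the width of the first number."""
    r = re.search(r"(\d+)-(\d+)$", pattern)
    if r is None:
        yield pattern          # no range: the item is a plain name
        return
    start, end = r.group(1), r.group(2)
    for n in range(int(start), int(end) + 1):
        # zfill pads with zeros up to the width typed for the range start
        yield pattern[:r.start()] + str(n).zfill(len(start))
```

With a start such as "9" the width is one, so `zfill` adds nothing and "foo_9-11" still gives foo_9, foo_10, foo_11.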

If you want to let the syntax be a little more lenient, replace
"pattern_list.split(',')" in expand_list() with
"re.split('\s*,\s*',pattern_list)". This will allow spaces to surround
commas.
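
As a small illustration of that lenient split, here is a sketch in Python 3 syntax (an assumption; `split_items` is a made-up helper name). The extra `strip()` is an addition to the suggestion above: `re.split` by itself leaves whitespace at the very start and end of the input untouched.

```python
import re

def split_items(pattern_list):
    """Split on commas, ignoring whitespace around them and at both ends."""
    return re.split(r"\s*,\s*", pattern_list.strip())
```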

Note that because this uses generators, it won't work on Pythons prior to
2.3.

Hope this helps!

