My first Python program -- a lexer
Problem description
Hello,
I started to write a lexer in Python -- my first attempt to do something
useful with Python (rather than trying out snippets from tutorials). It
is not complete yet, but I would like some feedback -- I'm a Python
newbie and it seems that, with Python, there is always a simpler and
better way to do it than you think.
### Begin ###
import re

class Lexer(object):

    def __init__( self, source, tokens ):
        self.source = re.sub( r"\r?\n|\r", "\n", source )
        self.tokens = tokens
        self.offset = 0
        self.result = []
        self.line = 1
        self._compile()
        self._tokenize()

    def _compile( self ):
        for name, regex in self.tokens.iteritems():
            self.tokens[name] = re.compile( regex, re.M )

    def _tokenize( self ):
        while self.offset < len( self.source ):
            for name, regex in self.tokens.iteritems():
                match = regex.match( self.source, self.offset )
                if not match: continue
                self.offset += len( match.group(0) )
                self.result.append( ( name, match, self.line ) )
                self.line += match.group(0).count( "\n" )
                break
            else:
                raise Exception(
                    'Syntax error in source at offset %s' %
                    str( self.offset ) )

    def __str__( self ):
        return "\n".join(
            [ "[L:%s]\t[O:%s]\t[%s]\t'%s'" %
              ( str( line ), str( match.pos ), name, match.group(0) )
              for name, match, line in self.result ] )

# Test Example
source = r"""
Name: "Thomas", # just a comment
Age: 37
"""
tokens = {
    'T_IDENTIFIER' : r'[A-Za-z_][A-Za-z0-9_]*',
    'T_NUMBER'     : r'[+-]?\d+',
    'T_STRING'     : r'"(?:\\.|[^\\"])*"',
    'T_OPERATOR'   : r'[=:,;]',
    'T_NEWLINE'    : r'\n',
    'T_LWSP'       : r'[ \t]+',
    'T_COMMENT'    : r'(?:\#|//).*$' }
print Lexer( source, tokens )
### End ###
Greetings,
Thomas
--
Ce n'est pas parce qu'ils sont nombreux à avoir tort qu'ils ont raison!
(Coluche)
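[Editorial note: one subtlety in the code above is that `tokens` is a plain dict, so on Python 2 the order in which token types are tried is arbitrary; if two patterns can match at the same offset, which one wins is undefined. A minimal sketch of one way around this, keeping the patterns in an ordered list so the first listed pattern always wins (Python 3 syntax; the names here are illustrative, not from the post):]

import re

# Token types as an ordered list of (name, pattern) pairs, so matching
# precedence is explicit: the first pattern that matches at an offset wins.
TOKEN_SPECS = [
    ('T_NUMBER',     r'[+-]?\d+'),
    ('T_LWSP',       r'[ \t]+'),
    ('T_IDENTIFIER', r'[A-Za-z_][A-Za-z0-9_]*'),
]

def tokenise(source, specs=TOKEN_SPECS):
    compiled = [(name, re.compile(pattern, re.M)) for name, pattern in specs]
    offset, result = 0, []
    while offset < len(source):
        for name, regex in compiled:
            match = regex.match(source, offset)
            if match:
                result.append((name, match.group(0)))
                offset = match.end()
                break
        else:
            raise SyntaxError("no token matches at offset %d" % offset)
    return result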
Thomas Mlynarczyk <th****@mlynarczyk-webdesign.de> writes:
Hello,
I started to write a lexer in Python -- my first attempt to do
something useful with Python (rather than trying out snippets from
tutorials). It is not complete yet, but I would like some feedback --
I'm a Python newbie and it seems that, with Python, there is always a
simpler and better way to do it than you think.
Hi,
Adding to John's comments, I wouldn't have source as a member of the
Lexer object but as an argument of the tokenise() method (which I would
make public). The tokenise method would return what you currently call
self.result. So it would be used like this.
>>> mylexer = Lexer(tokens)
>>> mylexer.tokenise(source)
# Later:
>>> mylexer.tokenise(another_source)
--
Arnaud
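[Editorial note: Arnaud's suggested refactoring might look roughly like this -- a sketch, not code from the thread, written in Python 3 syntax. The constructor keeps only the compiled token table, and tokenise() takes the source and returns the token list, so one Lexer can be reused for several sources:]

import re

class Lexer(object):
    # The instance holds only the compiled token patterns; each call to
    # tokenise() processes an independent source string.
    def __init__(self, tokens):
        self.tokens = {name: re.compile(regex, re.M)
                       for name, regex in tokens.items()}

    def tokenise(self, source):
        source = re.sub(r"\r?\n|\r", "\n", source)
        offset, line, result = 0, 1, []
        while offset < len(source):
            for name, regex in self.tokens.items():
                match = regex.match(source, offset)
                if not match:
                    continue
                offset += len(match.group(0))
                result.append((name, match, line))
                line += match.group(0).count("\n")
                break
            else:
                raise Exception("Syntax error in source at offset %s" % offset)
        return result

# Same lexer object, reusable for any number of sources:
mylexer = Lexer({'T_IDENTIFIER': r'[A-Za-z_]\w*', 'T_LWSP': r'[ \t]+'})
result = mylexer.tokenise("foo bar")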
Arnaud Delobelle wrote:
Adding to John's comments, I wouldn't have source as a member of the
Lexer object but as an argument of the tokenise() method (which I would
make public). The tokenise method would return what you currently call
self.result. So it would be used like this.
>>> mylexer = Lexer(tokens)
>>> mylexer.tokenise(source)
>>> mylexer.tokenise(another_source)
At a later stage, I intend to have the source tokenised not all at once,
but token by token, "just in time" when the parser (yet to be written)
accesses the next token:
token = mylexer.next( 'FOO_TOKEN' )
if not token: raise Exception( 'FOO token expected.' )
# continue doing something useful with token
Where next() would return the next token (and advance an internal
pointer) *if* it is a FOO_TOKEN, otherwise it would return False. This
way, the total number of regex matchings would be reduced: Only that
which is expected is "tried out".
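[Editorial note: the just-in-time next() scheme described here might be sketched as follows -- an illustrative Python 3 version, not code from the thread. tokenise() only stores the source; next() tries a single expected token type, advancing the internal pointer on success and returning False otherwise:]

import re

class Lexer(object):
    def __init__(self, tokens):
        self.tokens = {name: re.compile(regex, re.M)
                       for name, regex in tokens.items()}

    def tokenise(self, source):
        # Just remember the source; tokens are produced on demand by next().
        self.source = re.sub(r"\r?\n|\r", "\n", source)
        self.offset = 0

    def next(self, name):
        # Try only the expected token type. On success, advance the
        # internal pointer and return the matched text; otherwise return
        # False and leave the pointer untouched.
        match = self.tokens[name].match(self.source, self.offset)
        if not match:
            return False
        self.offset += len(match.group(0))
        return match.group(0)

mylexer = Lexer({'FOO_TOKEN': r'foo', 'T_LWSP': r'[ \t]+'})
mylexer.tokenise("foo bar")
token = mylexer.next('FOO_TOKEN')
if not token:
    raise Exception('FOO token expected.')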
But otherwise, upon reflection, I think you are right and it would
indeed be more appropriate to do as you suggest.
Thanks for your feedback.
Greetings,
Thomas
--
Ce n'est pas parce qu'ils sont nombreux à avoir tort qu'ils ont raison!
(Coluche)