My first Python program -- a lexer


Problem Description



Hello,

I started to write a lexer in Python -- my first attempt to do something
useful with Python (rather than trying out snippets from tutorials). It
is not complete yet, but I would like some feedback -- I'm a Python
newbie and it seems that, with Python, there is always a simpler and
better way to do it than you think.

### Begin ###

import re

class Lexer(object):

    def __init__( self, source, tokens ):
        # Normalise line endings to "\n" before scanning.
        self.source = re.sub( r"\r?\n|\r\n", "\n", source )
        self.tokens = tokens
        self.offset = 0
        self.result = []
        self.line = 1
        self._compile()
        self._tokenize()

    def _compile( self ):
        # Replace each regex string with its compiled pattern.
        for name, regex in self.tokens.iteritems():
            self.tokens[name] = re.compile( regex, re.M )

    def _tokenize( self ):
        while self.offset < len( self.source ):
            for name, regex in self.tokens.iteritems():
                match = regex.match( self.source, self.offset )
                if not match: continue
                self.offset += len( match.group(0) )
                self.result.append( ( name, match, self.line ) )
                self.line += match.group(0).count( "\n" )
                break
            else:
                # No token type matched at the current offset.
                raise Exception(
                    'Syntax error in source at offset %s' %
                    str( self.offset ) )

    def __str__( self ):
        return "\n".join(
            [ "[L:%s]\t[O:%s]\t[%s]\t'%s'" %
              ( str( line ), str( match.pos ), name, match.group(0) )
              for name, match, line in self.result ] )

# Test Example

source = r"""
Name: "Thomas", # just a comment
Age: 37
"""

tokens = {
    'T_IDENTIFIER' : r'[A-Za-z_][A-Za-z0-9_]*',
    'T_NUMBER' : r'[+-]?\d+',
    'T_STRING' : r'"(?:\\.|[^\\"])*"',
    'T_OPERATOR' : r'[=:,;]',
    'T_NEWLINE' : r'\n',
    'T_LWSP' : r'[ \t]+',
    'T_COMMENT' : r'(?:\#|//).*$' }

print Lexer( source, tokens )

### End ###
Greetings,
Thomas

--
Ce n'est pas parce qu'ils sont nombreux à avoir tort qu'ils ont raison!
(Coluche)

Solution



Thomas Mlynarczyk <th****@mlynarczyk-webdesign.de> writes:

Hello,

I started to write a lexer in Python -- my first attempt to do
something useful with Python (rather than trying out snippets from
tutorials). It is not complete yet, but I would like some feedback --
I'm a Python newbie and it seems that, with Python, there is always a
simpler and better way to do it than you think.

Hi,

Adding to John's comments, I wouldn't have source as a member of the
Lexer object but as an argument of the tokenise() method (which I would
make public). The tokenise method would return what you currently call
self.result. So it would be used like this.

>>> mylexer = Lexer(tokens)
>>> mylexer.tokenise(source)

# Later:

>>> mylexer.tokenise(another_source)

--
Arnaud
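
A minimal sketch of the Lexer reshaped along the lines suggested above, keeping the Python 2 style of the original code. The tokenise() name and the idea of returning the token list come from the suggestion; the remaining details are only illustrative, not from the thread:

import re

class Lexer(object):

    def __init__( self, tokens ):
        # Compile the token regexes once, when the lexer is created.
        self.tokens = dict( ( name, re.compile( regex, re.M ) )
                            for name, regex in tokens.iteritems() )

    def tokenise( self, source ):
        # Scan the whole source and return the token list instead of
        # storing it on the instance, so the same lexer can be reused.
        source = re.sub( r"\r?\n|\r\n", "\n", source )
        offset, line, result = 0, 1, []
        while offset < len( source ):
            for name, regex in self.tokens.iteritems():
                match = regex.match( source, offset )
                if not match: continue
                result.append( ( name, match, line ) )
                offset += len( match.group(0) )
                line += match.group(0).count( "\n" )
                break
            else:
                raise Exception(
                    'Syntax error in source at offset %s' % offset )
        return result

With this shape, one Lexer instance can be fed several sources: mylexer = Lexer( tokens ), then mylexer.tokenise( source ) and later mylexer.tokenise( another_source ).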


Arnaud Delobelle wrote:

Adding to John's comments, I wouldn't have source as a member of the
Lexer object but as an argument of the tokenise() method (which I would
make public). The tokenise method would return what you currently call
self.result. So it would be used like this.

>>> mylexer = Lexer(tokens)
>>> mylexer.tokenise(source)
>>> mylexer.tokenise(another_source)

At a later stage, I intend to have the source tokenised not all at once,
but token by token, "just in time" when the parser (yet to be written)
accesses the next token:

token = mylexer.next( 'FOO_TOKEN' )
if not token: raise Exception( 'FOO token expected.' )
# continue doing something useful with token

Where next() would return the next token (and advance an internal
pointer) *if* it is a FOO_TOKEN, otherwise it would return False. This
way, the total number of regex matchings would be reduced: Only that
which is expected is "tried out".
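
A rough sketch of what such a just-in-time interface might look like, again in the Python 2 style of the code above. The next() name and the return-False-on-mismatch behaviour follow the description; the feed() method and the way source, offset and line are kept on the instance are assumptions made for illustration:

import re

class LazyLexer(object):

    def __init__( self, tokens ):
        self.tokens = dict( ( name, re.compile( regex, re.M ) )
                            for name, regex in tokens.iteritems() )

    def feed( self, source ):
        # Remember the source and restart scanning from the beginning.
        self.source = re.sub( r"\r?\n|\r\n", "\n", source )
        self.offset = 0
        self.line = 1

    def next( self, name ):
        # Try only the expected token type at the current position.
        match = self.tokens[name].match( self.source, self.offset )
        if not match:
            return False
        token = ( name, match, self.line )
        self.offset += len( match.group(0) )
        self.line += match.group(0).count( "\n" )
        return token

Used roughly as described above:

mylexer = LazyLexer( tokens )
mylexer.feed( source )
token = mylexer.next( 'T_IDENTIFIER' )
if not token: raise Exception( 'Identifier expected.' )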

But otherwise, upon reflection, I think you are right and it would
indeed be more appropriate to do as you suggest.

Thanks for your feedback.

Greetings,
Thomas

--
Ce n'est pas parce qu'ils sont nombreux à avoir tort qu'ils ont raison!
(Coluche)

