Indentation control while developing a small Python-like language

Question

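I'm developing a small Python-like language using flex and byacc (for lexing and parsing) and C++, but I have a few questions about scope control.

Like Python, it uses whitespace (spaces or tabs) for indentation. Beyond that, I want to implement indexed breaks: for instance, if you type "break 2" inside a while loop that is nested in another while loop, it breaks out of not only the inner loop but the outer loop as well (hence the 2 after break), and so on.

Example: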

while 1
    while 1
        break 2
        'hello world'!! #will never reach this. "!!" outputs with a newline
    end
    'hello world again'!! #also will never reach this. again "!!" used for cout
end
#after break 2 it would jump right here

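But since I don't have an "anti-tab" character to mark where a scope ends (in C, for example, I would just use the '}' character), I was wondering whether this method would be the best:

I would define a global variable, such as "int tabIndex", in my yacc file and access it from my lex file using extern. Every time I find a tab character in my lex file, I would increment that variable by 1. When parsing in my yacc file, if I find a "break" keyword, I would decrement tabIndex by the number typed after it. If I reach EOF after compiling and tabIndex != 0, I would output a compilation error.

Now the problem is: what's the best way to detect that the indentation has been reduced? Should I read \b (backspace) characters from lex and then decrement the tabIndex variable (when the user doesn't use break)?

Is there another method to achieve this?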

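One more small question: I want every executable to have its starting point in a function called start(). Should I hardcode this into my yacc file?

Sorry for the long question; any help is greatly appreciated. Also, if someone could provide a yacc file for Python, it would be nice to have as a guideline (I tried looking on Google and had no luck).

Thanks in advance.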

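Solution

I am currently implementing a programming language rather similar to this (including the multilevel break, oddly enough). My solution was to have the tokenizer emit indent and dedent tokens based on indentation. For example: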

while 1: # colons help :)
    print('foo')
    break 1

becomes:

["while", "1", ":",
    indent,
    "print", "(", "'foo'", ")",
    "break", "1",
    dedent]

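It makes the tokenizer's handling of '\n' somewhat complicated, though. Also, I wrote the tokenizer and parser from scratch, so I'm not sure whether this is feasible in lex and yacc.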
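One common way around the one-token-per-call problem is a pending-token queue. A yacc-generated parser asks for one token at a time through yylex(), while a single line break may need to yield several dedent tokens at once. The following is only a Python sketch of that idea under stated assumptions, not flex code: the `PullLexer` class, its `next_token()` method, and the regex-based word splitting are all hypothetical, indentation is assumed to use spaces only, and `#` is treated as a comment marker even inside strings.

```python
from collections import deque
import re

# Hypothetical token names, mirroring the token list in the answer.
# A real lexer would use distinct token types so that an identifier
# spelled "indent" cannot be confused with the indent token.
INDENT, DEDENT, NEWLINE = "indent", "dedent", "\n"


class PullLexer:
    """Hands out one token per next_token() call, like a yylex()-style interface."""

    def __init__(self, source):
        self.lines = iter(source.splitlines())
        self.levels = [0]        # stack of active indentation widths
        self.pending = deque()   # tokens already produced but not yet handed out

    def _fill(self):
        """Tokenize the next non-blank line (or the end of input) into self.pending."""
        for line in self.lines:
            text = line.split("#", 1)[0].rstrip()  # naive comment stripping
            if not text:
                continue  # blank and comment-only lines do not change indentation
            width = len(text) - len(text.lstrip(" "))
            if width > self.levels[-1]:
                self.levels.append(width)
                self.pending.append(INDENT)
            while width < self.levels[-1]:
                self.levels.pop()
                self.pending.append(DEDENT)
            if width != self.levels[-1]:
                raise SyntaxError("dedent does not match any outer indentation level")
            # Quoted strings, words/numbers, or single punctuation characters.
            self.pending.extend(re.findall(r"'[^']*'|\w+|\S", text))
            self.pending.append(NEWLINE)
            return
        # End of input: close every block that is still open.
        while len(self.levels) > 1:
            self.levels.pop()
            self.pending.append(DEDENT)
        self.pending.append(None)  # None marks the end of input

    def next_token(self):
        if not self.pending:
            self._fill()
        return self.pending.popleft()


lexer = PullLexer("while 1: # colons help :)\n    print('foo')\n    break 1\n")
tokens = []
while (tok := lexer.next_token()) is not None:
    tokens.append(tok)
```

Ignoring the newline tokens, `tokens` contains the same sequence as the list shown above. The queue is the relevant design choice: the indentation logic may produce any number of tokens per line, and the caller still receives exactly one per call.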

Semi-working pseudocode example:

level = 0
levels = []
for c = getc():
    if c=='\n':
        emit('\n')
        n = 0
        while (c=getc())==' ':
            n += 1
        if n > level:
            emit(indent)
            push(levels,level) # remember the enclosing level for later dedents
            level = n
        while n < level:
            emit(dedent)
            level = pop(levels)
            if level < n:
                error tokenize
        # fall through
    emit(c) #lazy example
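With indent and dedent tokens in the stream, the `break N` check from the question may not need a tab counter at all. Indentation depth and loop depth are different things (an indented block need not be a loop), so one option is to track which open blocks are loops. The sketch below is an assumption-laden illustration, not part of the original answer: `check_breaks` is a hypothetical helper, it works on a flat list of string tokens like the one shown earlier, and it assumes every `while` header is followed by an indented block.

```python
def check_breaks(tokens):
    """Return an error message for every `break N` that exceeds the loop nesting."""
    blocks = []            # one entry per open block: "loop" or "block"
    next_block = "block"   # kind of block the next indent token will open
    errors = []
    for i, tok in enumerate(tokens):
        if tok == "while":
            next_block = "loop"
        elif tok == "indent":
            blocks.append(next_block)
            next_block = "block"
        elif tok == "dedent":
            blocks.pop()
        elif tok == "break":
            following = tokens[i + 1] if i + 1 < len(tokens) else ""
            # A bare `break` is treated as `break 1` (an assumption).
            count = int(following) if following.isdigit() else 1
            depth = blocks.count("loop")
            if count < 1 or count > depth:
                errors.append(f"break {count} inside {depth} loop(s)")
    return errors


nested = ["while", "1", ":", "indent",
          "while", "1", ":", "indent",
          "break", "2",
          "dedent", "dedent"]
problems = check_breaks(nested)
```

In a yacc grammar the same bookkeeping would more likely live in the parser actions for the loop rule, but the principle is the same: count enclosing loops, not tab characters.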
