Issue warning for missing comma between list items



The Story:

When a list of strings is defined on multiple lines, it is often easy to forget a comma between list items, like in this example case:

test = [
    "item1"
    "item2"
]

The list test would now have a single item "item1item2".

Quite often the problem appears after rearranging the items in a list.

Sample Stack Overflow questions having this issue:

The Question:

Is there a way to, preferably using static code analysis, issue a warning in cases like this in order to spot the problem as early as possible?

The Solution:

These are merely possible solutions since I'm not really adept with static analysis.

With tokenize:

I recently fiddled around with tokenizing Python code and I believe it has all the information needed to perform this kind of check when sufficient logic is added. For your given list, the tokens generated with python -m tokenize list1.py are as follows:

python -m tokenize list1.py 

1,0-1,4:    NAME    'test'
1,5-1,6:    OP  '='
1,7-1,8:    OP  '['
1,8-1,9:    NL  '\n'
2,1-2,8:    STRING  '"item1"'
2,8-2,9:    NL  '\n'
3,1-3,8:    STRING  '"item2"'
3,8-3,9:    NL  '\n'
4,0-4,1:    OP  ']'
4,1-4,2:    NEWLINE '\n'
5,0-5,0:    ENDMARKER   ''

This of course is the 'problematic' case where the contents are going to get concatenated. In the case where a , is present, the output slightly changes to reflect this (I added the tokens only for the list body):

1,7-1,8:    OP  '['
1,8-1,9:    NL  '\n'
2,1-2,8:    STRING  '"item1"'
2,8-2,9:    OP  ','
2,9-2,10:   NL  '\n'
3,1-3,8:    STRING  '"item2"'
3,8-3,9:    NL  '\n'
4,0-4,1:    OP  ']'

Now we have the additional OP ',' token signifying the presence of a second element separated by a comma.

Given this information, we could use the really handy method generate_tokens in the tokenize module. tokenize.generate_tokens() (tokenize.tokenize() in Py3) takes a single argument, readline, a method on file-like objects which essentially returns the next line for that file-like object. It returns a named tuple with 5 elements in total, with information about the token type and the token string, along with the line number and position in the line.
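A minimal sketch of this API, feeding a source string through io.StringIO so its readline method can serve as the required callable:

```python
import io
import tokenize

src = 'test = [\n    "item1"\n    "item2"\n]\n'

# generate_tokens takes a readline callable; io.StringIO provides one.
for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    # tok is a TokenInfo named tuple: (type, string, start, end, line)
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start)
```

This reproduces the token dump shown above, with start/end positions available programmatically for issuing warnings.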

Using this information, one could theoretically loop through a file and when an OP ',' is absent inside a list initialization (whose beginning is detected by checking that the tokens NAME, OP '=' and OP '[' exist on the same line number) one can issue a warning on the lines on which it was detected.
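Tracking the start of a list initialization can get involved; a simpler sketch (a hypothetical find_implicit_concat helper, not from the original answer) flags any two consecutive STRING tokens, which is exactly the situation in which implicit concatenation happens:

```python
import io
import tokenize

def find_implicit_concat(source):
    """Return line numbers where a STRING token directly follows another
    STRING token, i.e. where implicit string concatenation occurs."""
    warnings = []
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NL and COMMENT tokens may legally sit between the two strings.
        if tok.type in (tokenize.NL, tokenize.COMMENT):
            continue
        if prev is not None and prev.type == tokenize.STRING \
                and tok.type == tokenize.STRING:
            warnings.append(tok.start[0])  # line of the second string
        prev = tok
    return warnings

src = 'test = [\n    "item1"\n    "item2"\n]\n'
print(find_implicit_concat(src))  # -> [3]
```

Note this flags every implicit concatenation, including intentional ones on a single line, so a real checker would want a way to suppress those.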

The good thing about this approach is that it is pretty straightforward to generalize. To catch all cases where string literal concatenation takes place (namely, inside the 'grouping' operators (), {}, []), you check that the token is of type 51 (or 53 for Python 3) or that a value of (, [ or { appears on the same line (these are coarse, top-of-the-head suggestions at the moment).

Now, I'm not really sure how other people approach these sorts of problems, but it seems like it could be something you can look into. All the information necessary is offered by tokenize; the logic to detect it is the only thing missing.

Implementation Note: These values (for example, for type) do differ between versions and are subject to change, so it is something one should be aware of. One could possibly sidestep this by only working with the named constants for the tokens, though.
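To illustrate the note above: the token and tokenize modules expose named constants and a tok_name lookup table, so hard-coded numbers like 51 or 53 never need to appear in the checker.

```python
import token
import tokenize

# The numeric values of token types change across Python versions, so
# compare against the named constants instead of hard-coded numbers.
print(tokenize.OP == token.OP)       # True; same constant in both modules
print(token.tok_name[token.STRING])  # 'STRING'
```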


With parser and ast:

Another, probably more tedious, solution could involve the parser and ast modules. The concatenation of strings is actually performed during the creation of the Abstract Syntax Tree, so you could alternatively detect it over there.

I don't really want to dump the full output of the methods for parser and ast that I'm going to mention, but, just to make sure we're on the same page, I'm going to be using the following list initialization statement:

l_init = """
test = [
    "item1"
    "item2",
    "item3"
]
"""

In order to get the parse tree generated, use p = parser.suite(l_init). After this is done, you can get a view of it with p.tolist() (output is too large to add it). What you notice is that there will be three entries for the three different str objects item1, item2, item3.

On the other hand, when the AST is created with node = ast.parse(l_init) and viewed with ast.dump(node) there are only two entries: one for the concatenated strs item1item2 and one for the other entry item3.
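A small sketch of the ast side of this comparison (the parser module was deprecated in Python 3.9 and removed in 3.10, so only ast is used here): walking the tree and collecting string constants shows two entries, not three, because the adjacent literals were already merged during parsing.

```python
import ast

l_init = """
test = [
    "item1"
    "item2",
    "item3"
]
"""

# By the time the AST exists, adjacent string literals are already
# concatenated, so only two string constants remain in the tree.
node = ast.parse(l_init)
strings = [n.value for n in ast.walk(node)
           if isinstance(n, ast.Constant) and isinstance(n.value, str)]
print(strings)  # -> ['item1item2', 'item3']
```

This also shows why detection at the AST level is harder: the merged node looks identical to a single literal.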

So, this is another possible way to do it but, as I previously mentioned, it is way more tedious. I'm not sure if line information is available, and you have to deal with two different modules. Just keep it in mind if you want to play around with internal objects higher up in the compiler chain.


Closing Comments: As a closing note, the tokenize approach seems to be the most logical one in this case. That said, it seems that pylint actually works with astroid, a Python library that eases analysis of Abstract Syntax Trees for Python code. So, one should ideally look at it and how it is used inside pylint.

Note: Of course, I might be completely over-analyzing it and a simpler 'check for white-space or newline' solution as you guys suggested would suffice. :-)
