Identifying implicit string literal concatenation
Problem description
According to Guido (and some other Python programmers), implicit string literal concatenation is considered harmful. I am therefore trying to identify logical lines that contain such a concatenation.
My first (and only) attempt used shlex: I thought of splitting a logical line with posix=False, identifying the parts enclosed in quotes, and treating any two such parts that lie next to each other as a "literal concatenation".
However, this fails on multiline (triple-quoted) strings, as the following example shows:
import shlex

shlex.split('""" Some docstring """', posix=False)
# Returns ['""', '" Some docstring "', '""'], which would be flagged as a concatenation, but it is not one
I can tweak this in some weird ad-hoc ways, but I wondered whether you can think of a simple solution. My intention is to add it to my already-extended pep8 verifier.
Recommended answer
Interesting question. I just had to play with it, and because there is no answer yet, I'm posting my solution to the problem:
#!/usr/bin/python

import tokenize
import token
import sys

with open(sys.argv[1], 'rU') as f:
    toks = list(tokenize.generate_tokens(f.readline))
for i in xrange(len(toks) - 1):
    tok = toks[i]
    # print tok
    tok2 = toks[i + 1]
    if tok[0] == token.STRING and tok[0] == tok2[0]:
        print "implicit concatenation in line " \
              "{} between {} and {}".format(tok[2][0], tok[1], tok2[1])
You can feed the program its own source, and the result should be:
implicit concatenation in line 14 between "implicit concatenation in line " and "{} between {} and {}"