python, windows: parsing command lines with shlex
Problem description
When you have to split a command line, for example to call Popen, the best practice seems to be

    subprocess.Popen(shlex.split(cmd), ...)

but RTFM:

    The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell ...

So what is the correct way on win32? And what about quote parsing and POSIX vs. non-POSIX mode?

Best regards, Massimo
Solution

As of March 2016, the Python stdlib still has no valid command-line splitting function for Windows or multi-platform use.
subprocess
So in short, for subprocess.Popen, .call, etc., it is best to do:

    if sys.platform == 'win32':
        args = cmd
    else:
        args = shlex.split(cmd)
    subprocess.Popen(args, ...)
On Windows the split is not necessary for either value of the shell option; internally Popen just uses subprocess.list2cmdline to re-join the split arguments :-). With shell=True, shlex.split is not necessary on Unix either.

Split or not, when starting .bat or .cmd scripts on Windows (unlike .exe or .com files) you need to include the file extension explicitly, unless shell=True.

Notes on command-line splitting nonetheless:
shlex.split(cmd, posix=0) retains backslashes in Windows paths, but it does not handle quoting and escaping correctly. It is not clear what the posix=0 mode of shlex is good for at all, but it almost certainly tempts Windows and cross-platform programmers ...

The Windows API exposes ctypes.windll.shell32.CommandLineToArgvW:

    Parses a Unicode command line string and returns an array of pointers to the command line arguments, along with a count of such arguments, in a way that is similar to the standard C run-time argv and argc values.
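    def win_CommandLineToArgvW(cmd):
        import ctypes
        nargs = ctypes.c_int()
        # The call returns a pointer to an array of wide-string pointers.
        ctypes.windll.shell32.CommandLineToArgvW.restype = ctypes.POINTER(ctypes.c_wchar_p)
        # unicode() is Python 2; on Python 3 a str can be passed as-is.
        lpargs = ctypes.windll.shell32.CommandLineToArgvW(unicode(cmd), ctypes.byref(nargs))
        args = [lpargs[i] for i in range(nargs.value)]
        # LocalFree returns NULL on success; anything else means the free failed.
        if ctypes.windll.kernel32.LocalFree(lpargs):
            raise AssertionError
        return args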
However, CommandLineToArgvW is bogus, or at best only weakly similar to the mandatory standard C argv & argc parsing:

    >>> win_CommandLineToArgvW('aaa"bbb""" ccc')
    [u'aaa"bbb"""', u'ccc']
    >>> win_CommandLineToArgvW('"" aaa"bbb""" ccc')
    [u'', u'aaabbb" ccc']
    >>>

For comparison, the same command lines as parsed by the C runtime for a real process:

    C:\scratch>python -c "import sys;print(sys.argv)" aaa"bbb""" ccc
    ['-c', 'aaabbb"', 'ccc']

    C:\scratch>python -c "import sys;print(sys.argv)" "" aaa"bbb""" ccc
    ['-c', '', 'aaabbb"', 'ccc']
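To reproduce comparisons like the ones above, it can help to ask a real child process which argv it received. The helper below is only a sketch: the name child_argv and the JSON round trip are illustrative choices, not part of the original answer. A list argument works on any platform; a raw string tail is only meaningful on Windows, where the command line reaches the child unsplit.

```python
import json
import subprocess
import sys

# Hypothetical helper (not from the original answer).
_CHILD_CODE = "import json,sys;print(json.dumps(sys.argv[1:]))"


def child_argv(tail):
    """Return the arguments a child Python process actually receives.

    tail: a list of arguments (any platform), or on Windows a raw
          command-line tail string that is passed through unsplit.
    """
    if isinstance(tail, str):
        # Windows only: the child's C runtime splits this string itself.
        cmd = '"%s" -c "%s" %s' % (sys.executable, _CHILD_CODE, tail)
    else:
        # Any platform: subprocess handles the quoting of each list item.
        cmd = [sys.executable, "-c", _CHILD_CODE] + list(tail)
    return json.loads(subprocess.check_output(cmd).decode())


if __name__ == "__main__":
    print(child_argv(["a b", 'c"d', ""]))
```

On Windows, passing a string such as 'aaa"bbb""" ccc' should reproduce the console sessions shown above, which gives a reference result to compare any splitter against.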
Watch http://bugs.python.org/issue1724822 for possible future additions to the Python lib. (The function mentioned there on the "fisheye3" server doesn't really work correctly.)
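Returning to the earlier note on shlex.split(cmd, posix=0), the difference between the two shlex modes is easy to see on a small input. The command string below is made up for illustration; the calls use only the documented shlex.split(s, posix=...) signature.

```python
import shlex

# Made-up sample: a Windows path, a quoted argument, and a quote inside a word.
cmd = r'C:\dir\prog.exe "a b" x"y z"'

# Non-POSIX mode: backslashes survive, but quotes are kept as literal
# characters and a quote inside a word does not protect the space.
non_posix = shlex.split(cmd, posix=False)

# POSIX mode: quoting works, but backslashes are treated as escapes
# and disappear from the Windows path.
posix = shlex.split(cmd, posix=True)

print(non_posix)
print(posix)
```

Neither result is what a Windows program would receive as argv, which is why the answer recommends passing the unsplit string to Popen on win32.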
Cross-platform candidate function
Valid Windows command-line splitting is rather crazy. E.g. try

    \ \\ \" \\"" \\\"aaa """"

...

My current candidate function for cross-platform command-line splitting is the following, which I am considering proposing for the Python lib. It is multi-platform; it is ~10x faster than shlex, which does single-char stepping and streaming; and it also respects pipe-related characters (unlike shlex). It already passes a list of tough real-shell tests on Windows and Linux bash, plus the legacy POSIX test patterns of test_shlex. I am interested in feedback about remaining bugs.

    import re
    import sys

    def cmdline_split(s, platform='this'):
        """Multi-platform variant of shlex.split() for command-line splitting.
        For use with subprocess, for argv injection etc. Using fast REGEX.

        platform: 'this' = auto from current platform;
                  1 = POSIX;
                  0 = Windows/CMD
                  (other values reserved)
        """
        if platform == 'this':
            platform = (sys.platform != 'win32')
        if platform == 1:
            RE_CMD_LEX = r'''"((?:\\["\\]|[^"])*)"|'([^']*)'|(\\.)|(&&?|\|\|?|\d?\>|[<])|([^\s'"\\&|<>]+)|(\s+)|(.)'''
        elif platform == 0:
            RE_CMD_LEX = r'''"((?:""|\\["\\]|[^"])*)"?()|(\\\\(?=\\*")|\\")|(&&?|\|\|?|\d?>|[<])|([^\s"&|<>]+)|(\s+)|(.)'''
        else:
            raise AssertionError('unknown platform %r' % platform)

        args = []
        accu = None   # collects pieces of one arg
        for qs, qss, esc, pipe, word, white, fail in re.findall(RE_CMD_LEX, s):
            if word:
                pass   # most frequent
            elif esc:
                word = esc[1]
            elif white or pipe:
                if accu is not None:
                    args.append(accu)
                if pipe:
                    args.append(pipe)
                accu = None
                continue
            elif fail:
                raise ValueError("invalid or incomplete shell string")
            elif qs:
                word = qs.replace('\\"', '"').replace('\\\\', '\\')
                if platform == 0:
                    word = word.replace('""', '"')
            else:
                word = qss   # may be even empty; must be last

            accu = (accu or '') + word

        if accu is not None:
            args.append(accu)

        return args
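The backslash and quote rules behind the Windows example above can also be explored from the joining side, using subprocess.list2cmdline, the helper the answer says Popen uses on Windows. It is importable on any platform but is not part of the documented API, so treat this as a sketch; the sample arguments are made up for illustration.

```python
import subprocess

# Made-up sample arguments: spaces, embedded quotes and trailing backslashes.
samples = ['a b', 'a"b', r'a\b', 'a\\"b', 'a b\\', '']

# Join each argument on its own to see how it is quoted for Windows.
joined = {arg: subprocess.list2cmdline([arg]) for arg in samples}

for arg in samples:
    print(repr(arg), '->', joined[arg])
```

Backslashes stay literal unless they precede a double quote (or the closing quote of a quoted argument), in which case they are doubled. A splitter for Windows has to undo exactly that, which is where the odd-looking test strings come from.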