python, windows : parsing command lines with shlex


Problem description


When you have to split a command line, for example to call Popen, the best practice seems to be

subprocess.Popen(shlex.split(cmd), ...)

But RTFM:

The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell ...

So, what's the correct way on win32? And what about quote parsing and POSIX vs. non-POSIX mode?

Best regards, Massimo

Solution

As of March 2016, the Python stdlib has no valid command-line splitting function for Windows or for multi-platform use.

subprocess

So in short, for subprocess.Popen, subprocess.call etc. it is best to do this:

if sys.platform == 'win32':
    args = cmd
else:
    args = shlex.split(cmd)
subprocess.Popen(args, ...)

On Windows the split is not necessary for either value of the shell option: when given a list, Popen internally just uses subprocess.list2cmdline to re-join the split arguments into one string again :-)
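To make that re-joining concrete, here is a small sketch. As far as I know, subprocess.list2cmdline is a helper that is not part of the documented public API, so treat its exact behaviour as an implementation detail. The argument values are made up for illustration.

```python
import subprocess

# Hypothetical argument list, as a manual split might produce it.
args = ["prog", "a b", 'say "hi"']

# list2cmdline() follows the MS C runtime quoting rules: arguments that
# contain whitespace are wrapped in double quotes, and embedded double
# quotes are escaped with a backslash.
joined = subprocess.list2cmdline(args)
print(joined)   # prog "a b" "say \"hi\""
```

So splitting a string first and then handing the list to Popen on Windows only makes the string take a round trip through this quoting step.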

With option shell=True the shlex.split is not necessary on Unix either.
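A minimal sketch of the shell=True case, assuming a shell with an echo command (true for both /bin/sh and cmd.exe): the command goes in as one string and the shell does the splitting.

```python
import subprocess

# The command is passed as ONE string; the shell splits it, not Python.
out = subprocess.check_output("echo a b", shell=True, universal_newlines=True)

# split() hides newline differences between platforms.
words = out.split()
print(words)   # ['a', 'b']
```

Keep in mind that with shell=True the string is interpreted by the shell, so it should not contain untrusted input.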

Split or not, on Windows you need to include the file extension explicitly when starting .bat or .cmd scripts (unlike .exe and .com), unless shell=True.

Notes on command-line splitting nonetheless:

shlex.split(cmd, posix=0) retains backslashes in Windows paths, but it doesn't handle quoting and escaping correctly. It's not very clear what the posix=0 mode of shlex is good for at all, but 99% of the time it certainly seduces Windows/cross-platform programmers ...
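The two shlex modes can be compared with a short sketch. The paths are invented examples; the point is only how backslashes and quotes come out of each mode.

```python
import shlex

# POSIX mode (the default): an unquoted backslash escapes the next
# character, so the separators of a Windows path are swallowed.
posix_tokens = shlex.split(r'C:\dir\file.txt')
print(posix_tokens)      # ['C:dirfile.txt']

# Non-POSIX mode keeps the backslashes, but it also keeps the
# surrounding quotes as part of the token instead of removing them.
nonposix_tokens = shlex.split(r'"C:\Program Files\x.exe" arg', posix=False)
print(nonposix_tokens)   # ['"C:\\Program Files\\x.exe"', 'arg']
```

Neither result is what a Windows program would see in its argv for the same command line.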

The Windows API exposes CommandLineToArgvW, reachable from Python as ctypes.windll.shell32.CommandLineToArgvW:

Parses a Unicode command line string and returns an array of pointers to the command line arguments, along with a count of such arguments, in a way that is similar to the standard C run-time argv and argc values.

def win_CommandLineToArgvW(cmd):
    import ctypes
    nargs = ctypes.c_int()
    ctypes.windll.shell32.CommandLineToArgvW.restype = ctypes.POINTER(ctypes.c_wchar_p)
    lpargs = ctypes.windll.shell32.CommandLineToArgvW(unicode(cmd), ctypes.byref(nargs))
    args = [lpargs[i] for i in range(nargs.value)]
    if ctypes.windll.kernel32.LocalFree(lpargs):
        raise AssertionError
    return args
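The function above relies on Python 2's unicode(). A possible Python 3 variant is sketched below. The name win_command_line_to_argv is hypothetical, and declaring argtypes and restype explicitly is my assumption of good ctypes practice rather than something the original answer does. It can only do real work on Windows; elsewhere it raises OSError.

```python
import sys


def win_command_line_to_argv(cmd):
    """Split cmd with the Win32 CommandLineToArgvW function (Windows only)."""
    if sys.platform != 'win32':
        raise OSError('CommandLineToArgvW is only available on Windows')
    import ctypes
    from ctypes import wintypes

    split = ctypes.windll.shell32.CommandLineToArgvW
    split.argtypes = [wintypes.LPCWSTR, ctypes.POINTER(ctypes.c_int)]
    split.restype = ctypes.POINTER(wintypes.LPWSTR)
    local_free = ctypes.windll.kernel32.LocalFree
    local_free.argtypes = [wintypes.HLOCAL]
    local_free.restype = wintypes.HLOCAL

    nargs = ctypes.c_int()
    lpargs = split(cmd, ctypes.byref(nargs))
    if not lpargs:                       # NULL means the call failed
        raise ctypes.WinError()
    try:
        # Copy the strings out before the buffer is released.
        return [lpargs[i] for i in range(nargs.value)]
    finally:
        local_free(lpargs)
```

According to the Microsoft documentation, passing an empty string makes the API return the path of the current executable, so callers may want to special-case that input.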

However, that function CommandLineToArgvW is bogus, or at best only weakly similar to the mandatory standard C argv & argc parsing:

>>> win_CommandLineToArgvW('aaa"bbb""" ccc')
[u'aaa"bbb"""', u'ccc']
>>> win_CommandLineToArgvW('""  aaa"bbb""" ccc')
[u'', u'aaabbb" ccc']
>>> 

C:\scratch>python -c "import sys;print(sys.argv)" aaa"bbb""" ccc
['-c', 'aaabbb"', 'ccc']

C:\scratch>python -c "import sys;print(sys.argv)" ""  aaa"bbb""" ccc
['-c', '', 'aaabbb"', 'ccc']


Watch http://bugs.python.org/issue1724822 for possible future additions to the Python lib. (The function mentioned there on the "fisheye3" server doesn't really work correctly.)


Cross-platform candidate function

Valid Windows command-line splitting is rather crazy. E.g. try \ \\ \" \\"" \\\"aaa """" ...

My current candidate for cross-platform command-line splitting is the following function, which I am considering proposing for the Python lib. It's multi-platform; it's ~10x faster than shlex, which does single-char stepping and streaming; and it also respects pipe-related characters (unlike shlex). It already passes a list of tough real-shell tests on Windows and Linux bash, plus the legacy POSIX test patterns of test_shlex. I'm interested in feedback about remaining bugs.

import re
import sys

def cmdline_split(s, platform='this'):
    """Multi-platform variant of shlex.split() for command-line splitting.
    For use with subprocess, for argv injection etc. Using fast REGEX.

    platform: 'this' = auto from current platform;
              1 = POSIX; 
              0 = Windows/CMD
              (other values reserved)
    """
    if platform == 'this':
        platform = (sys.platform != 'win32')
    if platform == 1:
        RE_CMD_LEX = r'''"((?:\\["\\]|[^"])*)"|'([^']*)'|(\\.)|(&&?|\|\|?|\d?\>|[<])|([^\s'"\\&|<>]+)|(\s+)|(.)'''
    elif platform == 0:
        RE_CMD_LEX = r'''"((?:""|\\["\\]|[^"])*)"?()|(\\\\(?=\\*")|\\")|(&&?|\|\|?|\d?>|[<])|([^\s"&|<>]+)|(\s+)|(.)'''
    else:
        raise AssertionError('unknown platform %r' % platform)

    args = []
    accu = None   # collects pieces of one arg
    for qs, qss, esc, pipe, word, white, fail in re.findall(RE_CMD_LEX, s):
        if word:
            pass   # most frequent
        elif esc:
            word = esc[1]
        elif white or pipe:
            if accu is not None:
                args.append(accu)
            if pipe:
                args.append(pipe)
            accu = None
            continue
        elif fail:
            raise ValueError("invalid or incomplete shell string")
        elif qs:
            word = qs.replace('\\"', '"').replace('\\\\', '\\')
            if platform == 0:
                word = word.replace('""', '"')
        else:
            word = qss   # may be even empty; must be last

        accu = (accu or '') + word

    if accu is not None:
        args.append(accu)

    return args
