Is Python 'sys.argv' limited in the maximum number of arguments?
Question
I have a Python script that needs to process a large number of files. To get around Linux's relatively small limit on the number of arguments that can be passed to a command, I am using find -print0 with xargs -0.
I know another option would be to use Python's glob module, but that won't help when I have a more advanced find command, for example one that filters on modification times.
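For reference, a find-style filter on modification time can also be written in Python with os.walk, which avoids the command line entirely. This is a minimal sketch, not the asker's code: find_recent_files and its max_age_days parameter are hypothetical names standing in for whatever find tests you actually use, and the match with find's -mtime rounding rules is only approximate.

```python
import os
import time


def find_recent_files(root, max_age_days):
    """Yield non-directory paths under root whose mtime is within the last
    max_age_days days (roughly like `find ROOT ! -type d -mtime -N`)."""
    cutoff = time.time() - max_age_days * 86400  # seconds per day
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, like find without -L
                st = os.lstat(path)
            except OSError:
                # The entry vanished or is unreadable; skip it
                continue
            if st.st_mtime >= cutoff:
                yield path
```

Because it is a generator, for path in find_recent_files(os.path.expanduser("~"), 7): ... processes every matching file in one process, with no argument-length limit involved.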
When running my script on a large number of files, Python only accepts a subset of the arguments. I first thought the limitation was in argparse, but it appears to be in sys.argv. I can't find any documentation on this. Is it a bug?
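One way to test whether sys.argv itself drops arguments is to start a child interpreter directly, without xargs, and compare how many arguments it receives with how many were passed. A minimal sketch, assuming a Linux-like system whose command-line size limit comfortably fits 20,000 short arguments. N_ARGS and count_child_argv are hypothetical names chosen for illustration.

```python
import subprocess
import sys

# Arbitrary count: more than one xargs batch in the question, but small
# enough to stay well under typical Linux command-line size limits.
N_ARGS = 20000


def count_child_argv(n):
    """Run a child Python with n dummy arguments (no xargs involved) and
    return how many arguments the child found in sys.argv[1:]."""
    args = ["f%d" % i for i in range(n)]
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(len(sys.argv[1:]))"] + args
    )
    return int(out.decode().strip())


print(count_child_argv(N_ARGS))
```

If the printed count equals N_ARGS, the interpreter received every argument, and any chunking must be happening before Python starts.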
Here's a sample Python script illustrating the point:
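from __future__ import print_function  # print() behaves the same on Python 2.7 and 3
import argparse
import sys
import os

parser = argparse.ArgumentParser()
parser.add_argument('input_files', nargs='+')
args = parser.parse_args(sys.argv[1:])
# Report this process's PID and how many file names it received
print('pid:', os.getpid(), 'argv files', len(sys.argv[1:]), 'argparse files:', len(args.input_files))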
I have a lot of files to run this on:
$ find ~/ -name "*" -print0 | xargs -0 ls > filelist
$ wc -l filelist
748709 filelist
But it appears xargs or Python is chunking my big list of files and processing it with several different Python runs:
$ find ~/ -name "*" -print0 | xargs -0 python test.py
pid: 4216 argv files 1819 number of files: 1819
pid: 4217 argv files 1845 number of files: 1845
pid: 4218 argv files 1845 number of files: 1845
pid: 4219 argv files 1845 number of files: 1845
pid: 4220 argv files 1845 number of files: 1845
pid: 4221 argv files 1845 number of files: 1845
...
Why are multiple processes being created to process the list? Why is it being chunked at all? I don't think there are newlines in the file names, and shouldn't -print0 and -0 take care of that issue anyway? If there were newlines, I'd expect sed -n '1810,1830p' filelist to show some weirdness for the above example. What gives?
I almost forgot:
$ python -V
Python 2.7.2+
Recommended answer
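xargs chunks your arguments by default. It builds each command line only up to a size limit, then runs python test.py again with the next batch. That is why every batch shows a new PID; sys.argv itself is not truncating anything. Have a look at the --max-args and --max-chars options of xargs. Its man page also explains the limits (under --max-chars). Raising --max-chars only helps up to the operating system's own limit on command-line size; beyond that, xargs still has to split the list.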
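Because the limit is on the command line, one way to handle every file in a single Python process is to bypass argv and read find's NUL-separated output from standard input instead. A minimal sketch, assuming the script is run as find ~/ -name "*" -print0 | python test.py. read_null_separated is a hypothetical helper name, and the whole list is read into memory at once.

```python
def read_null_separated(stream):
    """Split a binary stream of NUL-terminated names (as written by
    `find -print0`) into a list of byte-string paths."""
    data = stream.read()
    # Every name ends with b"\0", so the last split element is empty.
    # Real file names are never empty, so dropping empty entries is safe.
    return [p for p in data.split(b"\0") if p]
```

In the script, files = read_null_separated(sys.stdin.buffer) would replace sys.argv[1:] on Python 3. sys.stdin.buffer yields raw bytes, so names that are not valid UTF-8 survive unchanged. On Python 2.7, as in the question, sys.stdin already returns byte strings and can be passed directly.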