Is Python 'sys.argv' limited in the maximum number of arguments?


Question

I have a Python script that needs to process a large number of files. To get around Linux's relatively small limit on the number of arguments that can be passed to a command, I am using find -print0 with xargs -0.

I know another option would be to use Python's glob module, but that won't help when I have a more advanced find command, for example one that filters on modification times.

When I run my script on a large number of files, Python only receives a subset of the arguments. I first thought the limitation was in argparse, but it appears to be in sys.argv. I can't find any documentation on this. Is it a bug?

Here's a sample Python script illustrating the point:

import argparse
import sys
import os

parser = argparse.ArgumentParser()
parser.add_argument('input_files', nargs='+')
args = parser.parse_args(sys.argv[1:])

print 'pid:', os.getpid(), 'argv files', len(sys.argv[1:]), 'argparse files:', len(args.input_files)

I have a lot of files to run this on:

$ find ~/ -name "*" -print0 | xargs -0 ls > filelist
748709 filelist

But it appears that xargs or Python is chunking my big list of files and processing it with several different Python runs:

$ find ~/ -name "*" -print0 | xargs -0 python test.py
pid: 4216 argv files 1819 number of files: 1819
pid: 4217 argv files 1845 number of files: 1845
pid: 4218 argv files 1845 number of files: 1845
pid: 4219 argv files 1845 number of files: 1845
pid: 4220 argv files 1845 number of files: 1845
pid: 4221 argv files 1845 number of files: 1845
...

Why are multiple processes being created to process the list? Why is it being chunked at all? I don't think there are newlines in the file names, and shouldn't -print0 and -0 take care of that issue anyway? If there were newlines, I'd expect sed -n '1810,1830p' filelist to show some weirdness for the above example. What gives?

I almost forgot:

$ python -V
Python 2.7.2+

Recommended answer

xargs chunks your arguments by default, and it runs the command once per chunk. That is why several Python processes appear, each with its own pid and its own slice of the file list; the limit is not in sys.argv or argparse. Have a look at the --max-args and --max-chars options of xargs. Its man page also explains the limits (under --max-chars).
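If the script needs to see the whole list in one process, one possible workaround is to skip xargs and let the script read the NUL-separated names from standard input. The following is only a sketch under assumptions that are not in the original post: the helper name read_nul_separated and the convention that a single "-" argument means "read names from stdin" are made up for illustration.

```python
import os
import sys


def read_nul_separated(stream):
    """Return the non-empty NUL-terminated names found in a binary stream.

    Hypothetical helper: it expects input shaped like the output of
    find -print0, where every name is followed by a NUL byte.
    """
    data = stream.read()
    return [name for name in data.split(b"\x00") if name]


def main(argv):
    if argv == ["-"]:
        # Assumed convention: a lone "-" means the names come from stdin.
        # Python 3 exposes raw bytes via sys.stdin.buffer; Python 2's
        # sys.stdin already yields byte strings, so fall back to it.
        stream = getattr(sys.stdin, "buffer", sys.stdin)
        names = read_nul_separated(stream)  # names are byte strings here
    else:
        # Otherwise behave as before and take the names from the command line.
        names = argv
    sys.stdout.write("pid: %d files: %d\n" % (os.getpid(), len(names)))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Assuming the script is saved as test.py, it could be run as find ~/ -name "*" -print0 | python test.py - so that a single process receives every name. The names never pass through the command line, so the argument-size limits that make xargs split the list do not apply.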
