File can't be found in a small fraction of submitted jobs
Question
I'm trying to run a very large set of batch jobs on a RHEL5 cluster that uses a Lustre file system. About 1% of the jobs fail with a strange error: they can't find a text file that all of them use for steering. A script that reproduces the error looks like this:
#!/usr/bin/env bash
#PBS -t 1-18792
#PBS -l mem=4gb,walltime=30:00
#PBS -l nodes=1:ppn=1
#PBS -q hep
#PBS -o output/fit/out.txt
#PBS -e output/fit/error.txt
cd $PBS_O_WORKDIR
mkdir -p output/fit
echo 'submitted from: ' $PBS_O_WORKDIR
files=($(ls ./*.txt | sort)) # <-- NOTE THIS LINE
cat batch/fits/fit-paths.txt
For a small fraction of the jobs, the error stream shows:
cat: batch/fits/fit-paths.txt: No such file or directory
Weird enough, but it gets stranger.
When I change the line
files=($(ls ./*.txt | sort))
to
files=($(ls batch/fits/*.txt | sort))
the jobs run without errors! Needless to say, this is far from satisfying: I'd rather not have my jobs depend on black magic (although black magic is better than no magic).
Any idea what's going on here?
Recommended answer
Try replacing
files=($(ls ./*.txt | sort))
with
files=(./*.txt)
The shell sorts glob results automatically and, unlike code that parses ls(1) output (which should never be done in portable shell scripts), it handles file names containing special characters correctly.
That said, the quoting problem only matters if some of your file names contain characters that the shell treats specially. Candidates here are space, tab, newline, and possibly carriage return.
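To make the difference concrete, here is a minimal sketch that builds the array both ways in a scratch directory. The directory and the file names (`a file.txt`, `b.txt`) are made up for illustration and do not come from the original job. The sketch only illustrates the word-splitting point above; it does not claim to explain the intermittent "No such file or directory" error.

```shell
#!/usr/bin/env bash
# Sketch: compare an array built by parsing ls output with one built from a glob.
# The scratch directory and file names below are hypothetical.

demo_dir=$(mktemp -d)
touch "$demo_dir/b.txt" "$demo_dir/a file.txt"

# Parsing ls: the unquoted command substitution is split on whitespace,
# so "a file.txt" is broken into two separate array elements.
ls_files=($(ls "$demo_dir"/*.txt | sort))

# Glob: every matching path becomes exactly one element, and the shell
# returns the matches already sorted.
glob_files=("$demo_dir"/*.txt)

echo "elements from ls parsing: ${#ls_files[@]}"
echo "elements from glob:       ${#glob_files[@]}"

# Always quote the expansion when using the array.
for f in "${glob_files[@]}"; do
    echo "file: ${f##*/}"
done

rm -rf "$demo_dir"
```

Assuming the temporary path itself contains no whitespace, the glob yields two elements while the `ls` version yields three, because the name with a space is split. If the pattern might match nothing, bash's `shopt -s nullglob` makes the glob expand to an empty array instead of the literal pattern.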