Different pipeline behavior between sh and ksh
Problem description
I have isolated the problem to the code snippet below:
- Notice below that an empty string gets assigned (LATEST_FILE_NAME='') when the script is run using ksh, but the value is assigned to $LATEST_FILE_NAME correctly when it is run using sh. This in turn affects the value of $FILE_LIST_COUNT.
- As the script is written for KornShell (ksh), I am not sure what might be causing the issue.
- When I comment out the tee command in the line below, the ksh script works fine and correctly assigns the value to $LATEST_FILE_NAME.
(cd $SOURCE_FILE_PATH; ls *.txt 2>/dev/null) | sort -r > ${SOURCE_FILE_PATH}/${FILE_LIST} | tee -a $LOG_FILE_PATH
Kindly consider:
1. Source code: script.sh
#!/usr/bin/ksh
set -vx # Enable debugging
SCRIPTLOGSDIR=/some/path/Scripts/TEST/shell_issue
SOURCE_FILE_PATH=/some/path/Scripts/TEST/shell_issue
# Log file
Timestamp=`date +%Y%m%d%H%M`
LOG_FILENAME="TEST_LOGS_${Timestamp}.log"
LOG_FILE_PATH="${SCRIPTLOGSDIR}/${LOG_FILENAME}"
## Temporary files
FILE_LIST=FILE_LIST.temp #Will store all extract filenames
FILE_LIST_COUNT=0 # Stores total number of files
getFileListDetails(){
rm -f $SOURCE_FILE_PATH/$FILE_LIST 2>&1 | tee -a $LOG_FILE_PATH
# Get list of all files, Sort in reverse order, and store names of the files line-wise. If no files are found, error is muted.
(cd $SOURCE_FILE_PATH; ls *.txt 2>/dev/null) | sort -r > ${SOURCE_FILE_PATH}/${FILE_LIST} | tee -a $LOG_FILE_PATH
if [[ ! -f $SOURCE_FILE_PATH/$FILE_LIST ]]; then
echo "FATAL ERROR - Could not create a temp file for file list.";exit 1;
fi
LATEST_FILE_NAME="$(cd $SOURCE_FILE_PATH; head -1 $FILE_LIST)";
FILE_LIST_COUNT="$(cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l)";
}
getFileListDetails;
exit 0;
2. Output when run with sh (sh script.sh):
+ getFileListDetails
+ rm -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300506.log
+ cd /some/path/Scripts/TEST/shell_issue
+ sort -r
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300506.log
+ ls 1.txt 2.txt 3.txt
+ [[ ! -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp ]]
cd $SOURCE_FILE_PATH; head -1 $FILE_LIST
++ cd /some/path/Scripts/TEST/shell_issue
++ head -1 FILE_LIST.temp
+ LATEST_FILE_NAME=3.txt
cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l
++ cat /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
++ wc -l
+ FILE_LIST_COUNT=3
exit 0;
+ exit 0
3. Output when run with ksh (ksh script.sh):
+ getFileListDetails
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300507.log
+ rm -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ 2>& 1
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300507.log
+ sort -r
+ 1> /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ cd /some/path/Scripts/TEST/shell_issue
+ ls 1.txt 2.txt 3.txt
+ 2> /dev/null
+ [[ ! -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp ]]
+ cd /some/path/Scripts/TEST/shell_issue
+ head -1 FILE_LIST.temp
+ LATEST_FILE_NAME=''
+ wc -l
+ cat /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ FILE_LIST_COUNT=0
exit 0;+ exit 0
Recommended answer
OK, here goes. This is a tricky and subtle one. The answer lies in how pipelines are implemented. POSIX states:
If the pipeline is not in the background (see Asynchronous Lists), the shell shall wait for the last command specified in the pipeline to complete, and may also wait for all commands to complete.
Notice the keyword "may". Many shells implement this so that all commands have to complete; see, for example, the bash manpage:
The shell waits for all commands in the pipeline to terminate before returning a value.
Now notice the wording in the ksh manpage:
Each command, except possibly the last, is run as a separate process; the shell waits for the last command to terminate.
In your example, the last command is the tee command. tee receives no input, because the stdout of the preceding command is redirected to ${SOURCE_FILE_PATH}/${FILE_LIST}, so it exits immediately. Put simply, tee finishes before sort has written its output, and ksh waits only for tee, so the file has probably not been completely written by the time you read from it. You can test this (this is not a fix!) by adding a sleep after the pipeline:
$ ksh -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; echo "[$(head -n 1 /tmp/foo.txt)]"'
[]
$ ksh -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; sleep 0.1; echo "[$(head -n 1 /tmp/foo.txt)]"'
[/tmp/sess_vo93c7h7jp2a49tvmo7lbn6r63]
$ bash -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; echo "[$(head -n 1 /tmp/foo.txt)]"'
[/tmp/sess_vo93c7h7jp2a49tvmo7lbn6r63]
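One way to avoid the race, as a minimal sketch rather than the definitive fix: assuming the intent was to store the sorted list in the file and also append it to the log, let tee write the list file itself and append its stdout to the log. tee is then the last command of the pipeline, which every shell has to wait for, and it only exits once it has read all of sort's output. The scratch directory and the lowercase variable names below are hypothetical stand-ins for the ones in the question.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for $SOURCE_FILE_PATH.
work_dir=$(mktemp -d)
touch "$work_dir/1.txt" "$work_dir/2.txt" "$work_dir/3.txt"

file_list="$work_dir/FILE_LIST.temp"
log_file="$work_dir/test.log"

# tee is the last command of the pipeline, so the shell waits for it.
# It writes the list file itself; its stdout is appended to the log.
(cd "$work_dir" && ls *.txt 2>/dev/null) | sort -r | tee "$file_list" >> "$log_file"

# The list file is complete by the time the pipeline returns.
latest_file_name=$(head -n 1 "$file_list")
file_list_count=$(wc -l < "$file_list")
echo "latest=$latest_file_name count=$file_list_count"

rm -rf "$work_dir"
```

If the list should not go to the log at all, dropping tee and keeping only the redirection (sort -r > file) has the same effect on timing, since sort is then the last command.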
That being said, here are a few other things to consider:
- Always quote your variables, especially when dealing with files, to avoid problems with globbing, word splitting (if your path contains spaces), etc.:
do_something "${this_is_my_file}"
- head -1 is obsolete; use head -n 1 instead.
- If you only have one command on a line, the trailing semicolon ; is superfluous. Just leave it out.
LATEST_FILE_NAME="$(cd $SOURCE_FILE_PATH; head -1 $FILE_LIST)"
No need to cd into the directory first; just pass the whole path as the argument to head:
LATEST_FILE_NAME="$(head -n 1 "${SOURCE_FILE_PATH}/${FILE_LIST}")"
FILE_LIST_COUNT="$(cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l)"
This is called a Useless Use of Cat because cat is not needed: wc can read files itself. You probably used it because the output of wc -l myfile includes the filename, but you can use e.g. FILE_LIST_COUNT="$(wc -l < "${SOURCE_FILE_PATH}/${FILE_LIST}")" instead.
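Furthermore, you will want to read "Why you shouldn't parse the output of ls(1)" and "How can I get the newest (or oldest) file from a directory?".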
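As a rough illustration of doing without ls, the sketch below counts the .txt files and picks the last one by name using only pathname expansion. It assumes that "latest" means the greatest filename in sort order, as in the original script, not the most recently modified file. The directory, the file names, and the lowercase variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for $SOURCE_FILE_PATH.
source_dir=$(mktemp -d)
touch "$source_dir/1.txt" "$source_dir/2.txt" "$source_dir/3 final.txt"

latest_file_name=''
file_list_count=0

# Pathname expansion yields matches in sorted order, so the last
# iteration leaves the greatest name in latest_file_name.
for path in "$source_dir"/*.txt; do
    # With no matches the pattern stays literal; skip it.
    [ -e "$path" ] || continue
    file_list_count=$((file_list_count + 1))
    latest_file_name=${path##*/}   # strip the directory part
done

echo "latest=[$latest_file_name] count=$file_list_count"

rm -rf "$source_dir"
```

Because no temporary list file and no pipeline are involved, the difference in how sh and ksh wait for pipelines does not come into play, and file names containing spaces are handled thanks to the quoting.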