Different pipeline behavior between sh and ksh


Problem description

I have isolated the problem to the below code snippet:

  1. Notice below that an empty string gets assigned to LATEST_FILE_NAME (LATEST_FILE_NAME='') when the script is run using ksh, but the value is assigned to $LATEST_FILE_NAME correctly when the script is run using sh. This in turn affects the value of $FILE_LIST_COUNT.
  2. As the script is a KornShell (ksh) script, I am not sure what might be causing the issue.
  3. When I comment out the tee command in the line below, the ksh script works fine and correctly assigns the value to $LATEST_FILE_NAME.

(cd $SOURCE_FILE_PATH; ls *.txt 2>/dev/null) | sort -r > ${SOURCE_FILE_PATH}/${FILE_LIST} | tee -a $LOG_FILE_PATH


Kindly consider:

1. Source code: script.sh

#!/usr/bin/ksh
set -vx # Enable debugging

SCRIPTLOGSDIR=/some/path/Scripts/TEST/shell_issue
SOURCE_FILE_PATH=/some/path/Scripts/TEST/shell_issue
# Log file
Timestamp=`date +%Y%m%d%H%M`
LOG_FILENAME="TEST_LOGS_${Timestamp}.log"
LOG_FILE_PATH="${SCRIPTLOGSDIR}/${LOG_FILENAME}"
## Temporary files
FILE_LIST=FILE_LIST.temp    #Will store all  extract filenames
FILE_LIST_COUNT=0           # Stores total number of  files

getFileListDetails(){
    rm -f $SOURCE_FILE_PATH/$FILE_LIST 2>&1 | tee -a $LOG_FILE_PATH

    # Get list of all files, Sort in reverse order, and store names of the  files line-wise. If no files are found, error is muted.
    (cd $SOURCE_FILE_PATH; ls *.txt 2>/dev/null) | sort -r > ${SOURCE_FILE_PATH}/${FILE_LIST} | tee -a $LOG_FILE_PATH

    if [[ ! -f $SOURCE_FILE_PATH/$FILE_LIST ]]; then
        echo "FATAL ERROR - Could not create a temp file for  file list.";exit 1;
    fi

    LATEST_FILE_NAME="$(cd $SOURCE_FILE_PATH; head -1 $FILE_LIST)";
    FILE_LIST_COUNT="$(cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l)";

}

getFileListDetails;
exit 0;


2. Output when run with sh (sh script.sh):

+ getFileListDetails
+ rm -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300506.log
+ cd /some/path/Scripts/TEST/shell_issue
+ sort -r
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300506.log
+ ls 1.txt 2.txt 3.txt
+ [[ ! -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp ]]
cd $SOURCE_FILE_PATH; head -1 $FILE_LIST
++ cd /some/path/Scripts/TEST/shell_issue
++ head -1 FILE_LIST.temp
+ LATEST_FILE_NAME=3.txt
cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l
++ cat /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
++ wc -l
+ FILE_LIST_COUNT=3
exit 0;
+ exit 0


3. Output when run with ksh (ksh script.sh):

+ getFileListDetails
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300507.log
+ rm -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ 2>& 1
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300507.log
+ sort -r
+ 1> /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ cd /some/path/Scripts/TEST/shell_issue
+ ls 1.txt 2.txt 3.txt
+ 2> /dev/null
+ [[ ! -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp ]]
+ cd /some/path/Scripts/TEST/shell_issue
+ head -1 FILE_LIST.temp
+ LATEST_FILE_NAME=''
+ wc -l
+ cat /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ FILE_LIST_COUNT=0
exit 0;+ exit 0

Recommended answer

OK, here goes...this is a tricky and subtle one. The answer lies in how pipelines are implemented. POSIX states that

If the pipeline is not in the background (see Asynchronous Lists), the shell shall wait for the last command specified in the pipeline to complete, and may also wait for all commands to complete.

Notice the keyword may. Many shells implement this in a way that all commands need to complete, e.g. see the bash manpage:

The shell waits for all commands in the pipeline to terminate before returning a value.

Notice the wording in the ksh manpage:

Each command, except possibly the last, is run as a separate process; the shell waits for the last command to terminate.

In your example, the last command is the tee command. Since tee receives no input (the stdout of sort is redirected to ${SOURCE_FILE_PATH}/${FILE_LIST} in the command before it), it exits immediately. Put simply, tee finishes before sort has written its output, and because ksh only waits for the last command, your file is probably not completely written by the time you read from it. You can test this by adding a sleep after the pipeline (this is not a fix!):

$ ksh -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; echo "[$(head -n 1 /tmp/foo.txt)]"'
[]

$ ksh -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; sleep 0.1; echo "[$(head -n 1 /tmp/foo.txt)]"'
[/tmp/sess_vo93c7h7jp2a49tvmo7lbn6r63]

$ bash -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; echo "[$(head -n 1 /tmp/foo.txt)]"'
[/tmp/sess_vo93c7h7jp2a49tvmo7lbn6r63]
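
One possible way to remove the race, sketched here under the assumption that the sorted list should end up both in the list file and in the log: let tee write the list file itself and append its stdout to the log. tee is then the last command of the pipeline, so even a shell that only waits for the last command does not continue before the file is written. The directory and file names below (work_dir, demo.log) are invented for the demonstration and are not the original paths.

```shell
#!/bin/sh
# Hypothetical demo directory and file names, not the paths from the question.
work_dir=$(mktemp -d)
trap 'rm -rf "$work_dir"' EXIT
touch "$work_dir/1.txt" "$work_dir/2.txt" "$work_dir/3.txt"
list_file="$work_dir/FILE_LIST.temp"
log_file="$work_dir/demo.log"

# tee is the last command in the pipeline, so the shell waits for it.
# tee writes the list file; its stdout is appended to the log.
(cd "$work_dir" && ls *.txt 2>/dev/null) | sort -r | tee "$list_file" >> "$log_file"

# The list file is complete here because tee has already exited.
latest=$(head -n 1 "$list_file")
echo "latest: $latest"
```

The same idea should carry over to the original function by replacing `sort -r > file | tee -a log` with `sort -r | tee file >> log`.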

That being said, here are a few other things to consider:

  1. Always quote your variables, especially when dealing with files, to avoid problems with globbing, word splitting (if your path contains spaces) etc.:

     do_something "${this_is_my_file}"

  2. head -1 is deprecated, use head -n 1.

  3. If you only have one command on a line, the ending semicolon ; is superfluous, so just skip it.

  4. LATEST_FILE_NAME="$(cd $SOURCE_FILE_PATH; head -1 $FILE_LIST)"

     No need to cd into the directory first, just specify the whole path as argument to head:

     LATEST_FILE_NAME="$(head -n 1 "${SOURCE_FILE_PATH}/${FILE_LIST}")"

  5. FILE_LIST_COUNT="$(cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l)"

     This is called Useless Use Of Cat because the cat is not needed: wc can deal with files. You probably used it because the output of wc -l myfile includes the filename, but you can use e.g. FILE_LIST_COUNT="$(wc -l < "${SOURCE_FILE_PATH}/${FILE_LIST}")" instead.
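
Put together, the two assignments might look like the sketch below. It is a minimal demonstration only: a throwaway directory created with mktemp stands in for SOURCE_FILE_PATH, and the list file is filled with sample names by hand rather than by the pipeline from the question.

```shell
#!/bin/sh
# Demo setup (assumption): a temporary directory stands in for SOURCE_FILE_PATH.
SOURCE_FILE_PATH=$(mktemp -d)
trap 'rm -rf "$SOURCE_FILE_PATH"' EXIT
FILE_LIST=FILE_LIST.temp
printf '%s\n' 3.txt 2.txt 1.txt > "${SOURCE_FILE_PATH}/${FILE_LIST}"

# Quoted expansions, head -n 1 instead of head -1, no cd, no trailing semicolons.
LATEST_FILE_NAME="$(head -n 1 "${SOURCE_FILE_PATH}/${FILE_LIST}")"
# wc reads from stdin, so its output contains only the count, not the file name.
FILE_LIST_COUNT="$(wc -l < "${SOURCE_FILE_PATH}/${FILE_LIST}")"

echo "LATEST_FILE_NAME=${LATEST_FILE_NAME}"
echo "FILE_LIST_COUNT=${FILE_LIST_COUNT}"
```

Some wc implementations pad the count with leading blanks, so it is safer to compare it numerically (for example with -eq) than as a string.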

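Furthermore, you will want to read "Why you shouldn't parse the output of ls(1)" and "How can I get the newest (or oldest) file from a directory?".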
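
As a rough illustration of avoiding ls altogether, the sketch below lets the shell expand the glob into the positional parameters and derives both the count and the name that sorts last from them. It assumes that "latest" means the name that comes first in reverse sort order, as in the question. The variable names (dir, count, latest) are made up for this example.

```shell
#!/bin/sh
# Demo setup (assumption): a temporary directory with three sample files.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/1.txt" "$dir/2.txt" "$dir/3.txt"

cd "$dir" || exit 1
# Let the shell expand the glob into the positional parameters.
set -- *.txt
# If nothing matched, the pattern stays literal; detect that and clear the list.
[ -e "$1" ] || set --

count=$#
latest=""
# Glob results are sorted by the shell, so the last one is the first in reverse order.
for name in "$@"; do
    latest=$name
done
echo "count=$count latest=$latest"
```

Because no command output is parsed, file names containing spaces or newlines do not break the count.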
