Why does reading and writing to the same file in a pipeline produce unreliable results?
Question
I have a bunch of files that contain many blank lines, and I want to remove any repeated blank lines to make the files easier to read. I wrote the following script:
#!/bin/bash
for file in * ; do cat "$file" | sed 's/^ \+//' | cat -s > "$file" ; done
However, this had very unreliable results: most files became completely empty, and only a few ended up as intended. What's more, which files worked seemed to change randomly every time I retried, since each run correctly edited a different set of files. What's going on?
Note: This is more of a theoretical question, because I realize I could use a workaround like:
#!/bin/bash
for file in * ; do
    cat "$file" | sed 's/^ \+//' | cat -s > "$file"-tmp
    rm "$file"
    mv "$file"-tmp "$file"
done
But that seems unnecessarily convoluted. So why is the "direct" method so unreliable?
Recommended answer
The unpredictability happens because there's a race condition between two stages in the pipeline: cat "$file" and cat -s > "$file".
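The shell starts every stage of a pipeline at the same time. The first stage opens the file and reads from it, while the redirection > "$file" on the last stage opens the same file for writing and truncates it, emptying it before cat -s has written anything.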
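- If the file is emptied before it is read, you get an empty file.
- If it is read before it is emptied, you get some data (but the file is emptied soon afterwards, so unless it is very short, the result will be truncated).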
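Both outcomes can be reproduced with a small bash sketch that runs in a throwaway directory (demo.txt is an arbitrary, made-up file name; the `\+` in the sed expression assumes GNU sed, as in the question). The first part has no race at all: with a single command, the shell truncates the target of > before the command starts, so the file always ends up empty. The second part repeats the question's pipeline; how the runs split between empty and non-empty files depends on the scheduler and will differ between runs and machines. With a file this small, a non-empty result is normally the complete, correct output, which matches the "only some files worked" symptom.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Part 1: a single command, no pipeline. The shell opens demo.txt with
# truncation for "> demo.txt" before sed starts, so sed always reads an
# empty file and the result is always empty.
printf '  line 1\n\n\n\nline 2\n' > demo.txt
sed 's/^ \+//' demo.txt > demo.txt
single_bytes=$(wc -c < demo.txt)
echo "single command: $single_bytes bytes left"

# Part 2: the pipeline from the question, repeated. The shell starts all
# stages concurrently, so whether cat reads demo.txt before the last stage's
# redirection truncates it is decided by the scheduler.
runs=200 empty=0 nonempty=0
for ((i = 0; i < runs; i++)); do
  printf '  line 1\n\n\n\nline 2\n' > demo.txt
  cat demo.txt | sed 's/^ \+//' | cat -s > demo.txt
  if [ -s demo.txt ]; then
    nonempty=$((nonempty + 1))
  else
    empty=$((empty + 1))
  fi
done
echo "pipeline: $empty empty, $nonempty non-empty out of $runs runs"
```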
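If you have GNU sed, you can simply do sed -i 'expression' *, where 'expression' has to do the work of both the sed and the cat -s stages. GNU sed -i writes each result to a temporary file and then renames it over the original, so no file is ever read and truncated at the same time.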
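The temporary-file idea from the question's workaround works for any pipeline, including one that ends in cat -s. Below is a minimal sketch, assuming bash and a mktemp command; squeeze_file is a hypothetical helper name, not an existing tool. Unlike the original workaround, it refuses to touch the file if reading or filtering fails, and it copies the result back instead of renaming.

```shell
#!/usr/bin/env bash

# squeeze_file FILE
# Hypothetical helper, not a standard command. It applies the question's
# transformation (strip leading spaces, then squeeze repeated blank lines
# with cat -s) without reading and truncating FILE in the same pipeline.
# Usage (hypothetical): for file in ./*; do [ -f "$file" ] && squeeze_file "$file"; done
squeeze_file() {
  local file=$1 tmp
  # Scratch file next to FILE (mktemp replaces the trailing X's).
  tmp=$(mktemp "$file.XXXXXX") || return 1

  # Stage 1: read FILE, write only to the scratch file. pipefail is set in a
  # subshell so a failing sed (or an unreadable FILE) is not masked by the
  # exit status of cat -s. 's/^ *//' removes the same leading spaces as the
  # question's 's/^ \+//' but is plain POSIX BRE.
  if ! (set -o pipefail; sed 's/^ *//' < "$file" | cat -s > "$tmp"); then
    rm -f "$tmp"
    return 1
  fi

  # Stage 2: FILE is truncated only now, after it has been read completely.
  # Copying back into the existing file keeps its permissions and inode.
  cat "$tmp" > "$file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

Copying back with cat rather than mv is a deliberate trade-off: the file keeps its permissions, owner and hard links, but the final write is not atomic, whereas a rename (mv) is atomic but replaces the file with a new one that mktemp created with restrictive permissions.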