How to replace the tabs with spaces in each file of a directory
Question
I would like to replace the tabs in each file of a directory with the corresponding number of spaces. I have already found a solution (11094383) that replaces each tab with a fixed number of spaces:
find ./ -type f -exec sed -i 's/\t/    /g' {} \;
In the solution above, tabs are replaced with four spaces. But in my case a tab can occupy more columns, e.g. 8.
An example of a file with tabs that should be replaced with 8 spaces:
NSMl1 100 PSHELL 0.00260 400000 400200 400300
400400 400500 400600 400700 400800 400900
401000 401100 400100 430000 430200 430300
430400 430500 430600 430700 430800 430900
431000 431100 430100 401200 431200
Here the lines with tabs are the 3rd to the 5th.
An example of a file with tabs that should be replaced with 4 spaces:
RBE2 1101001 5000511 123456 1100
Could anybody help?
Recommended answer
The classic answer is to use the pr command with options that expand tabs into the appropriate number of spaces, turning off the pagination features:
pr -e8 -l1 -t …files…
The tricky part, which seems to be part of the question, is getting the file overwritten. Of course, sed in its GNU and BSD (Mac OS X) incarnations supports overwriting with the -i option, with one difference between the two: BSD sed requires a suffix for the backup files and GNU sed does not. However, sed does not (readily) support converting tabs to an appropriate number of blanks, so it isn't wholly appropriate.
There's a script overwrite (which I abbreviate to ow) in The UNIX Programming Environment that can do that. I've been using the script since 1987 (first check-in; last updated in 2005).
#!/bin/sh
# Overwrite file
# From: The UNIX Programming Environment by Kernighan and Pike
# Amended: remove PATH setting; handle file names with blanks.
case $# in
0|1)    echo "Usage: $0 file command [arguments]" 1>&2
        exit 1;;
esac

file="$1"
shift
new=${TMPDIR:-/tmp}/ovrwr.$$.1
old=${TMPDIR:-/tmp}/ovrwr.$$.2
trap "rm -f '$new' '$old' ; exit 1" 0 1 2 15

if "$@" >"$new"
then
    cp "$file" "$old"
    trap "" 1 2 15
    cp "$new" "$file"
    rm -f "$new" "$old"
    trap 0
    exit 0
else
    echo "$0: $1 failed - $file unchanged" 1>&2
    rm -f "$new" "$old"
    trap 0
    exit 1
fi
It would be possible, and arguably better, to use the mktemp command on most systems these days; it didn't exist way back then.
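As a rough illustration of that remark, here is a minimal sketch of an overwrite helper built on mktemp. The function name ow_mktemp is hypothetical and not part of the original script, and the sketch deliberately leaves out the signal handling (the trap lines) that the original has.

```shell
#!/bin/sh
# ow_mktemp: hypothetical mktemp-based variant of the overwrite script.
# Usage: ow_mktemp file command [arguments]
ow_mktemp() {
    if [ "$#" -lt 2 ]; then
        echo "Usage: ow_mktemp file command [arguments]" >&2
        return 1
    fi
    file=$1
    shift
    # Let mktemp pick an unpredictable temporary file name.
    new=$(mktemp "${TMPDIR:-/tmp}/ovrwr.XXXXXX") || return 1
    # Run the command; overwrite the file only if the command succeeded.
    if "$@" >"$new"; then
        # Writing through cat keeps the original file's permissions.
        cat "$new" >"$file"
        status=$?
    else
        echo "ow_mktemp: $1 failed - $file unchanged" >&2
        status=1
    fi
    rm -f "$new"
    return "$status"
}
```

As with the original script, the command has to name its own input, so the file appears twice, for example: ow_mktemp data.txt expand -t 8 data.txt.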
In the context of the question, you could then use:
find . -type f -exec ow {} pr -e8 -t -l1 {} \;
You do need to process each file separately.
If you are truly determined to use sed for the job, then you have your work cut out. There's a gruesome way to do it. There is a notational problem: how to represent a literal tab. I will use \t to denote it. The script would be stored in a file, which I'll assume is script.sed:
:again
/^\(\([^\t]\{8\}\)*\)\t/s//\1        /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{1\}\)\t/s//\1\3       /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{2\}\)\t/s//\1\3      /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{3\}\)\t/s//\1\3     /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{4\}\)\t/s//\1\3    /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{5\}\)\t/s//\1\3   /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{6\}\)\t/s//\1\3  /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{7\}\)\t/s//\1\3 /
t again
That's using the classic sed notation.
You can then write:
sed -f script.sed …data-files…
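Since \t is only a notation here and the number of spaces in each replacement is easy to get wrong by hand, one option is to generate the script file. The sketch below is an assumption about how that could be done, not part of the original answer; gen_tab_script is a hypothetical name. It writes the classic-notation script with real tab characters and 8-k spaces for k = 0 to 7.

```shell
#!/bin/sh
# gen_tab_script: hypothetical helper that prints the classic-notation sed
# script, using literal tab characters in place of the \t notation.
gen_tab_script() {
    tab=$(printf '\t')
    echo ':again'
    k=0
    while [ "$k" -lt 8 ]; do
        # Build a padding string of 8-k spaces.
        pad=''
        i=$k
        while [ "$i" -lt 8 ]; do
            pad="$pad "
            i=$((i + 1))
        done
        if [ "$k" -eq 0 ]; then
            # Tab sits exactly on a multiple of 8 columns.
            printf '/^\\(\\([^%s]\\{8\\}\\)*\\)%s/s//\\1%s/\n' \
                "$tab" "$tab" "$pad"
        else
            # Tab follows k extra non-tab characters.
            printf '/^\\(\\([^%s]\\{8\\}\\)*\\)\\([^%s]\\{%d\\}\\)%s/s//\\1\\3%s/\n' \
                "$tab" "$tab" "$k" "$tab" "$pad"
        fi
        k=$((k + 1))
    done
    echo 't again'
}
```

Running gen_tab_script >script.sed would then produce a file usable with sed -f script.sed.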
If you have GNU sed or BSD (Mac OS X) sed, you can use extended regular expressions instead:
:again
/^(([^\t]{8})*)\t/s//\1        /
/^(([^\t]{8})*)([^\t]{1})\t/s//\1\3       /
/^(([^\t]{8})*)([^\t]{2})\t/s//\1\3      /
/^(([^\t]{8})*)([^\t]{3})\t/s//\1\3     /
/^(([^\t]{8})*)([^\t]{4})\t/s//\1\3    /
/^(([^\t]{8})*)([^\t]{5})\t/s//\1\3   /
/^(([^\t]{8})*)([^\t]{6})\t/s//\1\3  /
/^(([^\t]{8})*)([^\t]{7})\t/s//\1\3 /
t again
And then run:
sed -r -f script.sed …data-files… # GNU sed
sed -E -f script.sed …data-files… # BSD sed
What does the script do?
The first line sets a label; the last line jumps to that label if any of the s/// operations in between made a substitution. So, for each line of the file, the script loops until no match is made, and hence no substitution is performed.
Each of the 8 substitutions matches:
- a block of zero or more sequences of 8 non-tab characters, which is captured, followed by
- a sequence of 0-7 more non-tab characters, which is also captured, followed by
- a tab.
It replaces the match with the captured material, followed by the appropriate number of spaces.
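To make the arithmetic concrete: with tab stops every 8 columns, a tab that follows n non-tab characters expands to 8 - (n mod 8) spaces. The small check below uses the expand command as a reference and turns spaces into dots so the padding is visible; show_tab_width is a hypothetical helper name.

```shell
#!/bin/sh
# show_tab_width: print "<prefix><tab>X" with the tab expanded to 8-column
# tab stops and every space shown as a dot.
show_tab_width() {
    printf '%s\tX\n' "$1" | expand -t 8 | tr ' ' '.'
}

show_tab_width ''          # prints ........X          (8 spaces)
show_tab_width 'abc'       # prints abc.....X          (5 spaces)
show_tab_width 'abcdefg'   # prints abcdefg.X          (1 space)
show_tab_width 'abcdefgh'  # prints abcdefgh........X  (8 spaces again)
```

The four cases correspond to the first, fourth, eighth and (after one captured block of 8) first substitution in the script.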
One curiosity found during testing is that if a line ends with white space, the pr command removes that trailing white space.
There's also the expand command on some systems (BSD or Mac OS X at least), which preserves the trailing white space. Using that is simpler than pr or sed.
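Because expand writes to standard output, it still needs a temporary file to change files in place. The following is a minimal sketch under that assumption; expand_in_place is a hypothetical helper, not something from the original answer. It processes every regular file under a directory, so it would also rewrite binary files if any are present.

```shell
#!/bin/sh
# expand_in_place: hypothetical helper that expands tabs in every regular
# file under a directory, with tab stops every N columns (default 8).
# Usage: expand_in_place directory [N]
expand_in_place() {
    dir=$1
    width=${2:-8}
    find "$dir" -type f -exec sh -c '
        width=$1
        shift
        for f do
            tmp=$(mktemp) || exit 1
            # Replace the file contents only if expand succeeded.
            if expand -t "$width" "$f" >"$tmp"; then
                cat "$tmp" >"$f"
            fi
            rm -f "$tmp"
        done
    ' sh "$width" {} +
}
```

For the two cases in the question, this would be called as expand_in_place some-directory 8 or expand_in_place some-directory 4.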
With these sed scripts, and using BSD or GNU sed with backup files, you can write:
find . -type f -exec sed -i.bak -r -f script.sed {} +
(GNU sed notation; substitute -E for -r with BSD sed.)