How to replace the tabs with spaces in each file of a directory

Question

I would like to replace the tabs in each file of a directory with the corresponding number of spaces. I have already found a solution (11094383) that replaces each tab with a fixed number of spaces:

> find ./ -type f -exec sed -i 's/\t/    /g' {} \;

In the solution above, each tab is replaced with four spaces. But in my case a tab can occupy more columns, e.g. 8.

An example of a file with tabs that should each be replaced with 8 spaces:

NSMl1        100  PSHELL 0.00260  400000  400200  400300
          400400  400500  400600  400700  400800  400900
      401000  401100  400100  430000  430200  430300
      430400  430500  430600  430700  430800  430900
      431000  431100  430100  401200  431200

Here the lines with tabs are the 3rd to the 5th lines.

An example of a file with tabs that should each be replaced with 4 spaces:

RBE2     1101001 5000511  123456    1100

Can anybody help?

Recommended answer

The classic answer is to use the pr command with options that expand tabs into the appropriate number of spaces, turning off the pagination features:

pr -e8 -l1 -t …files…
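
To make the effect of -e8 concrete, here is a small sketch (the helper name expand_with_pr and the sample strings are made up for illustration). A tab advances to the next multiple of 8 columns, so the number of spaces produced depends on how much text precedes the tab:

```shell
# Hypothetical helper: expand tabs on standard input to 8-column tab stops.
# -e8 expands tabs, -t suppresses headers and trailers, -l1 sets a one-line page.
expand_with_pr() {
    pr -e8 -t -l1
}

# One character before the tab: 7 spaces are needed to reach column 8.
short=$(printf 'a\tb\n' | expand_with_pr)

# Five characters before the tab: only 3 spaces are needed.
long=$(printf 'abcde\tb\n' | expand_with_pr)

# Brackets make the spacing visible.
printf '[%s]\n[%s]\n' "$short" "$long"
```

This is why a fixed-width substitution such as s/\t/    /g cannot reproduce the original column layout.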

The tricky part is getting the file overwritten, which seems to be part of the question. Of course, sed in its GNU and BSD (Mac OS X) incarnations supports overwriting with the -i option, with variant behaviours between the two: BSD sed requires a suffix for the backup files and GNU sed does not. However, sed does not (readily) support converting tabs to an appropriate number of blanks, so it isn't wholly appropriate.

There's a script, overwrite (which I abbreviate to ow), in The UNIX Programming Environment that can do that. I've been using the script since 1987 (first check-in; last updated in 2005).

#!/bin/sh
#       Overwrite file
#       From: The UNIX Programming Environment by Kernighan and Pike
#       Amended: remove PATH setting; handle file names with blanks.

case $# in
0|1)    echo "Usage: $0 file command [arguments]" 1>&2
        exit 1;;
esac

file="$1"
shift
new=${TMPDIR:-/tmp}/ovrwr.$$.1
old=${TMPDIR:-/tmp}/ovrwr.$$.2

trap "rm -f '$new' '$old' ; exit 1" 0 1 2 15

if "$@" >"$new"
then
    cp "$file" "$old"
    trap "" 1 2 15
    cp "$new" "$file"
    rm -f "$new" "$old"
    trap 0
    exit 0
else
    echo "$0: $1 failed - $file unchanged" 1>&2
    rm -f "$new" "$old"
    trap 0
    exit 1
fi

It would be possible, and arguably better, to use the mktemp command on most systems these days; it didn't exist way back then.
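
As a rough sketch of that idea, and not the author's script, the same overwrite logic could be written as a shell function that takes its temporary file from mktemp. The name ow_mktemp is hypothetical, and the signal handling of the original script is omitted for brevity:

```shell
# Hypothetical mktemp-based variant of the overwrite idea (not the original script).
# Usage: ow_mktemp file command [arguments]
ow_mktemp() {
    if [ "$#" -lt 2 ]; then
        echo "Usage: ow_mktemp file command [arguments]" 1>&2
        return 1
    fi
    file=$1
    shift
    # mktemp creates a uniquely named, private temporary file.
    tmp=$(mktemp "${TMPDIR:-/tmp}/ow.XXXXXX") || return 1
    if "$@" >"$tmp"; then
        # Copy rather than move, so the target keeps its permissions and links.
        cp "$tmp" "$file"
        status=$?
    else
        echo "ow_mktemp: $1 failed - $file unchanged" 1>&2
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

A call would look like ow_mktemp data.txt pr -e8 -t -l1 data.txt, where data.txt is a placeholder file name. As with the original script, the command has to be given the file name itself, because the function only captures the command's standard output.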

In the context of the question, you could then use:

find . -type f -exec ow {} pr -e8 -t -l1 {} \;

You do need to process each file separately.

If you are truly determined to use sed for the job, then you have your work cut out. There's a gruesome way to do it. There is a notational problem: how to represent a literal tab. I will use \t to denote it. The script would be stored in a file, which I'll assume is script.sed:

:again
/^\(\([^\t]\{8\}\)*\)\t/s//\1        /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{1\}\)\t/s//\1\3       /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{2\}\)\t/s//\1\3      /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{3\}\)\t/s//\1\3     /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{4\}\)\t/s//\1\3    /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{5\}\)\t/s//\1\3   /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{6\}\)\t/s//\1\3  /
/^\(\([^\t]\{8\}\)*\)\([^\t]\{7\}\)\t/s//\1\3 /
t again

That's using the classic sed notation.

You can then write:

sed -f script.sed …data-files…

If you have GNU sed or BSD (Mac OS X) sed, you can use extended regular expressions instead:

:again
/^(([^\t]{8})*)\t/s//\1        /
/^(([^\t]{8})*)([^\t]{1})\t/s//\1\3       /
/^(([^\t]{8})*)([^\t]{2})\t/s//\1\3      /
/^(([^\t]{8})*)([^\t]{3})\t/s//\1\3     /
/^(([^\t]{8})*)([^\t]{4})\t/s//\1\3    /
/^(([^\t]{8})*)([^\t]{5})\t/s//\1\3   /
/^(([^\t]{8})*)([^\t]{6})\t/s//\1\3  /
/^(([^\t]{8})*)([^\t]{7})\t/s//\1\3 /
t again

And then run:

sed -r -f script.sed …data-files…    # GNU sed
sed -E -f script.sed …data-files…    # BSD sed

What does the script do?

The first line sets a label; the last line jumps to that label if any of the s/// operations in between made a substitution. So, for each line of the file, the script loops until no pattern matches and hence no substitution is performed.
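
The label-and-branch loop can be seen in isolation with a much smaller script. This is only an illustration with a made-up input line, not part of the tab-expansion script:

```shell
# ':again' defines a label; 't again' branches back to it only if an s///
# command has made a substitution since the last branch or new input line.
# Each pass replaces the first remaining 'a', so the loop runs until none are left.
looped=$(printf 'aaa\n' | sed -e ':again' -e 's/a/b/' -e 't again')

# Without the loop, the single s/// command changes only the first 'a'.
single=$(printf 'aaa\n' | sed -e 's/a/b/')

printf '%s %s\n' "$looped" "$single"
```

The tab-expansion script relies on the same mechanism: each pass expands the first tab on the line, and the branch repeats the pass until no tab is left.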

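The 8 substitutions deal with:

  • a block consisting of zero or more sequences of 8 non-tab characters, which is captured, followed by
  • a sequence of 0-7 further non-tab characters, which is also captured, followed by
  • a tab.

Each substitution replaces the match with the captured material, followed by an appropriate number of spaces.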
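
The "appropriate number" is simple arithmetic: if n characters precede the tab, the tab is worth 8 - (n mod 8) spaces, which is why the script needs one rule for each remainder from 0 to 7. A minimal sketch of that calculation (the function name tab_padding is hypothetical):

```shell
# Hypothetical helper: number of spaces a tab expands to when it is preceded
# by $1 characters on the line, with tab stops every $2 columns (default 8).
tab_padding() {
    width=${2:-8}
    echo $(( width - $1 % width ))
}

tab_padding 0    # tab at the start of the line: prints 8
tab_padding 3    # three characters before the tab: prints 5
tab_padding 8    # exactly one full 8-character block: prints 8
tab_padding 13   # one full block plus five characters: prints 3
```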

One curiosity found during testing is that if a line ends with white space, the pr command removes that trailing white space.

There's also the expand command on some systems (BSD or Mac OS X at least), which preserves the trailing white space. Using that is simpler than pr or sed.
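
A short sketch of expand in use, assuming a version that accepts the -t option for the tab width. The sample line is made up and ends with two trailing spaces:

```shell
# expand -t N sets tab stops every N columns (the default is 8).
wide=$(printf 'ab\tc  \n' | expand -t 8)
narrow=$(printf 'ab\tc  \n' | expand -t 4)

# Brackets make the padding and the preserved trailing spaces visible.
printf '[%s]\n[%s]\n' "$wide" "$narrow"
```

Since expand writes to standard output, overwriting the files still needs a helper. With the ow script shown earlier, that could look like find . -type f -exec ow {} expand -t 8 {} \; with the width changed to 4 for files that use 4-column tab stops.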

With these sed scripts, and using BSD or GNU sed with backup files, you can write:

find . -type f -exec sed -i.bak -r -f script.sed {} +

(GNU sed notation; substitute -E for -r with BSD sed.)
