Use SED to delete certain lines using an index with the line numbers to delete

Views: 32 | Posted: 2022/1/6 14:04:47 | Tags: linux bash awk sed grep

Question

I get a big file, call it file.txt, which may have 20000 lines or more. Some of those lines have to be removed from the original file, and a new file containing the remaining lines has to be created, like newfile.txt. The numbers of the lines to be deleted are in another file, like index.txt. So what I have is something like:

file.txt:

line1
line2
...
line19999
line20000

index.txt

I've been trying to use sed, trying to get it to use the numbers in the index to delete those lines, with something like:

for i in ${index.txt[@]}
do
    sed -i.back '${i}d' file.txt>newfile.txt
done

However, I get an error saying ${index.txt[@]}: bad substitution, and I have no idea how to fix this.

I've also tried to use gawk, but there was something wrong with the code; I think it had to do with the fact that the file is indented with tabs. If anyone could help, I'd greatly appreciate it.

Recommended answer

Do not call sed in a loop; that will be very slow.

You could transform the index file into a sed script, then call sed once on the data file:

sed -i.bak "$(sed 's/$/d/' index.txt)" file.txt

Or, as @Hazzard17 points out, ignore lines that don't contain just digits:

script=$(sed -En '/^[[:blank:]]*[[:digit:]]+[[:blank:]]*$/ s/$/d/p' index.txt)
sed -i.bak "$script" file.txt
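
As a small sketch of what the inner command produces (the sample data below is invented for illustration and is not from the question), the generated script is just one "Nd" command per index line. Dropping -i and redirecting the output also yields the separate newfile.txt that the question asks for:

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: five lines, and an index naming lines 2 and 4.
printf 'line%s\n' 1 2 3 4 5 > file.txt
printf '%s\n' 2 4 > index.txt

# Turn every index line N into the sed command "Nd".
script=$(sed 's/$/d/' index.txt)
printf '%s\n' "$script"

# Run sed once; without -i the original file is left unchanged.
sed "$script" file.txt > newfile.txt
cat newfile.txt
```

With this sample input the script printed is 2d and 4d, and newfile.txt holds line1, line3, and line5.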

Demo:

$ seq 20000 | sed 's/^/line/' > file.txt
$ wc file.txt
 20000  20000 188894 file.txt
$ seq 20000 | while read n; do [[ $RANDOM -le 5000 ]] && echo $n; done > index.txt
$ wc index.txt
 3083  3083 16789 index.txt
$ sed -i.bak "$(sed 's/$/d/' index.txt)" file.txt
$ wc -l file.txt{,.bak}
 16917 file.txt
 20000 file.txt.bak
 36917 total
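
Since the question also mentions gawk, here is a hedged alternative sketch that uses awk instead of sed; it is not part of the answer above. It stores the line numbers from index.txt as array keys and prints only those lines of file.txt whose number is not among them. The sample data is made up, and $1 is used on the assumption that index entries may carry leading tabs or spaces:

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data; the index entries are indented with a tab.
printf 'line%s\n' 1 2 3 4 5 6 > file.txt
printf '\t%s\n' 3 6 > index.txt

# First file (NR == FNR): remember each line number to delete.
# Second file: print a line only if its number was not remembered.
awk 'NR == FNR { del[$1]; next } !(FNR in del)' index.txt file.txt > newfile.txt
cat newfile.txt
```

Note that if index.txt is empty, the NR == FNR test also matches the lines of file.txt, so that case would need separate handling.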

To read a file into an array, you can do:

mapfile -t indices < index.txt
for i in "${indices[@]}"; do ...; done

Or just iterate over the file:

while IFS= read -r i; do ...; done < index.txt
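
The loop body is elided above. As one possible way to fill it in (an assumption, not the answerer's code), the loop can collect the line numbers into a single sed script so that sed still runs only once. The sample files and the script variable are made up for this sketch:

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data; the index contains one invalid entry.
printf 'line%s\n' 1 2 3 4 5 > file.txt
printf '%s\n' 1 oops 5 > index.txt

script=''
while IFS= read -r i; do
    # Skip empty entries and anything that is not purely digits.
    case $i in
        '' | *[!0-9]*) continue ;;
    esac
    # Append one "Nd" command per line of the script.
    script="${script}${i}d
"
done < index.txt

# A single sed call applies every collected command.
sed "$script" file.txt > newfile.txt
cat newfile.txt
```

Here the invalid entry is ignored, lines 1 and 5 are deleted, and newfile.txt holds line2, line3, and line4.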
