Search and replace strings in a file based on a column in another file
Question
Suppose the first file looks like this:
(a.txt)
1 asm
2 assert
3 bio
4 Bootasm
5 bootmain
6 buf
7 cat
8 console
9 defs
10 echo
and the second like this:
(b.txt)
bio cat BIO bootasm
bio defs cat
Bio console
bio BiO
bIo assert
bootasm asm
bootasm echo
bootasm console
bootmain buf
bootmain bio
bootmain bootmain
bootmain defs
cat cat
cat assert
cat assert
We want the output to look like this:
3 7 3 4
3 9 7
3 8
3 3
3 2
4 1
4 10
4 8
5 6
5 3
5 5
5 9
7 7
7 2
7 2
We read the second column of each line in the first file and check whether it appears in any column of any line in the second file; if it does, we replace it with the number from the first column of the first file. I managed this only for the first column and could not do it for the rest.
Here is the command I used:
awk 'NR==FNR{a[$2]=$1;next}{$1=a[$1];}1' a.txt b.txt
and its output:
3 cat bio bootasm
3 defs cat
3 console
3 bio
3 assert
4 asm
4 echo
4 console
5 buf
5 bio
5 bootmain
5 defs
7 cat
7 assert
7 assert
How should I handle the other columns?
Thanks.
Recommended answer
awk 'NR==FNR{h[$2]=$1;next} {for (i=1; i<=NF;i++) $i=h[$i];}1' a.txt b.txt
NR is the global record number (by default, the line number) across all files. FNR is the record number within the current file. The NR==FNR block specifies the action to take when the global line number equals the current file's line number, which is only true while the first file, i.e., a.txt, is being read. The next statement in this block skips the rest of the code, so the for loop only runs for the second file, i.e., b.txt.
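To see how the two counters behave, the following sketch prints FILENAME, NR and FNR for every input line. The two tiny sample files and the variable names (tmp, nr_demo) are made up for illustration and are not the files from the question.

```shell
# Throwaway sample files in a temporary directory (illustrative data only).
tmp=$(mktemp -d)
printf '1 asm\n2 assert\n' > "$tmp/a.txt"
printf 'asm assert\nassert asm\n' > "$tmp/b.txt"

# Print the file name, the global record number and the per-file record number.
nr_demo=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR }' a.txt b.txt)
printf '%s\n' "$nr_demo"

rm -rf "$tmp"
```

NR keeps counting across both files (1 to 4), while FNR restarts at 1 for b.txt, so NR==FNR holds only for the lines of a.txt.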
First, we process the first file to store the word ids in an associative array: NR==FNR{h[$2]=$1;next}. We can then use these ids to map the words in the second file. The for loop (for (i=1; i<=NF;i++) $i=h[$i];) iterates over all columns and sets each column to a number instead of a string, so $i=h[$i] replaces the word in the ith column with its id. Finally, the 1 at the end of the script causes every line to be printed.
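The same one-liner can be spread over several lines with comments, which may be easier to follow. This is only a reformatted sketch of the command above, with an explicit print in place of the trailing 1; the small all-lowercase sample files and the variable names (tmp, mapped) are assumptions made for the example.

```shell
# Small illustrative inputs (not the full files from the question).
tmp=$(mktemp -d)
printf '1 asm\n2 assert\n3 bio\n' > "$tmp/a.txt"
printf 'bio asm\nassert bio bio\n' > "$tmp/b.txt"

mapped=$(awk '
    # First file: remember the id (column 1) for each word (column 2).
    NR == FNR { h[$2] = $1; next }
    # Second file: replace every field with its id, then print the line.
    {
        for (i = 1; i <= NF; i++)
            $i = h[$i]
        print
    }
' "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$mapped"

rm -rf "$tmp"
```

For these inputs the two lines of b.txt become "3 1" and "2 3 3".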
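This produces the desired output when the words in b.txt match a.txt exactly (array lookups in awk are case-sensitive, so the mixed-case words in the sample, such as Bootasm and BIO, need the case-insensitive variant further down):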
3 7 3 4
3 9 7
3 8
3 3
3 2
4 1
4 10
4 8
5 6
5 3
5 5
5 9
7 7
7 2
7 2
To make the script case-insensitive, add tolower calls to the array indices:
awk 'NR==FNR{h[tolower($2)]=$1;next} {for (i=1; i<=NF;i++) $i=h[tolower($i)];}1' a.txt b.txt
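One thing to be aware of: a word in b.txt that has no entry in a.txt is replaced with an empty string by the commands above. If unmatched words should be left untouched instead (an assumption, since the question does not say what should happen to them), a possible variation is to test membership with the in operator before assigning. The sample data and variable names (tmp, kept, key) are made up for the sketch.

```shell
# Illustrative inputs: "unknown" has no id in a.txt.
tmp=$(mktemp -d)
printf '1 asm\n4 Bootasm\n' > "$tmp/a.txt"
printf 'bootasm ASM unknown\n' > "$tmp/b.txt"

kept=$(awk '
    # First file: store ids under lower-cased words.
    NR == FNR { h[tolower($2)] = $1; next }
    {
        for (i = 1; i <= NF; i++) {
            key = tolower($i)
            # Replace only words that have an id; leave the rest as they are.
            if (key in h)
                $i = h[key]
        }
        print
    }
' "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$kept"

rm -rf "$tmp"
```

Here the line "bootasm ASM unknown" becomes "4 1 unknown". Using key in h also avoids creating empty array entries, which a plain h[key] lookup would do for every missing word.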