Search and replace strings in a file based on a column in another file
Problem description
Suppose we have a first file like this:
(a.txt)
1 asm
2 assert
3 bio
4 Bootasm
5 bootmain
6 buf
7 cat
8 console
9 defs
10 echo
and a second file like this:
(b.txt)
bio cat BIO bootasm
bio defs cat
Bio console
bio BiO
bIo assert
bootasm asm
bootasm echo
bootasm console
bootmain buf
bootmain bio
bootmain bootmain
bootmain defs
cat cat
cat assert
cat assert
and we want the output to look like this:
3 7 3 4
3 9 7
3 8
3 3
3 2
4 1
4 10
4 8
5 6
5 3
5 5
5 9
7 7
7 2
7 2
We take the word in the second column of each line of the first file and look for it in every column of every line of the second file. Wherever it appears, we replace it with the number from the first column of the first file. I managed to do this for the first column only; I couldn't do it for the rest.
Here is the command I used, followed by its output:
awk 'NR==FNR{a[$2]=$1;next}{$1=a[$1];}1' a.txt b.txt
3 cat bio bootasm
3 defs cat
3 console
3 bio
3 assert
4 asm
4 echo
4 console
5 buf
5 bio
5 bootmain
5 defs
7 cat
7 assert
7 assert
How should I handle the other columns?
Thank you.
To handle every column, loop over all fields instead of assigning only $1:
awk 'NR==FNR{h[$2]=$1;next} {for (i=1; i<=NF;i++) $i=h[$i];}1' a.txt b.txt
NR is the global record number (by default, the line number) across all input files, and FNR is the record number within the current file. The NR==FNR block therefore runs only while the two counters are equal, which is true only for the first file, i.e., a.txt. The next statement in this block skips the rest of the script, so the for loop runs only for the second file, i.e., b.txt.
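As a rough illustration of how the two counters behave, the sketch below prints NR, FNR, and the record for every line of two throwaway files. The file names first.txt and second.txt and their contents are made up for this demo and are not part of the question.

```shell
# Hypothetical demo files in a temporary directory (not the question's data).
tmp=$(mktemp -d)
printf 'x\ny\n' > "$tmp/first.txt"
printf 'p\nq\nr\n' > "$tmp/second.txt"

# For each record, print NR, FNR, the line itself, and whether NR == FNR holds.
nr_fnr_demo=$(awk '{ print NR, FNR, $0, (NR == FNR ? "first" : "later") }' \
    "$tmp/first.txt" "$tmp/second.txt")
printf '%s\n' "$nr_fnr_demo"

rm -r "$tmp"
```

NR keeps counting across both files while FNR restarts at 1 for the second file, so the two are equal only while the first file is being read.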
First, we process the first file to store the word ids in an associative array: NR==FNR{h[$2]=$1;next}. We can then use these ids to map the words in the second file. The for loop (for (i=1; i<=NF;i++) $i=h[$i];) iterates over all columns, and $i=h[$i] replaces the word in the i-th column with its id. Finally, the 1 at the end of the script is an always-true pattern with no action, so every line is printed.
Produces:
3 7 3 4
3 9 7
3 8
3 3
3 2
4 1
4 10
4 8
5 6
5 3
5 5
5 9
7 7
7 2
7 2
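One caveat worth noting: $i=h[$i] turns any word that is missing from the first file into an empty string. A minimal sketch, assuming you would rather leave such words untouched, checks membership with the in operator before replacing. The file names ids.txt and words.txt and the word zebra are made up for illustration.

```shell
# Hypothetical demo files in a temporary directory (not the question's data).
tmp=$(mktemp -d)
printf '1 asm\n2 assert\n3 bio\n' > "$tmp/ids.txt"
printf 'bio zebra asm\nassert bio\n' > "$tmp/words.txt"

keep_unknown=$(awk '
    NR == FNR { h[$2] = $1; next }   # first file: build the word -> id map
    {
        for (i = 1; i <= NF; i++)
            if ($i in h)             # replace only words that have an id
                $i = h[$i]
        print
    }' "$tmp/ids.txt" "$tmp/words.txt")
printf '%s\n' "$keep_unknown"

rm -r "$tmp"
```

Here zebra has no id, so it passes through unchanged while the known words are replaced.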
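Array subscripts in awk are case-sensitive, so the command above gives this output only when the words are cased identically in both files; mixed-case entries such as BIO or Bootasm would not match. To make the script case-insensitive, wrap the array indices in tolower calls: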
awk 'NR==FNR{h[tolower($2)]=$1;next} {for (i=1; i<=NF;i++) $i=h[tolower($i)];}1' a.txt b.txt
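To try this end to end, the sketch below recreates a.txt and b.txt from the question in a temporary directory and runs the case-insensitive command, laid out over several lines for readability. The temporary directory and the variable name mapped are assumptions of this sketch; the awk logic is the same as in the one-liner.

```shell
tmp=$(mktemp -d)

# Sample data copied from the question.
cat > "$tmp/a.txt" <<'EOF'
1 asm
2 assert
3 bio
4 Bootasm
5 bootmain
6 buf
7 cat
8 console
9 defs
10 echo
EOF

cat > "$tmp/b.txt" <<'EOF'
bio cat BIO bootasm
bio defs cat
Bio console
bio BiO
bIo assert
bootasm asm
bootasm echo
bootasm console
bootmain buf
bootmain bio
bootmain bootmain
bootmain defs
cat cat
cat assert
cat assert
EOF

mapped=$(awk '
    NR == FNR { h[tolower($2)] = $1; next }   # a.txt: lowercased word -> id
    {
        for (i = 1; i <= NF; i++)             # b.txt: map every column
            $i = h[tolower($i)]
    }
    1' "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$mapped"

rm -r "$tmp"
```

Lowercasing both the stored keys and the lookups is what lets Bootasm in a.txt match bootasm in b.txt, and BIO, Bio, BiO, and bIo all map to 3.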