Search and replace strings in a file based on a column in another file


Question

Suppose the first file looks like this:

(a.txt)
 1  asm
 2  assert
 3  bio
 4  Bootasm
 5  bootmain
 6  buf
 7  cat
 8  console
 9  defs
10  echo

and the second file looks like this:

(b.txt)
 bio cat BIO bootasm
 bio defs cat
 Bio console 
 bio BiO
 bIo assert
 bootasm asm
 bootasm echo
 bootasm console
 bootmain buf
 bootmain bio
 bootmain bootmain
 bootmain defs
 cat cat
 cat assert
 cat assert

We want the output to look like this:

 3 7 3 4
 3 9 7
 3 8
 3 3
 3 2
 4 1
 4 10
 4 8
 5 6
 5 3
 5 5
 5 9
 7 7
 7 2
 7 2

We read the second column of each line in the first file and check whether that word appears in any column of any line in the second file. If it does, we replace it with the number from the first column of the first file. I managed to do this for the first column only; I couldn't do it for the rest.

Here is the command I used:

awk 'NR==FNR{a[$2]=$1;next}{$1=a[$1];}1' a.txt b.txt

It produces:

3 cat bio bootasm
3 defs cat
3 console
3 bio
3 assert
4 asm
4 echo
4 console
5 buf
5 bio
5 bootmain
5 defs
7 cat
7 assert
7 assert

How should I handle the other columns?

Thanks.

Recommended answer

awk 'NR==FNR{h[$2]=$1;next} {for (i=1; i<=NF;i++) $i=h[$i];}1' a.txt b.txt

NR is the global record number (by default, the line number) across all input files. FNR is the record number within the current file. The NR==FNR block runs only when the global record number equals the per-file record number, which is true only while the first file, i.e. a.txt, is being read. The next statement in this block skips the rest of the script, so the for loop runs only for the second file, i.e. b.txt.
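To make the NR/FNR distinction concrete, the following minimal sketch prints both counters for every line of two small files. The file names first.txt and second.txt and their contents are made up for illustration and are not part of the question.

```shell
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)

# Two hypothetical input files: 2 lines and 3 lines.
printf 'x\ny\n' > "$tmpdir/first.txt"
printf 'p\nq\nr\n' > "$tmpdir/second.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
nr_fnr=$(awk '{ print NR, FNR, (NR==FNR ? "first file" : "later file") }' \
    "$tmpdir/first.txt" "$tmpdir/second.txt")

printf '%s\n' "$nr_fnr"
# 1 1 first file
# 2 2 first file
# 3 1 later file
# 4 2 later file
# 5 3 later file

rm -rf "$tmpdir"
```

One caveat with this idiom: if the first file is empty, NR==FNR is also true while the second file is read, so the lookup table would be built from the wrong input.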

First, we process the first file to store the word ids in an associative array: NR==FNR{h[$2]=$1;next}. We can then use these ids to map the words in the second file. The for loop (for (i=1; i<=NF;i++) $i=h[$i];) iterates over all columns and replaces the word in the i-th column with its id via $i=h[$i]. Finally, the 1 at the end of the script is an always-true pattern with no action, so awk applies its default action and prints every line.

Output:

3 7 3 4
3 9 7
3 8
3 3
3 2
4 1
4 10
4 8
5 6
5 3
5 5
5 9
7 7
7 2
7 2
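Note that $i=h[$i] assigns an empty string when a word has no entry in the array, so unknown words disappear from the output. If that is not wanted, one possible variation (not part of the original answer) is to guard the assignment with awk's in operator. The sketch below compares both behaviours on made-up files map.txt and words.txt, where the word dog is deliberately missing from the map.

```shell
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)

# Hypothetical data: "dog" has no id in the map file.
printf '3 bio\n7 cat\n' > "$tmpdir/map.txt"
printf 'bio dog cat\n' > "$tmpdir/words.txt"

# Loop from the answer: a missing key yields an empty field.
blanked=$(awk 'NR==FNR{h[$2]=$1;next} {for (i=1; i<=NF; i++) $i=h[$i]}1' \
    "$tmpdir/map.txt" "$tmpdir/words.txt")

# Guarded variant: only replace fields that exist as keys in h.
guarded=$(awk 'NR==FNR{h[$2]=$1;next} {for (i=1; i<=NF; i++) if ($i in h) $i=h[$i]}1' \
    "$tmpdir/map.txt" "$tmpdir/words.txt")

printf '[%s]\n' "$blanked"   # [3  7]     (empty field between two separators)
printf '[%s]\n' "$guarded"   # [3 dog 7]  (unknown word kept as-is)

rm -rf "$tmpdir"
```

Which behaviour is preferable depends on the data: blanking makes missing ids easy to spot, while the guard keeps the line readable.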

To make the script case-insensitive, add tolower calls to the array indices:

awk 'NR==FNR{h[tolower($2)]=$1;next} {for (i=1; i<=NF;i++) $i=h[tolower($i)];}1' a.txt b.txt
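As a quick check, the sketch below runs the case-insensitive command on a shortened excerpt of the question's data, written to temporary copies of a.txt and b.txt. The excerpt is an assumption chosen to show why tolower is needed on both sides: a.txt spells the word as Bootasm, while b.txt contains bootasm, BIO and Bio.

```shell
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)

# Shortened excerpt of a.txt from the question (id, word).
cat > "$tmpdir/a.txt" <<'EOF'
 1  asm
 2  assert
 3  bio
 4  Bootasm
 7  cat
EOF

# Shortened excerpt of b.txt with mixed-case words.
cat > "$tmpdir/b.txt" <<'EOF'
 bio cat BIO bootasm
 Bio assert
 bootasm asm
EOF

# Keys are lower-cased when stored and again when looked up,
# so "Bootasm", "bootasm", "BIO" and "Bio" all match.
result=$(awk 'NR==FNR{h[tolower($2)]=$1;next} {for (i=1; i<=NF;i++) $i=h[tolower($i)];}1' \
    "$tmpdir/a.txt" "$tmpdir/b.txt")

printf '%s\n' "$result"
# 3 7 3 4
# 3 2
# 4 1

rm -rf "$tmpdir"
```

The leading spaces of the input lines are gone in the output because assigning to any field makes awk rebuild the record, joining the fields with the output field separator (a single space by default).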
