How to search for words from one file in another file and display the first matching word in a line

Problem description

I have an annoying problem. I have two files.

$ cat file1
Sam
Tom

$ cat file2
I am Sam. Sam I am.
Tom
I am Tom. Tom I am.

file1 is a word list, whereas file2 contains lines with a varying number of columns. I want to search file2 for the words in file1 and, for each line of file2, display every matching word once, in the order in which it first appears in that line. So the result needs to be the following:

Sam (line 1 match)
Tom (line 2 match)
Tom (line 3 match)

If file2 is the following,

I am Sam. Sam I am.
Tom
I am Tom. Tom I am.
I am Tom. Sam I am.
I am Sam. Tom I am.
I am Sammy.

then it needs to display the following:

Sam (1st line match)
Tom (2nd line match)
Tom (3rd line match)
Tom (4th line match)
Sam (4th line match)
Sam (5th line match)
Tom (5th line match)
Sam (6th line match)

I think I need an awk solution, since the command "grep -f file1 file2" won't work.
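
To make the limitation concrete, here is a small sketch of what grep does with the first pair of sample files. The scratch directory and the variable names are illustrative assumptions, and -o is a GNU/BSD extension rather than a POSIX option.

```shell
# Recreate the two sample files in a scratch directory (paths are illustrative).
dir=$(mktemp -d)
printf 'Sam\nTom\n' > "$dir/file1"
printf 'I am Sam. Sam I am.\nTom\nI am Tom. Tom I am.\n' > "$dir/file2"

# Plain -f prints each line of file2 that contains any listed word,
# as a whole line, without saying which word matched.
whole_lines=$(grep -f "$dir/file1" "$dir/file2")

# Adding -o prints every match on its own line, so a word that occurs
# twice in one line is reported twice instead of once.
each_match=$(grep -o -f "$dir/file1" "$dir/file2")

printf '%s\n' "$each_match"
```

With these three lines of input the last command prints five words (Sam, Sam, Tom, Tom, Tom) where the desired result has three, one per line.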

Recommended answer

With GNU awk, using PROCINFO["sorted_in"] to control the order in which arrays are traversed:

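$ cat tst.awk
# Traverse arrays in ascending numeric order of their values (GNU awk only).
BEGIN { PROCINFO["sorted_in"] = "@val_num_asc" }
# First file: store each word as a key; it is used as a regexp below.
NR==FNR { res[$0]; next }
# Second file: record where each word first matches in the current line.
{
    delete found
    for ( re in res ) {
        if ( match($0,re) ) {
            found[re] = RSTART
        }
    }
    # found is visited in order of match position, leftmost first.
    for ( re in found ) {
        printf "%s (line #%d match)\n", re, FNR
    }
}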

$ awk -f tst.awk file1 file2
Sam (line #1 match)
Tom (line #2 match)
Tom (line #3 match)
Tom (line #4 match)
Sam (line #4 match)
Sam (line #5 match)
Tom (line #5 match)
Sam (line #6 match)
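
The script above relies on PROCINFO["sorted_in"], which only GNU awk provides. As a rough sketch for other awk implementations, the same per-line ordering can be obtained by sorting the matches by position by hand. The wrapper name first_matches is hypothetical and introduced here only for illustration; skipping empty lines in the word list and breaking ties alphabetically are likewise assumptions that are not part of the original answer.

```shell
# Hypothetical wrapper: first_matches WORDLIST TEXTFILE
first_matches() {
    awk '
    # First file: remember each non-empty word; it is used as a regexp below.
    NR == FNR { if ($0 != "") res[$0]; next }
    {
        # Collect the first match position of every word found in this line.
        n = 0
        for (re in res) {
            if (match($0, re)) {
                n++
                word[n] = re
                pos[n] = RSTART
            }
        }
        # Selection sort by position; ties are broken alphabetically.
        for (i = 1; i <= n; i++) {
            min = i
            for (j = i + 1; j <= n; j++) {
                if (pos[j] < pos[min] || (pos[j] == pos[min] && word[j] < word[min])) {
                    min = j
                }
            }
            t = pos[i];  pos[i]  = pos[min];  pos[min]  = t
            t = word[i]; word[i] = word[min]; word[min] = t
            printf "%s (line #%d match)\n", word[i], FNR
        }
    }
    ' "$1" "$2"
}
```

Called as first_matches file1 file2 on the six-line file2 from the question, this should print the same eight lines as the GNU awk run. As in the original script, each word is treated as a regular expression, so "Sam" also matches inside "Sammy" and any regexp metacharacters in file1 are interpreted rather than matched literally.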
