Shell: Find Matching Lines Across Many Files

Question

I am trying to use a shell script (well, a "one-liner") to find any common lines across about 50 files. Note that I am looking for a line (or lines) that appears in every one of the files.

So far I've tried grep -v -x -f file1.sp *, which just matches that file's contents across ALL the other files.

I've also tried grep -v -x -f file1.sp file2.sp | grep -v -x -f - file3.sp | grep -v -x -f - file4.sp | grep -v -x -f - file5.sp etc., but I believe that treats stdin as the input to be searched rather than as the patterns to match on.

Does anyone know how to do this with grep or another tool?

I don't mind if it takes a while to run. I've got to add a few lines of code to around 500 files and wanted to find a line common to all of them to insert 'after' (they were originally just copy-pasted from one file, so hopefully there are some common lines!)

Thanks for your time.
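On the grep attempts: -v inverts the match, so grep -v -x -f file1.sp file2.sp prints the lines of file2.sp that are not in file1.sp, which is the opposite of what is wanted. With GNU grep, -f - does read the patterns from stdin, so without -v each stage of the pipeline keeps only the lines of the next file that exactly match a line that survived the previous stage, i.e. an intersection (adding -F stops lines containing regex metacharacters from being interpreted as regexes). A minimal Python sketch of that chained filter, assuming the files are already loaded as lists of lines; the function name chained_filter is hypothetical:

```python
def chained_filter(files):
    """Mimic `grep -x -F -f - fileN` chained over all files.

    `files` is a list of line lists (hypothetical in-memory stand-ins
    for file1.sp, file2.sp, ...). Each stage keeps only the lines of the
    next file that exactly equal some line surviving the previous stage.
    """
    survivors = files[0]
    for lines in files[1:]:
        # like `-f -`: the previous stage's output becomes the pattern list
        patterns = set(survivors)
        # like `-x -F`: keep whole-line, fixed-string matches only
        survivors = [line for line in lines if line in patterns]
    return survivors


files = [
    ["int a;", "int b;", "return 0;"],
    ["int b;", "int c;", "return 0;"],
    ["return 0;", "int b;"],
]
print(chained_filter(files))  # → ['return 0;', 'int b;']
```

As with grep, the output keeps the line order (and any repeats) of the last file processed.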

Answer

old, bash answer (O(n); opens 2 * n files)

Building on @mjgpy3's answer, you just need a for loop around comm, like this:

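#!/bin/bash

# mktemp creates unique temp files; the trap removes them on exit
tmp1=$(mktemp)
tmp2=$(mktemp)
trap 'rm -f "$tmp1" "$tmp2"' EXIT

# comm requires sorted input, so sort the first file up front
sort "$1" > "$tmp1"
shift
for file in "$@"
do
    # -1 -2 suppress the lines unique to either input, keeping the common ones
    comm -1 -2 "$tmp1" <(sort "$file") > "$tmp2"
    mv "$tmp2" "$tmp1"
done
cat "$tmp1"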

Save it as comm.sh, make it executable, and call

./comm.sh *.sp 

assuming all your filenames end with .sp.
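comm walks its two inputs in step, much like the merge step of merge sort, which is why both must be sorted; -1 -2 then keeps only the lines present in both. A rough Python sketch of that merge, folded over every file the way the loop does, assuming the inputs are lists of lines (comm_12 and common_sorted are hypothetical names). Python compares strings by code point, which corresponds to sorting with LC_ALL=C:

```python
def comm_12(a, b):
    """Lines common to two sorted lists, like `comm -1 -2`.

    Duplicates are matched pairwise, so a line appearing twice in
    both inputs is reported twice.
    """
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1            # only in a: column 1, suppressed by -1
        elif a[i] > b[j]:
            j += 1            # only in b: column 2, suppressed by -2
        else:
            out.append(a[i])  # in both: column 3, kept
            i += 1
            j += 1
    return out


def common_sorted(files):
    """Fold comm_12 over all files, sorting each first (as comm requires)."""
    result = sorted(files[0])
    for lines in files[1:]:
        result = comm_12(result, sorted(lines))
    return result


files = [
    ["int a;", "int b;", "return 0;"],
    ["int b;", "int c;", "return 0;"],
    ["return 0;", "int b;"],
]
print(common_sorted(files))  # → ['int b;', 'return 0;']
```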

Looking at the other answers, I wanted to give one that opens each file only once, uses no temporary files, and supports duplicated lines. Additionally, let's process the files in parallel.

Here you go (in Python 3):

#!/usr/bin/env python3
import argparse
import multiprocessing
import os
import sys

# line separators used when joining the output lines
EOLS = {'native': os.linesep.encode('ascii'), 'unix': b'\n', 'windows': b'\r\n'}

def extract_set(filename):
    """Return the set of lines in `filename`, with line endings stripped."""
    with open(filename, 'rb') as f:
        return set(line.rstrip(b'\r\n') for line in f)

def find_common_lines(filenames):
    """Read the files in parallel and intersect their line sets."""
    with multiprocessing.Pool() as pool:
        line_sets = pool.map(extract_set, filenames)
    return set.intersection(*line_sets)

if __name__ == '__main__':
    # usage info and argument parsing
    parser = argparse.ArgumentParser()
    parser.add_argument("in_files", nargs='+',
            help="find common lines in these files")
    parser.add_argument('--out', type=argparse.FileType('wb'),
            help="the output file (default stdout)")
    parser.add_argument('--eol-style', choices=EOLS.keys(), default='native',
            help="(default: native)")
    args = parser.parse_args()

    # actual stuff
    common_lines = find_common_lines(args.in_files)

    # write results to output; sets are unordered, so sort for stable output
    eol = EOLS[args.eol_style]
    to_print = b''.join(line + eol for line in sorted(common_lines))
    # write raw bytes (to stdout by default) so nothing needs re-encoding
    out = args.out if args.out is not None else sys.stdout.buffer
    out.write(to_print)

Save it as find_common_lines.py and call

python3 ./find_common_lines.py *.sp

More usage info with the --help option.
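Since the original goal was to add a few lines after a common line in every file, here is a rough Python sketch of that last step. The functions insert_after and patch_file, the marker line and the inserted text are all hypothetical; the sketch inserts after the first exact match only and rewrites the file in place, so try it on copies first:

```python
def insert_after(lines, marker, new_lines):
    """Return a copy of `lines` with `new_lines` inserted after the first
    line equal to `marker` (line ending ignored). Raise if it is missing."""
    for i, line in enumerate(lines):
        if line.rstrip("\r\n") == marker:
            head = lines[:i + 1]
            if not head[-1].endswith("\n"):
                head[-1] += "\n"  # marker was the last line, with no newline
            return head + new_lines + lines[i + 1:]
    raise ValueError(f"marker line not found: {marker!r}")


def patch_file(path, marker, new_lines):
    """Rewrite `path` in place with `new_lines` added after `marker`."""
    with open(path) as f:
        lines = f.readlines()
    # build the new contents first, so a missing marker leaves the file untouched
    patched = insert_after(lines, marker, new_lines)
    with open(path, "w") as f:
        f.writelines(patched)
```

For example, patch_file("file1.sp", "return 0;", ["// added line\n"]) could be called in a loop over the same *.sp files, using one of the lines printed by either script as the marker.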