Shell: Find Matching Lines Across Many Files
Problem description
I am trying to use a shell script (well, a "one liner") to find any common lines between around 50 files.

Edit: Note I am looking for a line (or lines) that appears in all the files.

So far I've tried:

grep -v -x -f file1.sp *

which just matches that file's contents across ALL the other files.

I've also tried:

grep -v -x -f file1.sp file2.sp | grep -v -x -f - file3.sp | grep -v -x -f - file4.sp | grep -v -x -f - file5.sp

etc... but I believe that searches using the files to be searched as stdin, not as the pattern to match on.

Does anyone know how to do this with grep or another tool?

I don't mind if it takes a while to run. I've got to add a few lines of code to around 500 files and wanted to find a common line in each of them for it to insert 'after' (they were originally just c&p'd from one file, so hopefully there are some common lines!).

Thanks for your time.
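For what it's worth, the intended grep usage can be sketched as below. The -v in the attempts above inverts the match, which is why they returned the wrong lines. This is only a hedged sketch with placeholder file names, iteratively intersecting a working copy with each remaining file:

```shell
# Demo setup: three small files sharing exactly one common line.
printf 'alpha\ncommon line\nbeta\n' > file1.sp
printf 'common line\ngamma\n'       > file2.sp
printf 'delta\ncommon line\n'       > file3.sp

# Keep only the whole lines of file1.sp that also appear in every
# other file. -F treats patterns literally, -x matches whole lines;
# note there is no -v here.
cp file1.sp common.tmp
for f in file2.sp file3.sp; do
    grep -F -x -f "$f" common.tmp > common.next
    mv common.next common.tmp
done
cat common.tmp          # prints: common line
rm common.tmp
```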
Old bash answer (O(n); opens 2*n files)

From @mjgpy3's answer, you just have to make a for loop and use comm, like this:

#!/bin/bash
tmp1="/tmp/tmp1$RANDOM"
tmp2="/tmp/tmp2$RANDOM"
cp "$1" "$tmp1"
shift
for file in "$@"
do
comm -1 -2 "$tmp1" "$file" > "$tmp2"
mv "$tmp2" "$tmp1"
done
cat "$tmp1"
rm "$tmp1"
Save it in comm.sh, make it executable, and call

./comm.sh *.sp

assuming all your filenames end with .sp.
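One caveat worth flagging: comm assumes both of its inputs are sorted, so the script above can silently miss common lines if the *.sp files aren't. A hedged variant of the same loop, wrapped here as an illustrative function (the name is not from the original answer), sorts each input on the fly; process substitution makes this bash-only:

```shell
# Same intersection loop, but each input is sorted on the fly so that
# comm's sorted-input requirement holds even for unsorted files.
common_sorted() {
    local tmp1 tmp2 file
    tmp1="$(mktemp)"
    tmp2="$(mktemp)"
    sort "$1" > "$tmp1"
    shift
    for file in "$@"; do
        # -1 -2 suppresses lines unique to either side, keeping the
        # intersection; the result stays sorted for the next round.
        comm -1 -2 "$tmp1" <(sort "$file") > "$tmp2"
        mv "$tmp2" "$tmp1"
    done
    cat "$tmp1"
    rm -f "$tmp1"
}
```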
Updated answer: python, opens each file only once

Looking at the other answers, I wanted to give one that opens each file only once without using any temporary file, and supports duplicated lines. Additionally, let's process the files in parallel. Here you go (in python3):
#!/usr/bin/env python3
import argparse
import sys
import multiprocessing
import os
EOLS = {'native': os.linesep.encode('ascii'), 'unix': b'\n', 'windows': b'\r\n'}
def extract_set(filename):
with open(filename, 'rb') as f:
return set(line.rstrip(b'\r\n') for line in f)
def find_common_lines(filenames):
pool = multiprocessing.Pool()
line_sets = pool.map(extract_set, filenames)
return set.intersection(*line_sets)
if __name__ == '__main__':
# usage info and argument parsing
parser = argparse.ArgumentParser()
parser.add_argument("in_files", nargs='+',
help="find common lines in these files")
parser.add_argument('--out', type=argparse.FileType('wb'),
help="the output file (default stdout)")
parser.add_argument('--eol-style', choices=EOLS.keys(), default='native',
help="(default: native)")
args = parser.parse_args()
# actual stuff
common_lines = find_common_lines(args.in_files)
# write results to output
to_print = EOLS[args.eol_style].join(common_lines)
if args.out is None:
# find out stdout's encoding, utf-8 if absent
encoding = sys.stdout.encoding or 'utf-8'
sys.stdout.write(to_print.decode(encoding))
else:
args.out.write(to_print)
Save it into find_common_lines.py, and call

python ./find_common_lines.py *.sp

More usage info with the --help option.
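Stripped of argument parsing, EOL handling, and multiprocessing, the heart of the script above is just a set intersection over per-file line sets. A minimal single-process sketch of the same idea (the function name is illustrative, not from the original script):

```python
def common_lines(filenames):
    """Return the set of (EOL-stripped) lines present in every file."""
    line_sets = []
    for name in filenames:
        # Read in binary mode and strip both \r\n and \n, so files with
        # mixed line endings still compare equal line-by-line.
        with open(name, 'rb') as f:
            line_sets.append(set(line.rstrip(b'\r\n') for line in f))
    return set.intersection(*line_sets)
```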