How to check if one file is part of another?
Problem description
I need a bash script that checks whether one file is contained in another file, i.e. searches an input file for a given multiline pattern.
Return value:
I want an exit status like grep's: 0 if any match was found, 1 if no match was found.
Pattern:
- multiline,
- line order matters (the lines are treated as a single block),
- may contain characters such as digits, letters, ?, &, *, #, etc.
Explanation
Only the following examples should find a match:
pattern  file1  file2  file3  file4
222      111    111    222    222
333      222    222    333    333
         333    333           444
                444
The following should not:
pattern  file1  file2  file3  file4  file5  file6  file7
222      111    111    333    *222   111    111    222
333      *222   222    222    *333   222    222
         333    333*                 444    111    333
                444                  333    333
Here's my script:
#!/bin/bash
# Append $2 to file $1, going through sudo tee when the file isn't writable
function writeToFile {
    if [ -w "$1" ] ; then
        echo "$2" >> "$1"
    else
        echo -e "$2" | sudo tee -a "$1" > /dev/null
    fi
}
# Append $2 to file $1 only if it isn't already in the file
function writeOnceToFile {
    pcregrep --color -M "$2" "$1"
    #echo $?
    if [ $? -eq 0 ]; then
        echo This file contains text that was added previously
    else
        writeToFile "$1" "$2"
    fi
}
file=file.txt
# contents of file.txt:
#1?1
#2?2
#3?3
#4?4
pattern=`cat pattern.txt`
# contents of pattern.txt:
#2?2
#3?3
writeOnceToFile "$file" "$pattern"
I can use grep for all the lines of the pattern, but it fails with this example:
file.txt
#1?1
#2?2
#=== added line
#3?3
#4?4
pattern.txt
#2?2
#3?3
or even if you swap lines 2 and 3:
file.txt
#1?1
#3?3
#2?2
#4?4
In both cases it returns 0 when it shouldn't.
How can I fix this? Note that I'd prefer to use programs that are installed by default (ideally without pcregrep). Maybe sed or awk can solve this problem?
Recommended answer
I have a working version using Perl.
I thought I had it working with GNU awk, but I didn't: setting RS to the empty string makes awk split records on blank lines (paragraph mode), which isn't what's needed here. See the edit history for the broken awk version.
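A line-by-line comparison, instead of the RS trick, may be enough in plain awk. The following is a minimal sketch of that different technique, not the broken version mentioned above. It stores the pattern lines in an array and keeps a ring buffer of the last n target lines. It compares them as strings, so ?, *, # and & have no special meaning. The function name contains_block_awk is hypothetical, and the sketch assumes a non-empty pattern file.

```bash
# contains_block_awk PATTERN_FILE TARGET_FILE   (hypothetical helper name)
# Exit status 0 if the pattern lines occur as consecutive whole lines in the
# target, 1 otherwise. Assumes PATTERN_FILE is non-empty.
contains_block_awk() {
  awk '
    NR == FNR { pat[++n] = $0; next }    # first file: store the pattern lines
    {
      buf[FNR % n] = $0                  # ring buffer of the last n target lines
      if (FNR < n) next                  # not enough lines read yet
      for (i = 1; i <= n; i++)
        # appending "" forces a string comparison, so "0222" != "222"
        if ((buf[(FNR - n + i) % n] "") != (pat[i] "")) next
      found = 1
      exit                               # END still runs after this exit
    }
    END { exit !found }                  # 0 if found, 1 otherwise
  ' "$1" "$2"
}
```

For example, contains_block_awk pattern.txt file.txt && echo "already there". Only n lines of the target are held in memory at any time.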
The question "How can I search for a multiline pattern in a file?" (http://stackoverflow.com/questions/152708/how-can-i-search-for-a-multiline-pattern-in-a-file) shows how to use pcregrep, but I can't see a way to get it to work when the pattern may contain regex special characters. The -F (fixed-string) mode doesn't usefully combine with multiline mode: it still treats the pattern as a set of lines to be matched separately, not as one multiline fixed string. I see you were already using pcregrep in your attempt.
BTW, I think you have a bug in your code in the non-sudo case:
function writeToFile {
if [ -w "$1" ] ; then
"$2" >> "$1" # probably you mean echo "$2" >> "$1"
else
echo -e "$2" | sudo tee -a "$1" > /dev/null
fi
}
Anyway, attempts at using line-based tools have met with failure, so it's time to pull out a more serious programming language that doesn't force the newline convention on us. Just read both files into variables, and use a non-regex search:
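#!/usr/bin/perl
# multi_line_match.pl pattern_file target_file
# exit(0) if a match is found, else exit(1)
use strict;
use warnings;
use File::Slurp;    # provides read_file(); not part of core Perl

my $pat    = read_file($ARGV[0]);
my $target = read_file($ARGV[1]);

# index() is a plain substring search, so regex metacharacters in the
# pattern are harmless. The match must start at the beginning of a line:
# either at the very start of the target, or right after a newline.
# $pat keeps its trailing newline, so the match also ends at a line end.
if (substr($target, 0, length($pat)) eq $pat or index($target, "\n" . $pat) >= 0) {
    exit(0);
}
exit(1);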
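The same approach of reading both files and doing a non-regex substring search can also be written in pure bash, without Perl modules or pcregrep. The following is a minimal sketch with a hypothetical function name, contains_block. $(<file) strips trailing newlines, so both strings are wrapped in newlines to make the match start and end on line boundaries. Inside [[ ]], the quoted "$pat" is matched literally, so ?, * and # in the pattern are not glob characters.

```bash
# contains_block PATTERN_FILE TARGET_FILE   (hypothetical helper name)
# Returns 0 if the lines of PATTERN_FILE appear as consecutive whole lines
# in TARGET_FILE, 1 if not, 2 if either file is unreadable.
contains_block() {
  [[ -r $1 && -r $2 ]] || return 2
  local pat target
  pat=$(<"$1")       # command substitution drops trailing newlines
  target=$(<"$2")
  # Quoted parts of the right-hand side match literally; only the bare *s
  # are wildcards. The surrounding $'\n' anchor the block to whole lines.
  [[ $'\n'"$target"$'\n' == *$'\n'"$pat"$'\n'* ]]
}
```

In the question's script, writeOnceToFile could test contains_block pattern.txt "$file" instead of calling pcregrep. Bash pattern matching on very large strings may be slow, so this suits small files such as configuration files.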
See "What is the best way to slurp a file into a string in Perl?" for ways to avoid the dependency on File::Slurp (which isn't part of the standard Perl distribution, or of a default Ubuntu 15.04 system). I chose File::Slurp partly because it makes what the program does more readable for non-Perl-geeks, compared to:
my $contents = do { local(@ARGV, $/) = $file; <> };
I was also working on avoiding reading the full file into memory, using an idea from http://www.perlmonks.org/?node_id=98208: set the input record separator $/ to the pattern, so that each record ends at an occurrence of it. I think non-matching cases would usually still read the whole file at once, because without an occurrence of the separator the entire file is a single record. Also, the logic for handling a match at the start of the file was pretty complex, and I didn't want to spend a long time testing it to make sure it was correct in all cases. Here's what I had before giving up:
use IO::File;    # also exports the O_RDONLY constant
#IO::File->input_record_separator($pat);
$/ = $pat; # pat must include a trailing newline if you want it to match one
my $fh = IO::File->new($ARGV[1], O_RDONLY)
    or die 'Could not open file ', $ARGV[1], ": $!";
my $tail = substr($fh->getline, -1); #fast forward to the first match
#print each occurrence in the file
#print IO::File->input_record_separator while $fh->getline;
#FIXME: something clever here to handle the case where $pat matches at the beginning of the file.
do {
    # FIXME: need to check defined($fh->getline)
    if (($tail eq "\n") or ($tail = substr($fh->getline, -1))) {
        $fh->close;
        exit(0); # if there's a 2nd line
    }
} while ($tail);
$fh->close;
exit(1);
Another idea was to filter both the pattern and the files to be searched through tr '\n' '\r' or something similar, so that each would become a single line. (\r is a likely safe choice that wouldn't collide with anything already in a file or a pattern.)
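Here is a minimal bash sketch of that tr idea. It assumes that neither file already contains \r characters, and contains_block_tr is a hypothetical name. Both the pattern and the target get a \r at each end, so the match has to start and end on line boundaries. grep -F then does a literal search over the single long "line", and grep -q gives the 0/1 exit status the question asks for.

```bash
# contains_block_tr PATTERN_FILE TARGET_FILE   (hypothetical helper name)
# Assumes neither file contains \r already.
contains_block_tr() {
  local pat cr=$'\r'
  pat=$(<"$1")                        # drops trailing newlines
  pat="$cr${pat//$'\n'/$cr}$cr"       # newlines -> \r, plus \r at both ends
  # Turn the whole target into one \r-delimited line, padded with \r at both
  # ends, then search it for the pattern as a fixed string.
  { printf '\r'; tr '\n' '\r' < "$2"; printf '\r'; } | grep -qF -- "$pat"
}
```

Unlike the Perl version, this streams the target through tr. grep still sees it as one long line, though.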