Filtering a diff with a regular expression


Problem description

It would be extremely handy to be able to filter a diff so that trivial changes are not displayed. I would like to write a regular expression that is run on each line, together with a replacement string that uses the captured groups to generate a canonical form. If the before and after lines produce the same canonical output, they would be removed from the diff.

For example, I am working on a PHP code base where a significant number of array accesses are written as my_array[my_key] when they should be my_array["my_key"] to prevent issues if a my_key constant is defined. It would be useful to generate a diff that omits lines whose only change is the added quotes.

I can't change them all at once, as we don't have the resources to test the entire code base, so I am fixing this whenever I make a change to a function. How can I achieve this? Is there anything else similar that I can use to achieve a similar result? For example, a simpler method might be to skip the canonical form and just check whether the input is transformed into the output. BTW, I am using Git.

Solution

There do not seem to be any options to Git's diff command that support what you want to do. However, you could use the GIT_EXTERNAL_DIFF environment variable and a custom script (or any executable created using your preferred scripting or programming language) to manipulate a patch.
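
For orientation, Git runs the program named in GIT_EXTERNAL_DIFF with seven positional arguments: path, old-file, old-hex, old-mode, new-file, new-hex and new-mode. The sketch below only labels them; the function name show_diff_args is made up for illustration and is not part of Git.

```shell
# Hypothetical helper: label the seven arguments Git passes to an
# external diff program (path old-file old-hex old-mode new-file new-hex new-mode).
show_diff_args() {
    if [ "$#" -ne 7 ]; then
        echo "expected 7 arguments, got $#" >&2
        return 1
    fi
    printf 'path=%s\n' "$1"
    printf 'old-file=%s old-hex=%s old-mode=%s\n' "$2" "$3" "$4"
    printf 'new-file=%s new-hex=%s new-mode=%s\n' "$5" "$6" "$7"
}
```

Calling show_diff_args "$@" at the top of an external diff script is a quick way to see what Git hands over for each changed file.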

I'll assume you are on Linux; if not, you could tweak this concept to suit your environment. Let's say you have a Git repo where HEAD has a file file05 that contains:

line 26662: $my_array[my_key]

And a file file06 that contains:

line 19768: $my_array[my_key]
line 19769: $my_array[my_key]
line 19770: $my_array[my_key]
line 19771: $my_array[my_key]
line 19772: $my_array[my_key]
line 19773: $my_array[my_key]
line 19775: $my_array[my_key]
line 19776: $my_array[my_key]

You change file05 to:

line 26662: $my_array["my_key"]

And you change file06 to:

line 19768: $my_array[my_key]
line 19769: $my_array["my_key"]
line 19770: $my_array[my_key]
line 19771: $my_array[my_key]
line 19772: $my_array[my_key]
line 19773: $my_array[my_key]
line 19775: $my_array[my_key2]
line 19776: $my_array[my_key]

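Take the following shell script; let's call it mydiff.sh and place it somewhere on our PATH:

#!/bin/bash
# Git calls this script with:
#   path old-file old-hex old-mode new-file new-hex new-mode
echo "$@"
# Re-diff the file ($5 is new-file) word by word and post-process the result.
git diff-files --patch --word-diff=porcelain "${5}" | awk '
# Remember a removed chunk and the record number it appeared on.
/^-./ {rec = FNR; prev = substr($0, 2);}
# An added chunk directly after a removed one: strip the quotes and compare.
FNR == rec + 1 && /^\+./ {
    ln = substr($0, 2);
    gsub("\\[\"", "[", ln);
    gsub("\"\\]", "]", ln);
    if (prev == ln) {
        print " " ln;   # only quotes were added, so show it as unchanged
    } else {
        print "-" prev;
        print "+" ln;
    }
}
# Pass other records through. Because rec starts out unset (0),
# record 1 (the diff --git header) is skipped.
FNR != rec && FNR != rec + 1 {print;}
'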

Executing the command:

GIT_EXTERNAL_DIFF=mydiff.sh git --no-pager diff

Will output:

file05 /tmp/r2aBca_file05 d86525edcf5ec0157366ea6c41bc6e4965b3be1e 100644 file05 0000000000000000000000000000000000000000 100644
index d86525e..c2180dc 100644
--- a/file05
+++ b/file05
@@ -1 +1 @@
 line 26662: 
 $my_array[my_key]
~
file06 /tmp/2lgz7J_file06 d84a44f9a9aac6fb82e6ffb94db0eec5c575787d 100644 file06 0000000000000000000000000000000000000000 100644
index d84a44f..bc27446 100644
--- a/file06
+++ b/file06
@@ -1,8 +1,8 @@
 line 19768: $my_array[my_key]
~
 line 19769: 
 $my_array[my_key]
~
 line 19770: $my_array[my_key]
~
 line 19771: $my_array[my_key]
~
 line 19772: $my_array[my_key]
~
 line 19773: $my_array[my_key]
~
 line 19775: 
-$my_array[my_key]
+$my_array[my_key2]
~
 line 19776: $my_array[my_key]
~

This output does not show changes for the added quotes in file05 and file06. The external diff script basically uses the Git diff-files command to create the patch and filters the output through a GNU awk script to manipulate it. This sample script does not handle all the different combinations of old and new files mentioned for GIT_EXTERNAL_DIFF nor does it output a valid patch, but it should be enough to get you started.
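
The awk script strips the quotes from the new side only. The canonical-form idea from the question can be sketched by applying the same substitution, with a capture group, to both sides and comparing the results. This is a minimal sketch that assumes the only trivial change is double-quoting a bare array key; the names canon and is_trivial_change are hypothetical.

```shell
# Hypothetical helpers, not part of Git.

# canon LINE: rewrite ["key"] to [key] so that quoted and unquoted
# array accesses end up with the same canonical form.
canon() {
    printf '%s\n' "$1" | sed -E 's/\["([A-Za-z_][A-Za-z0-9_]*)"\]/[\1]/g'
}

# is_trivial_change OLD NEW: succeed when both lines share a canonical form.
is_trivial_change() {
    [ "$(canon "$1")" = "$(canon "$2")" ]
}
```

For example, is_trivial_change '$my_array[my_key]' '$my_array["my_key"]' succeeds, while comparing '$my_array[my_key]' with '$my_array[my_key2]' fails, so only the second pair would stay in the diff.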

You could use Perl regular expressions, Python difflib or whatever you're comfortable with to implement an external diff tool that suits your needs.
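
Staying with awk, the same idea can be applied to an ordinary unified diff instead of the --word-diff=porcelain format. The sketch below is an assumption-laden starting point, not a finished tool: the function name filter_trivial is made up, it only pairs a single removed line with the added line that immediately follows it, and it treats any line starting with "--- " or "+++ " as a file header.

```shell
# Hypothetical filter: read a unified diff on stdin and turn -/+ pairs that
# differ only by quotes around an array key into context lines.
filter_trivial() {
    awk '
    # Canonical form: drop double quotes directly inside square brackets.
    function canon(s) { gsub(/\["/, "[", s); gsub(/"\]/, "]", s); return s }
    # Print a removed line that turned out to have no matching added line.
    function emit_pending() { if (have) { print "-" old; have = 0 } }

    /^(---|\+\+\+) / { emit_pending(); print; next }    # file headers
    /^-/ { emit_pending(); old = substr($0, 2); have = 1; next }
    /^\+/ && have {
        new = substr($0, 2); have = 0
        if (canon(old) == canon(new)) {
            print " " new            # trivial change: show the new text as context
        } else {
            print "-" old
            print "+" new
        }
        next
    }
    { emit_pending(); print }
    END { emit_pending() }
    '
}
```

It could be tried with git diff | filter_trivial. Hunks that remove several lines and then add several lines are passed through unpaired, so a real tool would need smarter matching.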
