Python Difflib Deltas and Compare Ndiff


Question

I want to do something like what I believe change control systems do: compare two files and save a small diff each time the file changes. I've been reading this page: http://docs.python.org/library/difflib.html, and apparently it's not sinking in.

I tried to recreate this in the fairly simple program shown below, but what I seem to be missing is that the deltas contain at least as much as the original file, and more.

Is it not possible to get just the pure changes? The reason I ask is hopefully obvious: to save disk space.
I could just save the entire chunk of code each time, but it would be better to save the current code once, then small diffs of the changes.

I'm also still trying to figure out why many difflib functions return a generator instead of a list. What's the advantage there?

Will difflib work for me, or do I need to find a more professional package with more features?

# Python difflib demo
# Author: Neal Walters
# loosely based on http://ahlawat.net/wordpress/?p=371
# 01/17/2011

import difflib

# build the files here - later we will probably just read the files
file1Contents = """
for j = 1 to 10:
   print "ABC"
   print "DEF"
   print "HIJ"
   print "JKL"
   print "Hello World"
   print "j=" + j
   print "XYZ"
"""

file2Contents = """
for j = 1 to 10:
   print "ABC"
   print "DEF"
   print "HIJ"
   print "JKL"
   print "Hello World"
   print "XYZ"
print "The end"
"""

filename1 = "diff_file1.txt"
filename2 = "diff_file2.txt"

with open(filename1, "w") as file1:
    file1.write(file1Contents)
with open(filename2, "w") as file2:
    file2.write(file2Contents)
# end of file build

with open(filename1, "r") as file1:
    lines1 = file1.readlines()
with open(filename2, "r") as file2:
    lines2 = file2.readlines()

print("\n FILE 1 \n")
for line in lines1:
    print(line)

print("\n FILE 2 \n")
for line in lines2:
    print(line)

diffSequence = difflib.ndiff(lines1, lines2)

print("\n ----- SHOW DIFF ----- \n")
for line in diffSequence:
    print(line)

diffObj = difflib.Differ()
deltaSequence = diffObj.compare(lines1, lines2)
deltaList = list(deltaSequence)

print("\n ----- SHOW DELTALIST ----- \n")
for line in deltaList:
    print(line)

# let's suppose we store just the diffSequence in the database
# then we want to take the current file (file2) and recreate the original (file1) from it
# by backward applying the diff

restoredFile1Lines = difflib.restore(diffSequence, 1)  # 1 indicates file1 of 2 used to create the diff

restoreFileList = list(restoredFile1Lines)

print("\n ----- SHOW REBUILD OF FILE1 ----- \n")
# this is not showing anything!
for line in restoreFileList:
    print(line)

Thanks!

Update:

contextDiffSeq = difflib.context_diff(lines1, lines2)
contextDiffList = list(contextDiffSeq)

print("\n ----- SHOW CONTEXTDIFF ----- \n")
for line in contextDiffList:
    print(line)

----- SHOW CONTEXTDIFF -----

*** 5,9 ****
     print "HIJ"
     print "JKL"
     print "Hello World"
-    print "j=" + j
     print "XYZ"
--- 5,9 ----
     print "HIJ"
     print "JKL"
     print "Hello World"
     print "XYZ"
+ print "The end"
Another update:

In the old days of Panvalet and Librarian, source management tools for the mainframe, you could create a changeset like this:

          ++ADD 9
             print "j=" + j 
          

This simply means add a line (or lines) after line 9. Then there were keywords like ++REPLACE or ++UPDATE. http://www4.hawaii.gov/dags/icsd/ppmo/Stds_Web_Pages/pdf/it110401.pdf

Recommended answer

Diffs must contain enough information to patch one version into another, so yes, for your experiment of a single-line change to a very small document, storing the whole document could be cheaper.
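To make the size trade-off concrete, here is a rough sketch that compares the characters needed for a full copy, an ndiff delta, and a unified diff. The 200-line file and the helper total_size are made up for illustration. The ndiff delta repeats every line of both versions, so it always comes out larger than the file; the unified diff only pays off once the file is big compared to the change.

```python
import difflib

# Hypothetical stand-ins for two revisions of a file with one edited line.
old_lines = ["line %d\n" % i for i in range(1, 201)]
new_lines = list(old_lines)
new_lines[99] = "line 100 (edited)\n"

def total_size(lines):
    """Number of characters needed to store the given lines."""
    return sum(len(line) for line in lines)

full_size = total_size(new_lines)
ndiff_size = total_size(difflib.ndiff(old_lines, new_lines))
unified_size = total_size(difflib.unified_diff(old_lines, new_lines))

print("full copy:   ", full_size)
print("ndiff delta: ", ndiff_size)
print("unified diff:", unified_size)
```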

Library functions return iterators to make life easier for clients that are tight on memory or only need to look at part of the resulting sequence. That's fine in Python because any iterator can be converted to a list with the very short expression list(an_iterator).
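A small sketch of what that means in practice (the sample lines are invented). A generator can only be walked once, which also appears to be why the rebuild loop in the question prints nothing: the ndiff generator had already been consumed by the earlier print loop before it was handed to difflib.restore.

```python
import difflib

old_lines = ["alpha\n", "beta\n", "gamma\n"]
new_lines = ["alpha\n", "beta (edited)\n", "gamma\n"]

delta_gen = difflib.ndiff(old_lines, new_lines)  # a generator, evaluated lazily
delta = list(delta_gen)                          # materialise it once

# delta_gen is exhausted now, so restoring from it yields nothing.
restored_from_gen = list(difflib.restore(delta_gen, 1))

# The list can be traversed as often as needed.
restored_old = list(difflib.restore(delta, 1))   # 1 = first sequence
restored_new = list(difflib.restore(delta, 2))   # 2 = second sequence

print(restored_from_gen)
print(restored_old == old_lines, restored_new == new_lines)
```

If you need the delta more than once, keep the list rather than the generator.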

Most differencing is done on lines of text, but it is possible to go down to character-by-character comparison, and difflib does it. Take a look at the Differ class in difflib.
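For instance, difflib.SequenceMatcher works on any sequence, including plain strings, so it can report which characters changed inside a line. This is only a sketch using the two strings from the "c= 0" / "c= 1" example; Differ does something similar internally when it emits its "? " hint lines.

```python
import difflib

old, new = "c= 0", "c= 1"

matcher = difflib.SequenceMatcher(None, old, new)
opcodes = matcher.get_opcodes()

# Each opcode names an operation and the character ranges it covers.
for tag, i1, i2, j1, j2 in opcodes:
    print(tag, repr(old[i1:i2]), repr(new[j1:j2]))
```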

The examples all over the place use human-friendly output, but diffs are managed internally in a much more compact, computer-friendly way. Also, diffs usually contain redundant information (like the text of a line to delete) to make patching and merging changes safe. You can remove that redundancy with your own code if you feel comfortable doing so.
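As one way to get less verbose output straight from difflib, unified_diff accepts n=0, which drops the unchanged context lines and leaves only the headers and the changed lines. The sample lines below are invented for illustration. Note that recent Python versions leave out a count of 1 in the "@@" header, so a header may read "@@ -2 +1,0 @@" rather than "@@ -2,1 +1,0 @@".

```python
import difflib

old_lines = ['print "HIJ"\n', 'print "j=" + j\n', 'print "XYZ"\n']
new_lines = ['print "HIJ"\n', 'print "XYZ"\n', 'print "The end"\n']

# n=0 asks for zero lines of context around each change.
patch = list(difflib.unified_diff(old_lines, new_lines, n=0))
hunk_headers = [line for line in patch if line.startswith("@@")]

print("".join(patch), end="")
```

The deleted line's text is still present here; that is the redundancy that keeps patching safe.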

I just read that difflib favors least surprise over optimality, which is something I won't argue against. There are well-known algorithms that are fast at producing a minimal set of changes.

I once coded a generic diffing engine along with one of the optimal algorithms in about 1250 lines of Java (JRCS). It works for any sequence of elements that can be compared for equality. If you want to build your own solution, I think that a translation/reimplementation of JRCS should take no more than 300 lines of Python.

Processing the output produced by difflib to make it more compact is also an option. This is an example from a small file with three changes (an addition, a change, and a deletion):

          ---  
          +++  
          @@ -7,0 +7,1 @@
          +aaaaa
          @@ -9,1 +10,1 @@
          -c= 0
          +c= 1
          @@ -15,1 +16,0 @@
          -    m = re.match(code_re, text)
          

What the patch says can easily be condensed to:

          +7,1 
          aaaaa
          -9,1 
          +10,1
          c= 1
          -15,1
          

For your own example, the condensed output would be:

          -8,1
          +9,1
          print "The end"
          

For safety, keeping a leading marker ('>') on lines that must be inserted might be a good idea.

          -8,1
          +9,1
          >print "The end"
          

Is that closer to what you need?

This is a simple function to do the compacting. You'll have to write your own code to apply the patch in that format, but it should be straightforward.

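          import re

          # Hunk header; newer Python versions omit a count of 1 ("@@ -8 +7,0 @@").
          HUNK_RE = re.compile(r'@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@')

          def compact_a_unidiff(s):
              """Condense unified-diff lines (made with n=0) to commands plus added lines."""
              result = []
              in_hunk = False  # the '---'/'+++' file headers come before the first hunk
              for l in s:
                  m = HUNK_RE.match(l)
                  if m:
                      in_hunk = True
                      del_start, del_len, add_start, add_len = m.groups()
                      if del_len != '0':
                          result.append('-%s,%s' % (del_start, del_len or '1'))
                      if add_len != '0':
                          result.append('+%s,%s' % (add_start, add_len or '1'))
                  elif in_hunk and l.startswith('+'):
                      result.append('>' + l[1:])
              return result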
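
For the applying side, here is a minimal sketch under stated assumptions. The function name apply_compact_patch is made up. It assumes the compact format from the examples above: "-start,count" counts lines in the old file, "+start,count" counts lines in the new file, and every inserted line follows its "+" command with a leading ">". It does no safety checking, which is the price of having dropped the deleted text.

```python
def apply_compact_patch(old_lines, patch):
    """Rebuild the new file from old_lines and a compact patch (assumed format)."""
    result = []
    old_pos = 0                       # index of the next unread old line
    entries = iter(patch)
    for entry in entries:
        kind, spec = entry[:1], entry[1:].strip()
        if kind not in ("-", "+"):
            raise ValueError("unexpected patch entry: %r" % entry)
        start, _, count = spec.partition(",")
        start, count = int(start), int(count or 1)    # a missing count means 1
        if kind == "-":
            result.extend(old_lines[old_pos:start - 1])   # keep lines before the deletion
            old_pos = start - 1 + count                   # then skip the deleted lines
        else:
            keep = (start - 1) - len(result)              # unchanged lines still to copy
            result.extend(old_lines[old_pos:old_pos + keep])
            old_pos += keep
            for _ in range(count):
                result.append(next(entries)[1:])          # drop the '>' marker
    result.extend(old_lines[old_pos:])                    # copy whatever is left
    return result


# The two files from the question, and the condensed patch shown above.
file1 = ['\n', 'for j = 1 to 10:\n', '   print "ABC"\n', '   print "DEF"\n',
         '   print "HIJ"\n', '   print "JKL"\n', '   print "Hello World"\n',
         '   print "j=" + j\n', '   print "XYZ"\n']
file2 = file1[:7] + ['   print "XYZ"\n', 'print "The end"\n']
patch = ['-8,1', '+9,1', '>print "The end"\n']

rebuilt = apply_compact_patch(file1, patch)
print(rebuilt == file2)
```

Because deletions are addressed by old line numbers and insertions by new line numbers, the commands have to be applied in the order they appear in the patch.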
          
