Compare two text files to find differences and output them to a new text file


Question

I am trying to work on a simple data comparison tool for text documents. The goal is for the user to select a file, search through it for certain parameters, and write those parameters to a new text document. Those parameters are then compared with a text document that holds the default parameters, and the differences are written to another new text document.

I've created a simple flowchart to summarize this:

This is my current code. I am using the difflib library to compare the two files.

import difflib
import re
from Tkinter import *
import tkSimpleDialog
import tkMessageBox
from tkFileDialog import askopenfilename

root = Tk()
w = Label(root, text ="Configuration Inspector")
w.pack()
tkMessageBox.showinfo("Welcome", "This is version 1.00 of Configuration Inspector")
filename = askopenfilename() # Logs File
filename2 = askopenfilename() # Default Configuration
compareFile = askopenfilename() # Comparison File
outputfilename = askopenfilename() # Out Serial Number Configuration from Logs

with open(filename, "rb") as f_input:
    start_token = tkSimpleDialog.askstring("Serial Number", "What is the serial number?")
    end_token = tkSimpleDialog.askstring("End Keyword", "What is the end keyword")
    reText = re.search("%s(.*?)%s" % (re.escape(start_token + ",SHOWALL"), re.escape(end_token)), f_input.read(), re.S)
    if reText:
        output = reText.group(1)
        fo = open(outputfilename, "wb")
        fo.write(output)
        fo.close()

        diff = difflib.ndiff(outputfilename, compareFile)
        print '\n'.join(list(diff))

    else:
        tkMessageBox.showinfo("Output", "Sorry that input was not found in the file")
        print "not found"

The result so far is that the program correctly searches through the file you select, then writes the parameters it finds into a new output text file.

The issue arises when trying to compare the two files, the Default Data file and the Output file.

When comparing, the program does output differences. However, since the Default Data file has different lines than the Output file, it only prints the lines that do not match rather than the parameters that do not match. In other words, let's say I have these two files:

Default Data Text File:

Data1 = 1
Data2 = 2
Data3 = 3
Data4 = 4
Data5 = 5
Data6 = 6

Output Data Text File:

Data1 = 1
Data2 = 2
Data3 = 8
Data4 = 7

Since Data3 and Data4 do not match, the difference.txt file (the comparison output) should show that. For example:

Data3 = 8
Data4 = 7
Data5 = 5
Data6 = 6

However, it does not match or compare the lines; it just checks whether there is a line in that position. So currently my comparison output looks like this:

Data5 = 5
Data6 = 6

Any ideas on how I can make the comparison show every difference between the files' parameters?

If you need any more details, please let me know in the comments and I will edit the original post to add them.

Solution

I don't know what you're trying to do with difflib.ndiff(). That function takes two lists of strings, but you are passing it filenames.
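To see what difflib.ndiff() does when it actually gets two lists of strings, here's a minimal sketch using the sample data from the question, typed in as lists rather than read from files. ndiff aligns the two lists as sequences of lines, so it reports removed ('- ') and added ('+ ') lines rather than grouping them by parameter name, which is why a key-based comparison fits this problem better:

```python
import difflib

# The two sample files from the question, as lists of lines.
default_lines = ["Data1 = 1", "Data2 = 2", "Data3 = 3",
                 "Data4 = 4", "Data5 = 5", "Data6 = 6"]
output_lines = ["Data1 = 1", "Data2 = 2", "Data3 = 8", "Data4 = 7"]

# ndiff() expects two sequences of strings, not two filenames.
# Each result line starts with a two-character code:
# '  ' (in both), '- ' (only in the first), '+ ' (only in the second),
# '? ' (intraline hints).
diff = list(difflib.ndiff(default_lines, output_lines))

# Keep only the lines that were removed or added.
changed = [line for line in diff if line.startswith(("- ", "+ "))]
print("\n".join(changed))
```

The filter drops the unchanged lines and the '? ' hint lines; what remains is a line-oriented diff, not the "changed parameters only" report the question asks for.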

Anyway, here's a short demo that performs the comparison that you want. It uses a dict to speed up the comparison process. Obviously, I don't have your data files, so this program creates lists of strings using the string .splitlines() method.

It goes through the default data list line by line.
If that key is not present in the output dict, then the default line is printed.
If a data key with that value is present in the output dict, then that line is skipped.
If the key is found but the value in the output dict is different to the default value, then a line with the key & output value is printed.

#Build default data list
defdata = '''
Data1 = 1
Data2 = 2
Data3 = 3
Data4 = 4
Data5 = 5
Data6 = 6
'''.splitlines()[1:]

#Build output data list
outdata = '''
Data1 = 1
Data2 = 2
Data3 = 8
Data4 = 7
'''.splitlines()[1:]

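# Map each key to its value, e.g. 'Data3' -> '8'
outdict = dict(line.split(' = ') for line in outdata)

for line in defdata:
    key, val = line.split(' = ')
    if key in outdict:
        outval = outdict[key]
        # Key present in both: report it only if the value changed
        if outval != val:
            print('%s = %s' % (key, outval))
    else:
        # Key missing from the output data: report the default line
        print(line)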

output

Data3 = 8
Data4 = 7
Data5 = 5
Data6 = 6


Here's how to read a text file into a list of lines.

with open(filename) as f:
    data = f.read().splitlines()

There's also a .readlines() method, but it's not so useful here because it preserves the \n newline character at the end of each line, and we don't want that.
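A quick way to see the difference is to run both methods on the same content. This sketch uses io.StringIO as an in-memory stand-in for a real file, so nothing needs to exist on disk:

```python
import io

# io.StringIO behaves like a text file opened for reading.
sample = "Data1 = 1\nData2 = 2\n"

with io.StringIO(sample) as f:
    with_newlines = f.readlines()              # keeps the trailing '\n'

with io.StringIO(sample) as f:
    without_newlines = f.read().splitlines()   # drops the line endings

print(with_newlines)     # ['Data1 = 1\n', 'Data2 = 2\n']
print(without_newlines)  # ['Data1 = 1', 'Data2 = 2']
```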

Note that if there are any blank lines in the text file then the resulting list will have an empty string '' in that position. Also, that code won't remove any leading or trailing blanks or other whitespace on each line. But if you need to do that there are thousands of examples that can show you how here on Stack Overflow.
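If your files might contain blank lines or stray spaces, one option is to clean each line as you read it. This is only one way to do it: the helper name read_clean_lines is made up for this example, and a temporary file stands in for your real data file:

```python
import os
import tempfile


def read_clean_lines(path):
    """Return the file's lines with surrounding whitespace removed,
    skipping lines that are empty after stripping (hypothetical helper)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


# Demo: write a small file with a blank line and extra spaces.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Data1 = 1   \n\n  Data2 = 2\n")
    path = tmp.name

try:
    lines = read_clean_lines(path)
    print(lines)  # ['Data1 = 1', 'Data2 = 2']
finally:
    os.remove(path)
```

Iterating over the file object yields one line at a time, and strip() also removes the trailing newline, so there is no need to call splitlines() here.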


Version 2

This new version uses a slightly different approach. It loops over a sorted list of all the keys found in either the default list or the output list.
If a key is only found in one of the lists, the corresponding line is added to the diff list.
If a key is found in both lists but the output line differs from the default line, then the corresponding line from the output list is added to the diff list. If both lines are identical, nothing is added to the diff list.

#Build default data list
defdata = '''
Data1 = 1
Data2 = 2
Data3 = 3
Data4 = 4
Data5 = 5
Data6 = 6
'''.splitlines()[1:]

#Build output data list
outdata = '''
Data1 = 1
Data2 = 2
Data3 = 8
Data4 = 7
Data8 = 8
'''.splitlines()[1:]

def make_dict(data):
    return dict((line.split(None, 1)[0], line) for line in data)

defdict = make_dict(defdata)
outdict = make_dict(outdata)

#Create a sorted list containing all the keys
allkeys = sorted(set(defdict) | set(outdict))
#print allkeys

difflines = []
for key in allkeys:
    indef = key in defdict
    inout = key in outdict
    if indef and not inout:
        difflines.append(defdict[key])
    elif inout and not indef:
        difflines.append(outdict[key])
    else:
        #key must be in both dicts
        defval = defdict[key]
        outval = outdict[key]
        if outval != defval:
            difflines.append(outval)

for line in difflines:
    print(line)

output

Data3 = 8
Data4 = 7
Data5 = 5
Data6 = 6
Data8 = 8
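To tie this back to the original goal of writing the differences to a new text file, here's a minimal end-to-end sketch of the Version 2 logic working on real files. It assumes, as Version 2 does, that the key is the first whitespace-separated word on each line. The function name diff_files and the file names default.txt, output.txt and difference.txt (created in a temporary directory) are placeholders for whatever the user picks in the Tkinter dialogs:

```python
import os
import tempfile


def make_dict(lines):
    # Same idea as Version 2: the key is the first whitespace-separated
    # word on the line and the value is the whole line.
    return dict((line.split(None, 1)[0], line) for line in lines)


def diff_files(default_path, output_path, diff_path):
    """Write the lines that differ between two 'key = value' files to
    diff_path (a sketch; blank lines and surrounding spaces are ignored)."""
    with open(default_path) as f:
        defdict = make_dict(line.strip() for line in f if line.strip())
    with open(output_path) as f:
        outdict = make_dict(line.strip() for line in f if line.strip())

    difflines = []
    for key in sorted(set(defdict) | set(outdict)):
        if key not in outdict:              # only in the default file
            difflines.append(defdict[key])
        elif key not in defdict:            # only in the output file
            difflines.append(outdict[key])
        elif outdict[key] != defdict[key]:  # in both, but changed
            difflines.append(outdict[key])

    with open(diff_path, "w") as f:
        f.write("".join(line + "\n" for line in difflines))
    return difflines


# Demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    default_path = os.path.join(folder, "default.txt")
    output_path = os.path.join(folder, "output.txt")
    diff_path = os.path.join(folder, "difference.txt")
    with open(default_path, "w") as f:
        f.write("Data1 = 1\nData2 = 2\nData3 = 3\n"
                "Data4 = 4\nData5 = 5\nData6 = 6\n")
    with open(output_path, "w") as f:
        f.write("Data1 = 1\nData2 = 2\nData3 = 8\nData4 = 7\nData8 = 8\n")

    result = diff_files(default_path, output_path, diff_path)
    with open(diff_path) as f:
        written = f.read()

print(result)  # ['Data3 = 8', 'Data4 = 7', 'Data5 = 5', 'Data6 = 6', 'Data8 = 8']
```

Note that sorted() compares the keys as strings, so a key like Data10 would sort before Data2; if the order in difference.txt matters, you may want a different sort key.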
