Compare two files for differences in Python

Question

I want to compare two files (take each line from the first file and look it up in the whole second file) to see the differences between them, and append the lines of fileA.txt that are missing from fileB.txt to the end of fileB.txt. I am new to Python, so at first I thought about a simple program like this:

import difflib

file1 = "fileA.txt"
file2 = "fileB.txt"

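# ndiff prefixes each line with "- " (only in file1), "+ " (only in file2),
# "  " (in both) or "? " (intraline hints)
with open(file1) as f1, open(file2) as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())
print(''.join(diff), end='')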

But as a result I got a combination of the two files, with a suitable tag on each line. I know I could look for lines starting with the tag "-" and append them to the end of fileB.txt, but with a huge file (~100 MB) this method would be inefficient. Can somebody help me improve the program?

The file structure is as follows:

Input:

fileA.txt

Oct  9 13:25:31 user sshd[12844]: Accepted password for root from 213.XXX.XXX.XX7 port 33254 ssh2
Oct  9 13:25:31 user sshd[12844]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct  9 13:35:48 user sshd[12868]: Accepted password for root from 213.XXX.XXX.XX7 port 33574 ssh2
Oct  9 13:35:48 user sshd[12868]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct  9 13:46:58 user sshd[12844]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct  9 13:46:58 user sshd[12844]: pam_unix(sshd:session): session closed for user root
Oct  9 15:47:58 user sshd[12868]: pam_unix(sshd:session): session closed for user root
Oct 11 22:17:31 user sshd[2655]: Accepted password for root from 17X.XXX.XXX.X19 port 5567 ssh2
Oct 11 22:17:31 user sshd[2655]: pam_unix(sshd:session): session opened for user root by (uid=0)

fileB.txt

Oct  9 12:19:16 user sshd[12744]: Accepted password for root from 213.XXX.XXX.XX7 port 60554 ssh2
Oct  9 12:19:16 user sshd[12744]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct  9 13:24:42 user sshd[12744]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct  9 13:24:42 user sshd[12744]: pam_unix(sshd:session): session closed for user root
Oct  9 13:25:31 user sshd[12844]: Accepted password for root from 213.XXX.XXX.XX7 port 33254 ssh2
Oct  9 13:25:31 user sshd[12844]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct  9 13:35:48 user sshd[12868]: Accepted password for root from 213.XXX.XXX.XX7 port 33574 ssh2
Oct  9 13:35:48 user sshd[12868]: pam_unix(sshd:session): session opened for user root by (uid=0)

Output:

fileB_after.txt

Oct  9 12:19:16 user sshd[12744]: Accepted password for root from 213.XXX.XXX.XX7 port 60554 ssh2
Oct  9 12:19:16 user sshd[12744]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct  9 13:24:42 user sshd[12744]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct  9 13:24:42 user sshd[12744]: pam_unix(sshd:session): session closed for user root
Oct  9 13:25:31 user sshd[12844]: Accepted password for root from 213.XXX.XXX.XX7 port 33254 ssh2
Oct  9 13:25:31 user sshd[12844]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct  9 13:35:48 user sshd[12868]: Accepted password for root from 213.XXX.XXX.XX7 port 33574 ssh2
Oct  9 13:35:48 user sshd[12868]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct  9 13:46:58 user sshd[12844]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct  9 13:46:58 user sshd[12844]: pam_unix(sshd:session): session closed for user root
Oct  9 15:47:58 user sshd[12868]: pam_unix(sshd:session): session closed for user root
Oct 11 22:17:31 user sshd[2655]: Accepted password for root from 17X.XXX.XXX.X19 port 5567 ssh2
Oct 11 22:17:31 user sshd[2655]: pam_unix(sshd:session): session opened for user root by (uid=0)
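A minimal Python sketch of the look-up described in the question, assuming lines must match exactly (ignoring only the trailing newline) and that the lines of fileB.txt fit in memory as a set. Reading fileB.txt into a set once makes each look-up roughly constant time, so the work grows linearly with the two file sizes instead of re-scanning the second file for every line of the first. The names `append_missing_lines` and `_ends_with_newline` are hypothetical, not from the original post.

```python
import os


def _ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return True
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def append_missing_lines(src_path, dst_path):
    """Append to dst_path each line of src_path that is not already in it.

    Hypothetical helper: lines are compared exactly, minus the trailing
    newline. Returns the appended lines, in src_path order.
    """
    # Read the destination once into a set for O(1) average membership tests.
    with open(dst_path) as dst:
        seen = {line.rstrip("\n") for line in dst}

    missing = []
    with open(src_path) as src:
        for line in src:
            key = line.rstrip("\n")
            if key not in seen:
                missing.append(key)
                seen.add(key)  # append a repeated source line only once

    if missing:
        # Avoid gluing the first new line onto an unterminated last line.
        needs_newline = not _ends_with_newline(dst_path)
        # Append mode leaves the existing contents of dst_path untouched.
        with open(dst_path, "a") as dst:
            if needs_newline:
                dst.write("\n")
            dst.write("\n".join(missing) + "\n")
    return missing
```

Called as `append_missing_lines("fileA.txt", "fileB.txt")` on the sample files shown above, it would append the last five lines of fileA.txt, which matches fileB_after.txt. Unlike a sort-based approach, it keeps the existing order of fileB.txt and only adds to its end.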


Answer

Try the following in bash:

cat fileA.txt fileB.txt | sort -M | uniq > new_file.txt

sort -M: sorts on an initial string consisting of any amount of whitespace followed by a month name abbreviation, which is folded to upper case and compared in the order 'JAN' < 'FEB' < ... < 'DEC'. Invalid names compare low to valid names. The LC_TIME locale determines the month spellings.
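For illustration, a rough Python approximation of the `sort -M` ordering. It assumes English three-letter month abbreviations and, unlike `sort -M`, ignores the LC_TIME locale. Ties are broken by comparing whole lines, loosely like GNU sort's last-resort comparison (which uses the locale's collation rather than plain code points). `month_key` and `sort_by_month` are hypothetical names.

```python
# English month abbreviations mapped to 1..12; sort -M takes these from LC_TIME.
MONTHS = {
    name: number
    for number, name in enumerate(
        "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1
    )
}


def month_key(line):
    """Rank a line by its leading month abbreviation, roughly like sort -M."""
    # Skip leading whitespace, then fold the first three characters to upper case.
    head = line.lstrip()[:3].upper()
    # Invalid or missing month names get 0, so they sort before valid months.
    return MONTHS.get(head, 0)


def sort_by_month(lines):
    """Sort by month first, then by the whole line to break ties."""
    return sorted(lines, key=lambda line: (month_key(line), line))


print(sort_by_month(["Oct 11 b", "Jan  2 a", "garbage", "Oct  9 a"]))
# ['garbage', 'Jan  2 a', 'Oct  9 a', 'Oct 11 b']
```

Within a month, "Oct  9" lands before "Oct 11" only because the space padding a one-digit day compares lower than the digit "1"; the tie-break is a plain string comparison, not a date comparison.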

uniq: filters out adjacent repeated lines in a file.
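Because uniq only removes repeats that sit next to each other, the input has to be sorted first for it to catch every duplicate. A minimal Python analogue (the name `uniq` here is just illustrative) using `itertools.groupby`, which groups runs of consecutive equal items:

```python
from itertools import groupby


def uniq(lines):
    """Yield lines with runs of adjacent duplicates collapsed, like plain uniq."""
    # groupby yields one (key, group) pair per run of consecutive equal lines.
    for line, _run in groupby(lines):
        yield line


# Non-adjacent duplicates survive unless the input is sorted first.
print(list(uniq(["a", "a", "b", "a"])))          # ['a', 'b', 'a']
print(list(uniq(sorted(["a", "a", "b", "a"]))))  # ['a', 'b']
```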

|: passes the output of one command to another for further processing.

This takes the two files, sorts their combined lines as described above, keeps only the unique lines, and stores the result in new_file.txt.

Note: this is not a Python solution, but you tagged the question with linux, so I thought it might interest you. You can also find more detailed information about the commands used here.
