Compare lines in 2 text files

Question

I have two large text files (200,000+ lines) in CSV format. I need to compare them line by line, but the fields may be switched within each line.

Sample file A.csv:

AAA,BBB,,DDD  
EEE,,GGG,HHH  
III,JJJ,KKK,LLL

Sample file B.csv:

AAA,,BBB,DDD  
EEE,,GGG,HHH  
LLL,KKK,JJJ,III

So for my purposes, A.csv and B.csv should be considered "identical" even though the fields are switched in the first and last lines. Since the fields in each file might be in a different order, the usual tools like grep or diff won't work.

Basically, I think I need to write something that reads a line from A.csv and a line from B.csv and checks whether all fields are present in both lines, independent of their order. Alternatively, something that sorts the fields after reading each line.

Recommended answer

You can normalize each line for the comparison without modifying the data itself.

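import csv

# newline='' is the csv module's recommended way to open files for csv.reader
with open('big1.csv', newline='') as i, open('big2.csv', newline='') as j:
    a = csv.reader(i)
    b = csv.reader(j)
    # zip pairs the rows and stops at the shorter file, so extra trailing
    # lines in the longer file are not reported here
    for linea, lineb in zip(a, b):
        # Compare sorted, lower-cased fields so field order and case are ignored
        if sorted(map(str.lower, linea)) != sorted(map(str.lower, lineb)):
            print('{} does not match {}'.format(linea, lineb))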
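The same idea can be wrapped in a small function that also flags files of different lengths. This is only a sketch under a few assumptions: the names `normalize` and `mismatched_rows` are made up for illustration, surrounding whitespace in a field is treated as insignificant (the question does not say either way), and the question's sample data is read from in-memory strings rather than from disk.

```python
import csv
import io
from itertools import zip_longest


def normalize(row):
    """Return the row's fields stripped, lower-cased, and sorted.

    Stripping whitespace is an assumption, not something the question asks for.
    """
    return sorted(field.strip().lower() for field in row)


def mismatched_rows(file_a, file_b):
    """Yield (line_number, row_a, row_b) for each pair of rows that differ."""
    rows = zip_longest(csv.reader(file_a), csv.reader(file_b))
    for number, (row_a, row_b) in enumerate(rows, start=1):
        # A row of None means that file ran out of lines first.
        if row_a is None or row_b is None or normalize(row_a) != normalize(row_b):
            yield number, row_a, row_b


# The sample data from the question, held in memory instead of on disk.
a_csv = io.StringIO("AAA,BBB,,DDD\nEEE,,GGG,HHH\nIII,JJJ,KKK,LLL\n")
b_csv = io.StringIO("AAA,,BBB,DDD\nEEE,,GGG,HHH\nLLL,KKK,JJJ,III\n")
print(list(mismatched_rows(a_csv, b_csv)))  # prints []
```

Because both readers are consumed lazily, only one row from each file is held in memory at a time, which should suit files with 200,000+ lines. Sorting keeps duplicate fields, so a row with two empty fields does not match a row with one.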
