尝试比较两个csv文件并将差异写为输出 [英] Trying to compare two csv files and write differences as output

查看：338 发布时间：2020/7/12 0:04:08 python csv compare difference

本文介绍了尝试比较两个csv文件并将差异写为输出的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在开发一个脚本，该脚本将两个csv文件之间的差异并以一个新的csv文件作为输出，但只有两个输入文件之间相同的两行(指行号)包含不同的数据时才有差异，例如在文件1中，第3行具有"mike"，篮球运动员"，在文件2中，第3行具有"mike"，棒球运动员".输出的csv会抓取这些打印出来的内容，并将它们写入到csv中.它可以工作，但是存在一些问题(我知道这个问题之前也曾被问过几次，但是其他人对我的回答却有所不同，并且由于我是编程的新手，所以我不太了解他们的代码).

I'm developing a script which takes the difference between 2 csv files and makes a new csv file as output with the differences BUT only if the same 2 rows (refers to row number) between the two input files contain different data e.g. row 3 has "mike", "basketball player" in file 1 and row 3 in file 2 has "mike", "baseball player". The output csv would grab these print them and write them to a csv. It works but there are some issues (I know that this question has also been asked several times before but others have done it differently to me and since I'm fairly new to programming I don't quite understand their codes).

新的csv文件中的输出在每个单元格中都有输出的每个字母(请参见下图)，我相信这与定界符/quotechar/引号第37行有关.我希望它们在自己的单元格中没有任何分隔符句号，多个空格，逗号或"|".

The output in the new csv file has each letter of the output in each cell (see image below) and I believe its something to do with the delimiter/quotechar/quoting line 37. I want them in their own cells without any fullstops, multiple spaces, commas or "|".

另一个问题是运行需要很长时间.我正在处理多达50,000行的数据集，这可能需要一个多小时才能运行.为什么会这样?有什么建议可以加快速度?将某些内容放在for循环之外?我早些时候确实尝试了difflib方法，但是我只能打印整个"input_file1"，而不能将该文件与另一个文件进行比较.

Another issue is that it takes a long time to run. I'm working with datasets of up to 50,000 rows and it can take over an hour to run. Why is this and what advice would be useful to speed it up? Put something outside of the for loop maybe? I did try the difflib method earlier on but I was only able to print the entire "input_file1" but not compare that file with another.

# aim of script is to compare csv files and output difference as a new csv

# import necessary libraries
import csv

# File1 = open(raw_input("path:"),"r") #filename, mode
# File2 = open(raw_input("path:"),"r") #filename, mode

# selects the 2 input files to be compared
input_file1 = "G:/savestuffhereqwerty/electorate_meshblocks/teststuff/Book1.csv"
input_file2 = "G:/savestuffhereqwerty/electorate_meshblocks/teststuff/Book2.csv"
# creates the blank output csv file
output_path = "G:/savestuffhereqwerty/electorate_meshblocks/outputs/output2.csv"
a = open(input_file1, "r")
output_file = open(output_path,"w")
output_file.close()
count = 0

with open(input_file1) as fp1:


    for row_number1, row_value1 in enumerate(fp1):
        if row_number1 == count:
            print "got to 1st point"
            value1 = row_value1

            with open(input_file2) as fp2:
                for row_number2, row_value2 in enumerate(fp2):
                    if row_number2 == count:
                        print "got to 2nd point"
                        value2 = row_value2

                        if value1 == value2:
                            print value1, value2
                        else:
                            print value1, value2
                            with open(output_path, 'wb') as f:
                                writer = csv.writer(f, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
                                # testing to see if the code writes text to the csv
                                writer.writerow(["test1"])
                                writer.writerow(["test2", "test3", "test4"])
                                writer.writerows([value1, value2])
                                print "code reached writing stage"
        count += 1
        print count
print "done"
# replace(",",".")

尝试比较两个csv文件并将差异写为输出 [英] Trying to compare two csv files and write differences as output

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

尝试比较两个csv文件并将差异写为输出 [英] Trying to compare two csv files and write differences as output

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭